Training: 2022-04-11 11:31:24,318-rank_id: 0
Training: 2022-04-11 11:31:51,754-: margin_list [1.0, 0.0, 0.4]
Training: 2022-04-11 11:31:51,755-: network mbf
Training: 2022-04-11 11:31:51,755-: resume False
Training: 2022-04-11 11:31:51,755-: output work_dirs/glint360k_mbf
Training: 2022-04-11 11:31:51,755-: embedding_size 512
Training: 2022-04-11 11:31:51,755-: sample_rate 1.0
Training: 2022-04-11 11:31:51,755-: interclass_filtering_threshold0
Training: 2022-04-11 11:31:51,755-: fp16 True
Training: 2022-04-11 11:31:51,755-: batch_size 128
Training: 2022-04-11 11:31:51,756-: optimizer sgd
Training: 2022-04-11 11:31:51,756-: lr 0.1
Training: 2022-04-11 11:31:51,756-: momentum 0.9
Training: 2022-04-11 11:31:51,756-: weight_decay 0.0001
Training: 2022-04-11 11:31:51,756-: verbose 2000
Training: 2022-04-11 11:31:51,756-: frequent 10
Training: 2022-04-11 11:31:51,756-: dali False
Training: 2022-04-11 11:31:51,756-: rec /train_tmp/glint360k
Training: 2022-04-11 11:31:51,756-: num_classes 360232
Training: 2022-04-11 11:31:51,756-: num_image 17091657
Training: 2022-04-11 11:31:51,756-: num_epoch 20
Training: 2022-04-11 11:31:51,756-: warmup_epoch 0
Training: 2022-04-11 11:31:51,756-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-11 11:31:51,756-: total_batch_size 1024
Training: 2022-04-11 11:31:51,756-: warmup_step 0
Training: 2022-04-11 11:31:51,756-: total_step 333820
Training: 2022-04-11 11:33:15,630-Reducer buckets have been rebuilt in this iteration.
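The derived values in the configuration dump above can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that the log does not state: that `total_batch_size` is the per-rank `batch_size` times the number of ranks, and that `total_step` counts whole batches per epoch times `num_epoch`. The rank count is inferred from 1024 / 128 and is not itself logged.

```python
# Values copied from the configuration dump above.
batch_size = 128
total_batch_size = 1024
num_image = 17091657
num_epoch = 20
logged_total_step = 333820

# Assumption: the number of ranks is not logged, so infer it.
world_size = total_batch_size // batch_size

# Assumption: only whole batches count toward an epoch (floor division).
steps_per_epoch = num_image // total_batch_size
total_step = steps_per_epoch * num_epoch

print(world_size, steps_per_epoch, total_step)
```

Under these assumptions the computed `total_step` agrees with the logged value of 333820.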
Training: 2022-04-11 11:33:17,713-Speed 8787.19 samples/sec Loss 42.3996 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 4096 Required: 122 hours
Training: 2022-04-11 11:33:18,859-Speed 8943.89 samples/sec Loss 42.5296 LearningRate 0.1000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 4096 Required: 86 hours
Training: 2022-04-11 11:33:19,964-Speed 9273.25 samples/sec Loss 42.6236 LearningRate 0.1000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 4096 Required: 67 hours
Training: 2022-04-11 11:33:21,087-Speed 9116.49 samples/sec Loss 42.7848 LearningRate 0.1000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-11 11:33:22,208-Speed 9140.96 samples/sec Loss 42.9586 LearningRate 0.1000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 4096 Required: 49 hours
Training: 2022-04-11 11:33:23,363-Speed 8880.50 samples/sec Loss 43.0868 LearningRate 0.1000 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 4096 Required: 43 hours
Training: 2022-04-11 11:33:24,474-Speed 9219.27 samples/sec Loss 43.2540 LearningRate 0.1000 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 4096 Required: 39 hours
Training: 2022-04-11 11:33:25,605-Speed 9063.97 samples/sec Loss 43.1604 LearningRate 0.0999 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 4096 Required: 36 hours
Training: 2022-04-11 11:33:26,769-Speed 8797.35 samples/sec Loss 43.0091 LearningRate 0.0999 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-04-11 11:33:27,892-Speed 9126.50 samples/sec Loss 43.0253 LearningRate 0.0999 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 8192 Required: 32 hours
Training: 2022-04-11 11:33:28,960-Speed 9602.65 samples/sec Loss 43.3264 LearningRate 0.0999 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 8192 Required: 30 hours
Training: 2022-04-11 11:33:30,198-Speed 8271.76 samples/sec Loss 43.0532 LearningRate 0.0999 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 8192 Required: 28 hours
Training: 2022-04-11 11:33:31,227-Speed 9958.09 samples/sec Loss 43.3245 LearningRate 0.0999 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 8192 Required: 27 hours
Training: 2022-04-11 11:33:32,344-Speed 9172.97 samples/sec Loss 43.0899 LearningRate 0.0999 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 8192 Required: 26 hours
Training: 2022-04-11 11:33:33,489-Speed 8951.67 samples/sec Loss 43.3627 LearningRate 0.0999 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 8192 Required: 25 hours
Training: 2022-04-11 11:33:34,629-Speed 8991.83 samples/sec Loss 42.9427 LearningRate 0.0999 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 8192 Required: 24 hours
Training: 2022-04-11 11:33:35,735-Speed 9262.29 samples/sec Loss 42.8606 LearningRate 0.0999 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-04-11 11:33:36,817-Speed 9473.41 samples/sec Loss 42.9219 LearningRate 0.0999 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-04-11 11:33:37,905-Speed 9412.27 samples/sec Loss 43.0300 LearningRate 0.0999 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 8192 Required: 22 hours
Training: 2022-04-11 11:33:39,044-Speed 8997.29 samples/sec Loss 42.8388 LearningRate 0.0999 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-04-11 11:33:40,127-Speed 9460.20 samples/sec Loss 42.7754 LearningRate 0.0999 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-04-11 11:33:41,237-Speed 9234.45 samples/sec Loss 42.6151 LearningRate 0.0999 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 16384 Required: 21 hours
Training: 2022-04-11 11:33:42,329-Speed 9383.76 samples/sec Loss 42.6262 LearningRate 0.0999 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-04-11 11:33:43,419-Speed 9399.88 samples/sec Loss 42.5744 LearningRate 0.0999 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 16384 Required: 20 hours
Training: 2022-04-11 11:33:44,472-Speed 9729.27 samples/sec Loss 42.6049 LearningRate 0.0998 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 16384 Required: 19 hours
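The slow fall of `LearningRate` in the records above (0.1000 through step 80, 0.0999 from step 90, 0.0998 at step 260) is consistent with a polynomial decay to zero over `total_step` with power 2 and no warmup. The log does not name the scheduler, so the function below is an assumed form used only to compare against the printed values, not the training script's actual code.

```python
def poly_lr(step, base_lr=0.1, total_step=333820, power=2.0):
    """Assumed schedule: polynomial decay to zero, no warmup (warmup_step is 0)."""
    return base_lr * (1.0 - step / total_step) ** power

# (global step, LearningRate exactly as printed in the records above)
logged = [(80, "0.1000"), (90, "0.0999"), (250, "0.0999"), (260, "0.0998")]

for step, shown in logged:
    # Format to four decimals, the precision the log uses.
    print(step, f"{poly_lr(step):.4f}", shown)
```

A linear decay over the same number of steps would still round to 0.1000 at step 90, so it does not fit the printed values; agreement with power 2 is suggestive rather than proof.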
Training: 2022-04-11 11:33:45,529-Speed 9699.90 samples/sec Loss 42.3859 LearningRate 0.0998 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-04-11 11:33:46,603-Speed 9540.58 samples/sec Loss 42.2667 LearningRate 0.0998 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 16384 Required: 19 hours
Training: 2022-04-11 11:33:47,693-Speed 9399.71 samples/sec Loss 42.2248 LearningRate 0.0998 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-04-11 11:33:48,804-Speed 9221.25 samples/sec Loss 42.0976 LearningRate 0.0998 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 16384 Required: 18 hours
Training: 2022-04-11 11:33:49,874-Speed 9575.45 samples/sec Loss 42.0218 LearningRate 0.0998 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-04-11 11:33:50,978-Speed 9282.55 samples/sec Loss 41.9394 LearningRate 0.0998 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 32768 Required: 18 hours
Training: 2022-04-11 11:33:52,084-Speed 9264.44 samples/sec Loss 41.9499 LearningRate 0.0998 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-04-11 11:33:53,176-Speed 9379.86 samples/sec Loss 41.8290 LearningRate 0.0998 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-04-11 11:33:54,248-Speed 9558.24 samples/sec Loss 41.8004 LearningRate 0.0998 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-04-11 11:33:55,351-Speed 9293.91 samples/sec Loss 41.7503 LearningRate 0.0998 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-04-11 11:33:56,449-Speed 9332.67 samples/sec Loss 41.6263 LearningRate 0.0998 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 32768 Required: 17 hours
Training: 2022-04-11 11:33:57,561-Speed 9208.24 samples/sec Loss 41.5018 LearningRate 0.0998 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-04-11 11:33:58,621-Speed 9671.25 samples/sec Loss 41.5668 LearningRate 0.0998 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-04-11 11:33:59,684-Speed 9632.53 samples/sec Loss 41.4467 LearningRate 0.0998 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 32768 Required: 16 hours
Training: 2022-04-11 11:34:00,803-Speed 9169.17 samples/sec Loss 41.4013 LearningRate 0.0998 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-04-11 11:34:01,924-Speed 9140.94 samples/sec Loss 41.2523 LearningRate 0.0997 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-04-11 11:34:03,026-Speed 9302.61 samples/sec Loss 41.3443 LearningRate 0.0997 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-04-11 11:34:04,096-Speed 9569.18 samples/sec Loss 41.3764 LearningRate 0.0997 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 16 hours
Training: 2022-04-11 11:34:05,185-Speed 9415.58 samples/sec Loss 41.2882 LearningRate 0.0997 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-04-11 11:34:06,279-Speed 9366.75 samples/sec Loss 41.2735 LearningRate 0.0997 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-04-11 11:34:07,447-Speed 8768.55 samples/sec Loss 41.1040 LearningRate 0.0997 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-04-11 11:34:08,568-Speed 9145.66 samples/sec Loss 41.0232 LearningRate 0.0997 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-04-11 11:34:09,637-Speed 9584.77 samples/sec Loss 41.0009 LearningRate 0.0997 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-04-11 11:34:10,700-Speed 9646.95 samples/sec Loss 40.8902 LearningRate 0.0997 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 15 hours
Training: 2022-04-11 11:34:11,786-Speed 9431.41 samples/sec Loss 40.7500 LearningRate 0.0997 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 131072 Required: 15 hours
Training: 2022-04-11 11:34:12,894-Speed 9245.96 samples/sec Loss 40.7766 LearningRate 0.0997 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 131072 Required: 15 hours
Training: 2022-04-11 11:34:13,937-Speed 9829.37 samples/sec Loss 40.7121 LearningRate 0.0997 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 131072 Required: 15 hours
Training: 2022-04-11 11:34:15,048-Speed 9226.87 samples/sec Loss 40.6090 LearningRate 0.0997 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 131072 Required: 15 hours
Training: 2022-04-11 11:34:16,108-Speed 9666.96 samples/sec Loss 40.5855 LearningRate 0.0997 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:34:17,171-Speed 9635.27 samples/sec Loss 40.5653 LearningRate 0.0997 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:34:18,268-Speed 9343.05 samples/sec Loss 40.4747 LearningRate 0.0997 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:34:19,391-Speed 9121.43 samples/sec Loss 40.3522 LearningRate 0.0997 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:34:20,456-Speed 9622.83 samples/sec Loss 40.3887 LearningRate 0.0996 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:34:21,592-Speed 9019.80 samples/sec Loss 40.2842 LearningRate 0.0996 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:34:22,695-Speed 9287.78 samples/sec Loss 40.1904 LearningRate 0.0996 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:23,840-Speed 8948.52 samples/sec Loss 40.0410 LearningRate 0.0996 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:24,917-Speed 9523.70 samples/sec Loss 40.1066 LearningRate 0.0996 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:25,971-Speed 9722.78 samples/sec Loss 40.0238 LearningRate 0.0996 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:27,052-Speed 9478.21 samples/sec Loss 39.9026 LearningRate 0.0996 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:28,109-Speed 9691.78 samples/sec Loss 40.0262 LearningRate 0.0996 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:29,179-Speed 9576.41 samples/sec Loss 39.8761 LearningRate 0.0996 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:30,265-Speed 9438.76 samples/sec Loss 39.7657 LearningRate 0.0996 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:31,317-Speed 9738.25 samples/sec Loss 39.7470 LearningRate 0.0996 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:34:32,374-Speed 9695.30 samples/sec Loss 39.5880 LearningRate 0.0996 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:33,431-Speed 9693.54 samples/sec Loss 39.6011 LearningRate 0.0996 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:34,515-Speed 9445.64 samples/sec Loss 39.4537 LearningRate 0.0996 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:35,622-Speed 9260.37 samples/sec Loss 39.4143 LearningRate 0.0996 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:36,753-Speed 9062.49 samples/sec Loss 39.4386 LearningRate 0.0996 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:37,852-Speed 9315.90 samples/sec Loss 39.2995 LearningRate 0.0996 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:38,898-Speed 9797.14 samples/sec Loss 39.2248 LearningRate 0.0995 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:39,952-Speed 9726.92 samples/sec Loss 39.2457 LearningRate 0.0995 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:40,982-Speed 9951.72 samples/sec Loss 39.1575 LearningRate 0.0995 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:42,015-Speed 9916.22 samples/sec Loss 39.0314 LearningRate 0.0995 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:43,081-Speed 9615.79 samples/sec Loss 39.0970 LearningRate 0.0995 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:44,124-Speed 9821.94 samples/sec Loss 39.0166 LearningRate 0.0995 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:45,174-Speed 9758.53 samples/sec Loss 39.0490 LearningRate 0.0995 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:46,240-Speed 9607.50 samples/sec Loss 38.7588 LearningRate 0.0995 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:47,344-Speed 9281.40 samples/sec Loss 38.7828 LearningRate 0.0995 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:48,442-Speed 9330.52 samples/sec Loss 38.7587 LearningRate 0.0995 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:49,530-Speed 9417.79 samples/sec Loss 38.7793 LearningRate 0.0995 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:50,629-Speed 9324.35 samples/sec Loss 38.5799 LearningRate 0.0995 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:51,699-Speed 9573.77 samples/sec Loss 38.5452 LearningRate 0.0995 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:52,814-Speed 9192.81 samples/sec Loss 38.5473 LearningRate 0.0995 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:53,899-Speed 9441.89 samples/sec Loss 38.3906 LearningRate 0.0995 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:54,939-Speed 9855.11 samples/sec Loss 38.3346 LearningRate 0.0995 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:56,064-Speed 9109.98 samples/sec Loss 38.2635 LearningRate 0.0994 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:57,209-Speed 8942.94 samples/sec Loss 38.2549 LearningRate 0.0994 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:58,337-Speed 9084.14 samples/sec Loss 38.1453 LearningRate 0.0994 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:34:59,425-Speed 9422.30 samples/sec Loss 38.1507 LearningRate 0.0994 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:35:00,526-Speed 9311.86 samples/sec Loss 37.9890 LearningRate 0.0994 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:35:01,615-Speed 9404.35 samples/sec Loss 38.0052 LearningRate 0.0994 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 262144 Required: 13 hours
Training: 2022-04-11 11:35:02,660-Speed 9804.46 samples/sec Loss 37.9400 LearningRate 0.0994 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:03,742-Speed 9465.14 samples/sec Loss 37.8194 LearningRate 0.0994 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:04,846-Speed 9286.35 samples/sec Loss 37.7295 LearningRate 0.0994 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:05,934-Speed 9418.60 samples/sec Loss 37.6577 LearningRate 0.0994 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 524288 Required: 12 hours
Training: 2022-04-11 11:35:07,015-Speed 9479.56 samples/sec Loss 37.6427 LearningRate 0.0994 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:08,098-Speed 9457.02 samples/sec Loss 37.7354 LearningRate 0.0994 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:09,183-Speed 9445.79 samples/sec Loss 37.5299 LearningRate 0.0994 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:10,303-Speed 9151.65 samples/sec Loss 37.5562 LearningRate 0.0994 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:11,393-Speed 9398.64 samples/sec Loss 37.3747 LearningRate 0.0994 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:12,477-Speed 9450.83 samples/sec Loss 37.3130 LearningRate 0.0994 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:13,575-Speed 9327.09 samples/sec Loss 37.2564 LearningRate 0.0994 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:14,714-Speed 9001.19 samples/sec Loss 37.3192 LearningRate 0.0993 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:15,820-Speed 9267.50 samples/sec Loss 37.1651 LearningRate 0.0993 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:16,883-Speed 9636.97 samples/sec Loss 37.0751 LearningRate 0.0993 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:17,947-Speed 9626.88 samples/sec Loss 36.9957 LearningRate 0.0993 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:19,076-Speed 9080.30 samples/sec Loss 37.0190 LearningRate 0.0993 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:20,184-Speed 9251.19 samples/sec Loss 36.9417 LearningRate 0.0993 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:21,226-Speed 9830.29 samples/sec Loss 36.8264 LearningRate 0.0993 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:22,308-Speed 9471.05 samples/sec Loss 36.8428 LearningRate 0.0993 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:23,433-Speed 9103.34 samples/sec Loss 36.6717 LearningRate 0.0993 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:24,508-Speed 9530.40 samples/sec Loss 36.7140 LearningRate 0.0993 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:25,580-Speed 9559.44 samples/sec Loss 36.6191 LearningRate 0.0993 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:26,626-Speed 9798.49 samples/sec Loss 36.5196 LearningRate 0.0993 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:27,730-Speed 9281.89 samples/sec Loss 36.3867 LearningRate 0.0993 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:28,815-Speed 9442.52 samples/sec Loss 36.5123 LearningRate 0.0993 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:29,888-Speed 9545.75 samples/sec Loss 36.3375 LearningRate 0.0993 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:30,972-Speed 9457.84 samples/sec Loss 36.3926 LearningRate 0.0993 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:32,039-Speed 9596.86 samples/sec Loss 36.2685 LearningRate 0.0993 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:33,129-Speed 9401.41 samples/sec Loss 36.1549 LearningRate 0.0992 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:34,200-Speed 9567.13 samples/sec Loss 36.1515 LearningRate 0.0992 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:35,265-Speed 9623.66 samples/sec Loss 36.0292 LearningRate 0.0992 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:36,345-Speed 9486.40 samples/sec Loss 35.9985 LearningRate 0.0992 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:37,450-Speed 9273.37 samples/sec Loss 35.9019 LearningRate 0.0992 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:38,506-Speed 9707.68 samples/sec Loss 35.8413 LearningRate 0.0992 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:39,548-Speed 9837.54 samples/sec Loss 35.7904 LearningRate 0.0992 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:40,593-Speed 9805.59 samples/sec Loss 35.7528 LearningRate 0.0992 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:41,672-Speed 9491.06 samples/sec Loss 35.7582 LearningRate 0.0992 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:42,773-Speed 9310.69 samples/sec Loss 35.5771 LearningRate 0.0992 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:43,862-Speed 9410.46 samples/sec Loss 35.4164 LearningRate 0.0992 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:44,955-Speed 9374.11 samples/sec Loss 35.4513 LearningRate 0.0992 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:46,031-Speed 9515.81 samples/sec Loss 35.4516 LearningRate 0.0992 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:47,127-Speed 9346.49 samples/sec Loss 35.2492 LearningRate 0.0992 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:48,227-Speed 9314.60 samples/sec Loss 35.4169 LearningRate 0.0992 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:35:49,247-Speed 10050.07 samples/sec Loss 35.3666 LearningRate 0.0992 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:50,367-Speed 9150.26 samples/sec Loss 35.2298 LearningRate 0.0992 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:51,412-Speed 9802.42 samples/sec Loss 35.1054 LearningRate 0.0991 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:52,452-Speed 9852.02 samples/sec Loss 34.9619 LearningRate 0.0991 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:53,558-Speed 9260.97 samples/sec Loss 35.0683 LearningRate 0.0991 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:54,665-Speed 9255.69 samples/sec Loss 34.9896 LearningRate 0.0991 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:55,788-Speed 9124.10 samples/sec Loss 34.5901 LearningRate 0.0991 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:56,883-Speed 9360.61 samples/sec Loss 34.7220 LearningRate 0.0991 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:57,979-Speed 9346.61 samples/sec Loss 34.6491 LearningRate 0.0991 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:35:59,098-Speed 9155.53 samples/sec Loss 34.7042 LearningRate 0.0991 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 11:36:00,166-Speed 9594.95 samples/sec Loss 34.6305 LearningRate 0.0991 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:01,269-Speed 9295.83 samples/sec Loss 34.5935 LearningRate 0.0991 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:02,380-Speed 9220.66 samples/sec Loss 34.4681 LearningRate 0.0991 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:03,437-Speed 9693.33 samples/sec Loss 34.4340 LearningRate 0.0991 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:04,492-Speed 9703.19 samples/sec Loss 34.3773 LearningRate 0.0991 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:05,574-Speed 9473.03 samples/sec Loss 34.3005 LearningRate 0.0991 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:06,644-Speed 9575.16 samples/sec Loss 34.2316 LearningRate 0.0991 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:07,737-Speed 9376.85 samples/sec Loss 34.1170 LearningRate 0.0991 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:08,824-Speed 9424.79 samples/sec Loss 34.1382 LearningRate 0.0990 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:09,926-Speed 9303.70 samples/sec Loss 34.0386 LearningRate 0.0990 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:11,012-Speed 9432.02 samples/sec Loss 33.9903 LearningRate 0.0990 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 11:36:12,109-Speed 9343.99 samples/sec Loss 33.7938 LearningRate 0.0990 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:13,181-Speed 9557.72 samples/sec Loss 33.8424 LearningRate 0.0990 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:14,273-Speed 9376.10 samples/sec Loss 33.6816 LearningRate 0.0990 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:15,344-Speed 9567.98 samples/sec Loss 33.7177 LearningRate 0.0990 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:16,455-Speed 9221.22 samples/sec Loss 33.6289 LearningRate 0.0990 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:17,545-Speed 9403.30 samples/sec Loss 33.5413 LearningRate 0.0990 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:18,601-Speed 9703.03 samples/sec Loss 33.4596 LearningRate 0.0990 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:19,717-Speed 9182.20 samples/sec Loss 33.3738 LearningRate 0.0990 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:20,821-Speed 9283.74 samples/sec Loss 33.4612 LearningRate 0.0990 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:21,921-Speed 9313.25 samples/sec Loss 33.3738 LearningRate 0.0990 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:23,002-Speed 9485.04 samples/sec Loss 33.4399 LearningRate 0.0990 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:24,062-Speed 9660.11 samples/sec Loss 33.2144 LearningRate 0.0990 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:25,133-Speed 9564.03 samples/sec Loss 33.1580 LearningRate 0.0990 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:26,234-Speed 9305.92 samples/sec Loss 32.9919 LearningRate 0.0990 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:27,350-Speed 9187.71 samples/sec Loss 32.9931 LearningRate 0.0989 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:28,452-Speed 9298.98 samples/sec Loss 33.0209 LearningRate 0.0989 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:29,565-Speed 9200.66 samples/sec Loss 32.7947 LearningRate 0.0989 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:30,663-Speed 9336.27 samples/sec Loss 32.8844 LearningRate 0.0989 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:31,760-Speed 9337.08 samples/sec Loss 32.7610 LearningRate 0.0989 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:32,878-Speed 9166.64 samples/sec Loss 32.7524 LearningRate 0.0989 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 524288 Required: 11 hours
Training: 2022-04-11 11:36:33,996-Speed 9169.01 samples/sec Loss 32.5941 LearningRate 0.0989 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:35,073-Speed 9507.41 samples/sec Loss 32.6007 LearningRate 0.0989 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:36,129-Speed 9700.83 samples/sec Loss 32.5418 LearningRate 0.0989 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:37,155-Speed 9992.45 samples/sec Loss 32.6607 LearningRate 0.0989 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:38,207-Speed 9734.39 samples/sec Loss 32.3708 LearningRate 0.0989 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:39,257-Speed 9760.87 samples/sec Loss 32.3397 LearningRate 0.0989 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:40,335-Speed 9507.46 samples/sec Loss 32.3313 LearningRate 0.0989 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:41,429-Speed 9368.00 samples/sec Loss 32.3307 LearningRate 0.0989 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:42,526-Speed 9337.89 samples/sec Loss 32.2800 LearningRate 0.0989 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:43,587-Speed 9657.83 samples/sec Loss 32.1815 LearningRate 0.0989 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:44,663-Speed 9520.41 samples/sec Loss 32.1612 LearningRate 0.0989 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:45,759-Speed 9355.43 samples/sec Loss 31.9223 LearningRate 0.0988 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:46,843-Speed 9450.90 samples/sec Loss 31.9629 LearningRate 0.0988 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 11:36:47,940-Speed 9339.64 samples/sec Loss 31.9720 LearningRate 0.0988 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:49,028-Speed 9418.06 samples/sec Loss 31.8727 LearningRate 0.0988 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:50,119-Speed 9395.04 samples/sec Loss 31.7402 LearningRate 0.0988 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:51,171-Speed 9736.71 samples/sec Loss 31.6748 LearningRate 0.0988 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:52,272-Speed 9304.66 samples/sec Loss 31.6774 LearningRate 0.0988 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:36:53,338-Speed 9615.02 samples/sec Loss 31.8078 LearningRate 0.0988 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 11:37:15,243-[lfw][2000]XNorm: 21.302492
Training: 2022-04-11 11:37:15,244-[lfw][2000]Accuracy-Flip: 0.94083+-0.01044
Training: 2022-04-11 11:37:15,244-[lfw][2000]Accuracy-Highest: 0.94083
Training: 2022-04-11 11:37:40,497-[cfp_fp][2000]XNorm: 20.083306
Training: 2022-04-11 11:37:40,497-[cfp_fp][2000]Accuracy-Flip: 0.73143+-0.01352
Training: 2022-04-11 11:37:40,498-[cfp_fp][2000]Accuracy-Highest: 0.73143
Training: 2022-04-11 11:38:02,464-[agedb_30][2000]XNorm: 19.667832
Training: 2022-04-11 11:38:02,465-[agedb_30][2000]Accuracy-Flip: 0.74450+-0.02003
Training: 2022-04-11 11:38:02,465-[agedb_30][2000]Accuracy-Highest: 0.74450
Training: 2022-04-11 11:38:03,542-Speed 145.86 samples/sec Loss 31.6493 LearningRate 0.0988 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:04,633-Speed 9396.38 samples/sec Loss 31.4240 LearningRate 0.0988 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:05,731-Speed 9329.03 samples/sec Loss 31.3945 LearningRate 0.0988 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:06,783-Speed 9738.87 samples/sec Loss 31.4165 LearningRate 0.0988 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:07,830-Speed 9783.14 samples/sec Loss 31.4865 LearningRate 0.0988 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:08,925-Speed 9361.46 samples/sec Loss 31.1778 LearningRate 0.0988 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:09,988-Speed 9637.98 samples/sec Loss 31.2587 LearningRate 0.0988 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:11,085-Speed 9344.99 samples/sec Loss 31.2665 LearningRate 0.0988 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:12,163-Speed 9504.96 samples/sec Loss 31.0855 LearningRate 0.0988 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:13,282-Speed 9160.56 samples/sec Loss 30.9602 LearningRate 0.0987 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:14,367-Speed 9438.18 samples/sec Loss 31.0281 LearningRate 0.0987 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:15,432-Speed 9620.37 samples/sec Loss 30.9674 LearningRate 0.0987 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:16,530-Speed 9337.18 samples/sec Loss 30.9391 LearningRate 0.0987 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:17,644-Speed 9199.14 samples/sec Loss 30.8284 LearningRate 0.0987 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:18,751-Speed 9252.19 samples/sec Loss 30.8331 LearningRate 0.0987 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:19,840-Speed 9402.93 samples/sec Loss 30.8226 LearningRate 0.0987 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:20,911-Speed 9573.57 samples/sec Loss 30.5586 LearningRate 0.0987 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 11:38:22,007-Speed 9350.71 samples/sec Loss 30.5003 LearningRate 0.0987 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 262144 Required: 14 hours
Training: 2022-04-11 11:38:23,090-Speed 9464.59 samples/sec Loss 30.7346 LearningRate 0.0987 Epoch: 0
Global Step: 2190 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:24,216-Speed 9098.24 samples/sec Loss 30.3343 LearningRate 0.0987 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:25,286-Speed 9573.57 samples/sec Loss 30.4900 LearningRate 0.0987 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:26,388-Speed 9299.88 samples/sec Loss 30.5228 LearningRate 0.0987 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:27,465-Speed 9509.59 samples/sec Loss 30.2343 LearningRate 0.0987 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:28,563-Speed 9331.44 samples/sec Loss 30.1264 LearningRate 0.0987 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:29,633-Speed 9579.75 samples/sec Loss 30.1347 LearningRate 0.0987 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:30,713-Speed 9481.36 samples/sec Loss 30.0713 LearningRate 0.0987 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:31,771-Speed 9689.52 samples/sec Loss 30.0619 LearningRate 0.0986 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:32,816-Speed 9802.15 samples/sec Loss 30.1006 LearningRate 0.0986 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:33,895-Speed 9496.54 samples/sec Loss 29.8213 LearningRate 0.0986 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:34,988-Speed 9371.36 samples/sec Loss 29.9877 LearningRate 0.0986 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:36,066-Speed 9507.48 samples/sec Loss 29.8158 LearningRate 0.0986 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 
Required: 14 hours Training: 2022-04-11 11:38:37,161-Speed 9355.25 samples/sec Loss 29.8065 LearningRate 0.0986 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:38,239-Speed 9507.44 samples/sec Loss 29.7894 LearningRate 0.0986 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:39,331-Speed 9377.40 samples/sec Loss 29.6188 LearningRate 0.0986 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:40,412-Speed 9492.49 samples/sec Loss 29.7129 LearningRate 0.0986 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:41,505-Speed 9375.06 samples/sec Loss 29.4236 LearningRate 0.0986 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:42,593-Speed 9409.32 samples/sec Loss 29.5886 LearningRate 0.0986 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:38:43,692-Speed 9326.12 samples/sec Loss 29.2220 LearningRate 0.0986 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:44,797-Speed 9274.61 samples/sec Loss 29.4497 LearningRate 0.0986 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:45,924-Speed 9091.93 samples/sec Loss 29.4958 LearningRate 0.0986 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:47,011-Speed 9426.79 samples/sec Loss 29.3085 LearningRate 0.0986 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:48,087-Speed 9520.95 samples/sec Loss 29.3025 LearningRate 0.0986 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:49,154-Speed 9600.91 samples/sec Loss 29.3635 LearningRate 0.0985 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 
11:38:50,233-Speed 9492.13 samples/sec Loss 29.1482 LearningRate 0.0985 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:51,280-Speed 9795.03 samples/sec Loss 29.0147 LearningRate 0.0985 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:52,349-Speed 9575.79 samples/sec Loss 29.1862 LearningRate 0.0985 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:53,395-Speed 9802.06 samples/sec Loss 28.8512 LearningRate 0.0985 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:54,463-Speed 9594.88 samples/sec Loss 28.9959 LearningRate 0.0985 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:55,546-Speed 9456.99 samples/sec Loss 28.8837 LearningRate 0.0985 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:38:56,654-Speed 9247.07 samples/sec Loss 28.9614 LearningRate 0.0985 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:38:57,740-Speed 9440.23 samples/sec Loss 28.7684 LearningRate 0.0985 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:38:58,824-Speed 9449.20 samples/sec Loss 28.7972 LearningRate 0.0985 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:38:59,917-Speed 9380.55 samples/sec Loss 28.6234 LearningRate 0.0985 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:01,054-Speed 9012.10 samples/sec Loss 28.7611 LearningRate 0.0985 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:02,124-Speed 9575.43 samples/sec Loss 28.6291 LearningRate 0.0985 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:03,232-Speed 9244.25 samples/sec Loss 
28.5612 LearningRate 0.0985 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:04,279-Speed 9781.87 samples/sec Loss 28.5539 LearningRate 0.0985 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:05,401-Speed 9136.73 samples/sec Loss 28.5903 LearningRate 0.0985 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:06,447-Speed 9794.01 samples/sec Loss 28.5194 LearningRate 0.0985 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:07,533-Speed 9434.32 samples/sec Loss 28.4560 LearningRate 0.0984 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:08,641-Speed 9241.18 samples/sec Loss 28.3076 LearningRate 0.0984 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:09,704-Speed 9645.98 samples/sec Loss 28.2020 LearningRate 0.0984 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:10,799-Speed 9354.14 samples/sec Loss 28.2341 LearningRate 0.0984 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:11,869-Speed 9575.73 samples/sec Loss 28.0553 LearningRate 0.0984 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:12,933-Speed 9632.53 samples/sec Loss 28.0487 LearningRate 0.0984 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:14,060-Speed 9086.66 samples/sec Loss 28.0750 LearningRate 0.0984 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:15,206-Speed 8941.79 samples/sec Loss 27.8443 LearningRate 0.0984 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:16,326-Speed 9154.13 samples/sec Loss 27.9489 LearningRate 0.0984 Epoch: 0 Global 
Step: 2680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:17,410-Speed 9446.34 samples/sec Loss 28.0030 LearningRate 0.0984 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:18,528-Speed 9167.71 samples/sec Loss 27.9432 LearningRate 0.0984 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:19,634-Speed 9259.79 samples/sec Loss 27.9509 LearningRate 0.0984 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:20,750-Speed 9187.54 samples/sec Loss 27.8057 LearningRate 0.0984 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:21,815-Speed 9617.32 samples/sec Loss 27.7531 LearningRate 0.0984 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:22,931-Speed 9182.47 samples/sec Loss 27.6869 LearningRate 0.0984 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:24,006-Speed 9526.62 samples/sec Loss 27.5178 LearningRate 0.0984 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:25,029-Speed 10015.83 samples/sec Loss 27.5282 LearningRate 0.0984 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:26,073-Speed 9820.85 samples/sec Loss 27.5394 LearningRate 0.0983 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:27,146-Speed 9553.98 samples/sec Loss 27.5231 LearningRate 0.0983 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-04-11 11:39:28,255-Speed 9234.42 samples/sec Loss 27.5008 LearningRate 0.0983 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:29,352-Speed 9338.33 samples/sec Loss 27.4459 LearningRate 0.0983 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 262144 
Required: 13 hours Training: 2022-04-11 11:39:30,453-Speed 9306.79 samples/sec Loss 27.1967 LearningRate 0.0983 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:31,510-Speed 9690.65 samples/sec Loss 27.3284 LearningRate 0.0983 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:32,575-Speed 9626.86 samples/sec Loss 27.2653 LearningRate 0.0983 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:33,611-Speed 9886.26 samples/sec Loss 27.1966 LearningRate 0.0983 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:34,649-Speed 9874.45 samples/sec Loss 27.1315 LearningRate 0.0983 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:35,718-Speed 9587.95 samples/sec Loss 27.1248 LearningRate 0.0983 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:36,829-Speed 9221.84 samples/sec Loss 27.1346 LearningRate 0.0983 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:37,915-Speed 9437.15 samples/sec Loss 26.9945 LearningRate 0.0983 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:39,000-Speed 9440.38 samples/sec Loss 26.9756 LearningRate 0.0983 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:40,097-Speed 9344.81 samples/sec Loss 26.8477 LearningRate 0.0983 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:41,171-Speed 9539.52 samples/sec Loss 26.8870 LearningRate 0.0983 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:42,258-Speed 9427.45 samples/sec Loss 26.6652 LearningRate 0.0983 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 
11:39:43,316-Speed 9684.71 samples/sec Loss 26.5680 LearningRate 0.0983 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:44,374-Speed 9677.62 samples/sec Loss 26.6828 LearningRate 0.0982 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:45,452-Speed 9511.06 samples/sec Loss 26.6652 LearningRate 0.0982 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:46,569-Speed 9169.44 samples/sec Loss 26.4338 LearningRate 0.0982 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:47,637-Speed 9589.44 samples/sec Loss 26.4327 LearningRate 0.0982 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:48,730-Speed 9378.20 samples/sec Loss 26.4036 LearningRate 0.0982 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:49,828-Speed 9324.49 samples/sec Loss 26.3230 LearningRate 0.0982 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:50,937-Speed 9248.72 samples/sec Loss 26.4164 LearningRate 0.0982 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:52,048-Speed 9221.44 samples/sec Loss 26.2399 LearningRate 0.0982 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:53,136-Speed 9411.06 samples/sec Loss 26.3143 LearningRate 0.0982 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:39:54,230-Speed 9367.21 samples/sec Loss 26.2874 LearningRate 0.0982 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:55,322-Speed 9389.51 samples/sec Loss 26.2631 LearningRate 0.0982 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:56,426-Speed 9276.65 samples/sec Loss 
26.0831 LearningRate 0.0982 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:57,515-Speed 9416.20 samples/sec Loss 25.9044 LearningRate 0.0982 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:58,592-Speed 9512.59 samples/sec Loss 26.0520 LearningRate 0.0982 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:39:59,670-Speed 9507.19 samples/sec Loss 26.0527 LearningRate 0.0982 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:00,747-Speed 9512.21 samples/sec Loss 26.0106 LearningRate 0.0982 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:01,835-Speed 9419.05 samples/sec Loss 26.0056 LearningRate 0.0982 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:02,886-Speed 9745.99 samples/sec Loss 25.8853 LearningRate 0.0981 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:04,010-Speed 9112.70 samples/sec Loss 25.8001 LearningRate 0.0981 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:05,095-Speed 9447.08 samples/sec Loss 25.8179 LearningRate 0.0981 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:06,184-Speed 9407.05 samples/sec Loss 25.7573 LearningRate 0.0981 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:07,294-Speed 9228.03 samples/sec Loss 25.7964 LearningRate 0.0981 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:08,392-Speed 9338.44 samples/sec Loss 25.5342 LearningRate 0.0981 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:09,482-Speed 9392.22 samples/sec Loss 25.5955 LearningRate 0.0981 Epoch: 0 Global 
Step: 3170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:10,587-Speed 9275.48 samples/sec Loss 25.6632 LearningRate 0.0981 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:11,668-Speed 9477.49 samples/sec Loss 25.6035 LearningRate 0.0981 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:12,736-Speed 9597.40 samples/sec Loss 25.4692 LearningRate 0.0981 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:13,827-Speed 9387.46 samples/sec Loss 25.4972 LearningRate 0.0981 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:14,929-Speed 9301.07 samples/sec Loss 25.3767 LearningRate 0.0981 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:16,005-Speed 9526.43 samples/sec Loss 25.4692 LearningRate 0.0981 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:17,070-Speed 9619.72 samples/sec Loss 25.3869 LearningRate 0.0981 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:18,167-Speed 9339.55 samples/sec Loss 25.3993 LearningRate 0.0981 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:19,212-Speed 9801.86 samples/sec Loss 25.3644 LearningRate 0.0981 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:20,323-Speed 9225.96 samples/sec Loss 25.3812 LearningRate 0.0981 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:21,395-Speed 9591.06 samples/sec Loss 25.0313 LearningRate 0.0980 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:22,469-Speed 9543.98 samples/sec Loss 25.1362 LearningRate 0.0980 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 11:40:23,554-Speed 9435.29 samples/sec Loss 25.1974 LearningRate 0.0980 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:24,696-Speed 8972.28 samples/sec Loss 24.9678 LearningRate 0.0980 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:25,779-Speed 9460.17 samples/sec Loss 25.1927 LearningRate 0.0980 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:26,871-Speed 9383.44 samples/sec Loss 24.7233 LearningRate 0.0980 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:27,935-Speed 9636.86 samples/sec Loss 24.9191 LearningRate 0.0980 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:29,027-Speed 9380.91 samples/sec Loss 24.8747 LearningRate 0.0980 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:40:30,099-Speed 9554.38 samples/sec Loss 24.7252 LearningRate 0.0980 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:31,181-Speed 9473.79 samples/sec Loss 24.7704 LearningRate 0.0980 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:32,242-Speed 9648.62 samples/sec Loss 24.7138 LearningRate 0.0980 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:33,352-Speed 9232.60 samples/sec Loss 24.7365 LearningRate 0.0980 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:34,475-Speed 9122.57 samples/sec Loss 24.8114 LearningRate 0.0980 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:35,526-Speed 9750.10 samples/sec Loss 24.4197 LearningRate 0.0980 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 
11:40:36,593-Speed 9607.15 samples/sec Loss 24.4058 LearningRate 0.0980 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:37,627-Speed 9909.12 samples/sec Loss 24.5961 LearningRate 0.0980 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:40:38,689-Speed 9645.79 samples/sec Loss 24.4074 LearningRate 0.0979 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:39,742-Speed 9730.23 samples/sec Loss 24.5136 LearningRate 0.0979 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:40,835-Speed 9378.08 samples/sec Loss 24.4123 LearningRate 0.0979 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:41,976-Speed 8979.70 samples/sec Loss 24.2460 LearningRate 0.0979 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:43,039-Speed 9636.98 samples/sec Loss 24.3612 LearningRate 0.0979 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:44,127-Speed 9419.95 samples/sec Loss 24.2599 LearningRate 0.0979 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:45,215-Speed 9411.79 samples/sec Loss 24.3556 LearningRate 0.0979 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:46,322-Speed 9258.64 samples/sec Loss 24.1862 LearningRate 0.0979 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:47,394-Speed 9554.79 samples/sec Loss 24.2493 LearningRate 0.0979 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:48,490-Speed 9349.15 samples/sec Loss 24.1878 LearningRate 0.0979 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:49,556-Speed 9614.34 samples/sec Loss 
24.0111 LearningRate 0.0979 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:40:50,627-Speed 9565.01 samples/sec Loss 23.9228 LearningRate 0.0979 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:51,739-Speed 9216.69 samples/sec Loss 23.9314 LearningRate 0.0979 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:52,773-Speed 9909.68 samples/sec Loss 23.9337 LearningRate 0.0979 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:53,862-Speed 9402.73 samples/sec Loss 23.9387 LearningRate 0.0979 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:54,973-Speed 9227.95 samples/sec Loss 23.7981 LearningRate 0.0979 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:56,089-Speed 9179.20 samples/sec Loss 23.9713 LearningRate 0.0979 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:57,208-Speed 9157.64 samples/sec Loss 23.9104 LearningRate 0.0978 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:58,311-Speed 9293.94 samples/sec Loss 23.7161 LearningRate 0.0978 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:40:59,401-Speed 9393.31 samples/sec Loss 23.8549 LearningRate 0.0978 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:00,462-Speed 9659.49 samples/sec Loss 23.7585 LearningRate 0.0978 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:01,528-Speed 9614.76 samples/sec Loss 23.6619 LearningRate 0.0978 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:02,607-Speed 9488.70 samples/sec Loss 23.5985 LearningRate 0.0978 Epoch: 0 Global 
Step: 3660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:03,676-Speed 9590.58 samples/sec Loss 23.5304 LearningRate 0.0978 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:04,767-Speed 9383.92 samples/sec Loss 23.4844 LearningRate 0.0978 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:05,889-Speed 9136.40 samples/sec Loss 23.5951 LearningRate 0.0978 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:06,977-Speed 9419.57 samples/sec Loss 23.5941 LearningRate 0.0978 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:08,100-Speed 9124.25 samples/sec Loss 23.5238 LearningRate 0.0978 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:09,226-Speed 9101.63 samples/sec Loss 23.5394 LearningRate 0.0978 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:10,267-Speed 9842.09 samples/sec Loss 23.4746 LearningRate 0.0978 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:11,368-Speed 9306.08 samples/sec Loss 23.4492 LearningRate 0.0978 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:12,484-Speed 9175.36 samples/sec Loss 23.3254 LearningRate 0.0978 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:13,575-Speed 9393.64 samples/sec Loss 23.4848 LearningRate 0.0978 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:14,621-Speed 9797.33 samples/sec Loss 23.4905 LearningRate 0.0978 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:15,715-Speed 9371.64 samples/sec Loss 23.2275 LearningRate 0.0977 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 11:41:16,785-Speed 9570.87 samples/sec Loss 23.1681 LearningRate 0.0977 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:17,875-Speed 9402.67 samples/sec Loss 23.1944 LearningRate 0.0977 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:19,000-Speed 9107.73 samples/sec Loss 23.1811 LearningRate 0.0977 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:20,097-Speed 9342.89 samples/sec Loss 23.1209 LearningRate 0.0977 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:21,153-Speed 9701.62 samples/sec Loss 23.0708 LearningRate 0.0977 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:22,232-Speed 9500.73 samples/sec Loss 23.1382 LearningRate 0.0977 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:23,310-Speed 9495.88 samples/sec Loss 23.1086 LearningRate 0.0977 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:24,452-Speed 8975.53 samples/sec Loss 23.0812 LearningRate 0.0977 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:25,592-Speed 8990.64 samples/sec Loss 22.8504 LearningRate 0.0977 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:26,750-Speed 8846.44 samples/sec Loss 22.9490 LearningRate 0.0977 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:27,832-Speed 9471.54 samples/sec Loss 22.9126 LearningRate 0.0977 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:41:28,903-Speed 9566.29 samples/sec Loss 22.8571 LearningRate 0.0977 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
11:41:29,980-Speed 9510.94 samples/sec Loss 22.7082 LearningRate 0.0977 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:31,073-Speed 9377.33 samples/sec Loss 22.9047 LearningRate 0.0977 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:32,161-Speed 9415.93 samples/sec Loss 22.5243 LearningRate 0.0977 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:33,249-Speed 9413.58 samples/sec Loss 22.7221 LearningRate 0.0977 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:34,342-Speed 9369.79 samples/sec Loss 22.4994 LearningRate 0.0976 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:35,419-Speed 9522.06 samples/sec Loss 22.5968 LearningRate 0.0976 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:36,519-Speed 9313.16 samples/sec Loss 22.6006 LearningRate 0.0976 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:37,628-Speed 9235.46 samples/sec Loss 22.6227 LearningRate 0.0976 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:38,724-Speed 9348.41 samples/sec Loss 22.5620 LearningRate 0.0976 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:41:39,815-Speed 9393.46 samples/sec Loss 22.4115 LearningRate 0.0976 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:42:01,886-[lfw][4000]XNorm: 18.150763 Training: 2022-04-11 11:42:01,887-[lfw][4000]Accuracy-Flip: 0.97550+-0.00633 Training: 2022-04-11 11:42:01,887-[lfw][4000]Accuracy-Highest: 0.97550 Training: 2022-04-11 11:42:27,367-[cfp_fp][4000]XNorm: 15.910640 Training: 2022-04-11 11:42:27,368-[cfp_fp][4000]Accuracy-Flip: 0.81714+-0.01902 Training: 2022-04-11 
11:42:27,368-[cfp_fp][4000]Accuracy-Highest: 0.81714 Training: 2022-04-11 11:42:49,334-[agedb_30][4000]XNorm: 17.389619 Training: 2022-04-11 11:42:49,334-[agedb_30][4000]Accuracy-Flip: 0.84050+-0.02569 Training: 2022-04-11 11:42:49,334-[agedb_30][4000]Accuracy-Highest: 0.84050 Training: 2022-04-11 11:42:50,402-Speed 145.07 samples/sec Loss 22.2854 LearningRate 0.0976 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:51,432-Speed 9949.98 samples/sec Loss 22.3653 LearningRate 0.0976 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:52,497-Speed 9616.18 samples/sec Loss 22.4593 LearningRate 0.0976 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:53,579-Speed 9466.73 samples/sec Loss 22.2937 LearningRate 0.0976 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:54,715-Speed 9023.90 samples/sec Loss 22.4776 LearningRate 0.0976 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:55,794-Speed 9497.56 samples/sec Loss 22.0867 LearningRate 0.0976 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:56,910-Speed 9174.70 samples/sec Loss 21.9746 LearningRate 0.0976 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:57,947-Speed 9888.07 samples/sec Loss 22.1232 LearningRate 0.0976 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:42:59,030-Speed 9460.29 samples/sec Loss 21.9763 LearningRate 0.0976 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:43:00,099-Speed 9582.60 samples/sec Loss 22.2258 LearningRate 0.0976 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:43:01,198-Speed 9324.19 samples/sec Loss 22.0691 LearningRate 0.0976 
Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:02,290-Speed 9384.79 samples/sec Loss 22.0069 LearningRate 0.0975 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:03,374-Speed 9450.77 samples/sec Loss 21.9752 LearningRate 0.0975 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:04,483-Speed 9243.09 samples/sec Loss 22.0534 LearningRate 0.0975 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:05,569-Speed 9435.82 samples/sec Loss 21.9970 LearningRate 0.0975 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:06,629-Speed 9665.16 samples/sec Loss 21.6982 LearningRate 0.0975 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:07,733-Speed 9279.84 samples/sec Loss 21.8319 LearningRate 0.0975 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:08,795-Speed 9644.53 samples/sec Loss 21.9921 LearningRate 0.0975 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:09,875-Speed 9484.39 samples/sec Loss 21.9641 LearningRate 0.0975 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:10,926-Speed 9746.21 samples/sec Loss 22.1073 LearningRate 0.0975 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 11:43:12,000-Speed 9545.32 samples/sec Loss 21.9464 LearningRate 0.0975 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:43:13,082-Speed 9467.92 samples/sec Loss 21.8877 LearningRate 0.0975 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 11:43:14,182-Speed 9314.88 samples/sec Loss 21.7187 LearningRate 0.0975 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 
262144 Required: 14 hours Training: 2022-04-11 11:43:15,245-Speed 9642.91 samples/sec Loss 21.7246 LearningRate 0.0975 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:16,323-Speed 9501.18 samples/sec Loss 21.7000 LearningRate 0.0975 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:17,425-Speed 9299.60 samples/sec Loss 21.6902 LearningRate 0.0975 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:18,521-Speed 9343.96 samples/sec Loss 21.6160 LearningRate 0.0975 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:19,593-Speed 9557.25 samples/sec Loss 21.6972 LearningRate 0.0975 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:20,679-Speed 9434.27 samples/sec Loss 21.8595 LearningRate 0.0974 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:21,819-Speed 8991.96 samples/sec Loss 21.4877 LearningRate 0.0974 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:22,954-Speed 9020.67 samples/sec Loss 21.5318 LearningRate 0.0974 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:24,023-Speed 9587.70 samples/sec Loss 21.4337 LearningRate 0.0974 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:25,145-Speed 9129.34 samples/sec Loss 21.3612 LearningRate 0.0974 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:26,308-Speed 8811.24 samples/sec Loss 21.4196 LearningRate 0.0974 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:27,456-Speed 8924.52 samples/sec Loss 21.2804 LearningRate 0.0974 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-11 11:43:28,531-Speed 9533.75 samples/sec Loss 21.5462 LearningRate 0.0974 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:29,628-Speed 9341.88 samples/sec Loss 21.2936 LearningRate 0.0974 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:30,729-Speed 9313.53 samples/sec Loss 21.2395 LearningRate 0.0974 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:31,843-Speed 9193.94 samples/sec Loss 21.1103 LearningRate 0.0974 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:32,930-Speed 9431.57 samples/sec Loss 21.1466 LearningRate 0.0974 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:34,036-Speed 9257.76 samples/sec Loss 21.3131 LearningRate 0.0974 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:35,147-Speed 9222.46 samples/sec Loss 21.1059 LearningRate 0.0974 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:36,257-Speed 9231.40 samples/sec Loss 21.1085 LearningRate 0.0974 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:37,345-Speed 9420.07 samples/sec Loss 21.1556 LearningRate 0.0974 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:38,412-Speed 9608.78 samples/sec Loss 21.1984 LearningRate 0.0974 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:39,512-Speed 9306.00 samples/sec Loss 21.1935 LearningRate 0.0973 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:40,570-Speed 9688.94 samples/sec Loss 20.9696 LearningRate 0.0973 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:41,619-Speed 9763.21 
samples/sec Loss 20.8218 LearningRate 0.0973 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:42,694-Speed 9536.73 samples/sec Loss 21.0455 LearningRate 0.0973 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:43,752-Speed 9681.25 samples/sec Loss 21.0737 LearningRate 0.0973 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:44,811-Speed 9678.19 samples/sec Loss 20.9992 LearningRate 0.0973 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:45,876-Speed 9617.55 samples/sec Loss 20.9504 LearningRate 0.0973 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:46,966-Speed 9398.36 samples/sec Loss 21.0031 LearningRate 0.0973 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:48,051-Speed 9450.94 samples/sec Loss 20.6416 LearningRate 0.0973 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:43:49,085-Speed 9906.81 samples/sec Loss 20.8955 LearningRate 0.0973 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:50,141-Speed 9703.82 samples/sec Loss 20.7509 LearningRate 0.0973 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:51,243-Speed 9301.55 samples/sec Loss 20.7338 LearningRate 0.0973 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:52,338-Speed 9359.91 samples/sec Loss 20.6263 LearningRate 0.0973 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:53,401-Speed 9634.75 samples/sec Loss 20.6812 LearningRate 0.0973 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:54,520-Speed 9161.08 samples/sec Loss 20.7516 LearningRate 
0.0973 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:55,583-Speed 9636.22 samples/sec Loss 20.6207 LearningRate 0.0973 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:56,643-Speed 9666.24 samples/sec Loss 20.5009 LearningRate 0.0973 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:57,695-Speed 9739.87 samples/sec Loss 20.5345 LearningRate 0.0972 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:58,773-Speed 9501.43 samples/sec Loss 20.6136 LearningRate 0.0972 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:43:59,850-Speed 9510.92 samples/sec Loss 20.4605 LearningRate 0.0972 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:00,928-Speed 9514.52 samples/sec Loss 20.5138 LearningRate 0.0972 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:02,038-Speed 9225.31 samples/sec Loss 20.5248 LearningRate 0.0972 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:03,145-Speed 9256.52 samples/sec Loss 20.5428 LearningRate 0.0972 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:04,241-Speed 9344.41 samples/sec Loss 20.4880 LearningRate 0.0972 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:05,300-Speed 9682.11 samples/sec Loss 20.5168 LearningRate 0.0972 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:06,358-Speed 9678.90 samples/sec Loss 20.5719 LearningRate 0.0972 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:07,461-Speed 9296.54 samples/sec Loss 20.4287 LearningRate 0.0972 Epoch: 0 Global Step: 4720 Fp16 Grad 
Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:08,544-Speed 9459.21 samples/sec Loss 20.2676 LearningRate 0.0972 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:09,611-Speed 9607.57 samples/sec Loss 20.3449 LearningRate 0.0972 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:10,679-Speed 9594.52 samples/sec Loss 20.2511 LearningRate 0.0972 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:11,755-Speed 9520.25 samples/sec Loss 20.2724 LearningRate 0.0972 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:12,863-Speed 9247.80 samples/sec Loss 20.1414 LearningRate 0.0972 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:13,956-Speed 9366.95 samples/sec Loss 20.3054 LearningRate 0.0972 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:15,124-Speed 8772.62 samples/sec Loss 20.2283 LearningRate 0.0972 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:16,199-Speed 9532.45 samples/sec Loss 19.9648 LearningRate 0.0971 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:17,280-Speed 9477.86 samples/sec Loss 20.1779 LearningRate 0.0971 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:18,346-Speed 9609.47 samples/sec Loss 20.0078 LearningRate 0.0971 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:19,456-Speed 9236.22 samples/sec Loss 20.0908 LearningRate 0.0971 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:20,550-Speed 9365.72 samples/sec Loss 20.1457 LearningRate 0.0971 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 
2022-04-11 11:44:21,619-Speed 9582.12 samples/sec Loss 20.0080 LearningRate 0.0971 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:22,711-Speed 9386.65 samples/sec Loss 20.1607 LearningRate 0.0971 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:23,770-Speed 9676.85 samples/sec Loss 19.9143 LearningRate 0.0971 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:24,947-Speed 8700.89 samples/sec Loss 19.8807 LearningRate 0.0971 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:26,056-Speed 9241.87 samples/sec Loss 20.0876 LearningRate 0.0971 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:27,160-Speed 9274.12 samples/sec Loss 20.0098 LearningRate 0.0971 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:28,223-Speed 9645.08 samples/sec Loss 20.0129 LearningRate 0.0971 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:29,280-Speed 9696.33 samples/sec Loss 19.8824 LearningRate 0.0971 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:30,342-Speed 9646.94 samples/sec Loss 19.9389 LearningRate 0.0971 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:31,442-Speed 9315.80 samples/sec Loss 19.8909 LearningRate 0.0971 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:32,547-Speed 9281.08 samples/sec Loss 19.9501 LearningRate 0.0971 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-04-11 11:44:33,644-Speed 9341.99 samples/sec Loss 19.7934 LearningRate 0.0971 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:34,735-Speed 9388.10 
samples/sec Loss 19.7875 LearningRate 0.0970 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:35,788-Speed 9735.73 samples/sec Loss 19.7300 LearningRate 0.0970 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:36,877-Speed 9406.79 samples/sec Loss 19.6381 LearningRate 0.0970 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:38,000-Speed 9123.06 samples/sec Loss 19.7158 LearningRate 0.0970 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:39,072-Speed 9561.97 samples/sec Loss 19.7361 LearningRate 0.0970 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:40,139-Speed 9602.95 samples/sec Loss 19.6004 LearningRate 0.0970 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:41,232-Speed 9368.34 samples/sec Loss 19.6818 LearningRate 0.0970 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:42,329-Speed 9345.09 samples/sec Loss 19.6162 LearningRate 0.0970 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:43,389-Speed 9660.81 samples/sec Loss 19.6476 LearningRate 0.0970 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:44,526-Speed 9007.59 samples/sec Loss 19.5601 LearningRate 0.0970 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:45,656-Speed 9066.39 samples/sec Loss 19.4445 LearningRate 0.0970 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:44:46,749-Speed 9381.76 samples/sec Loss 19.5585 LearningRate 0.0970 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:47,817-Speed 9587.81 samples/sec Loss 19.5550 LearningRate 
0.0970 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:48,881-Speed 9632.19 samples/sec Loss 19.6959 LearningRate 0.0970 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:49,956-Speed 9532.71 samples/sec Loss 19.4494 LearningRate 0.0970 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:51,046-Speed 9407.53 samples/sec Loss 19.4698 LearningRate 0.0970 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:52,174-Speed 9077.77 samples/sec Loss 19.4328 LearningRate 0.0970 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:53,236-Speed 9652.72 samples/sec Loss 19.3848 LearningRate 0.0969 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:54,311-Speed 9526.48 samples/sec Loss 19.4890 LearningRate 0.0969 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:55,401-Speed 9401.04 samples/sec Loss 19.4728 LearningRate 0.0969 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:56,492-Speed 9393.68 samples/sec Loss 19.3903 LearningRate 0.0969 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:57,539-Speed 9780.88 samples/sec Loss 19.4691 LearningRate 0.0969 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:58,633-Speed 9371.86 samples/sec Loss 19.1470 LearningRate 0.0969 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:44:59,731-Speed 9330.43 samples/sec Loss 19.3418 LearningRate 0.0969 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:00,863-Speed 9045.81 samples/sec Loss 19.1907 LearningRate 0.0969 Epoch: 0 Global Step: 5210 Fp16 Grad 
Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:01,925-Speed 9658.06 samples/sec Loss 19.2618 LearningRate 0.0969 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:03,033-Speed 9246.90 samples/sec Loss 19.2372 LearningRate 0.0969 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:04,140-Speed 9255.33 samples/sec Loss 19.2026 LearningRate 0.0969 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:05,216-Speed 9518.54 samples/sec Loss 19.1667 LearningRate 0.0969 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:06,295-Speed 9496.13 samples/sec Loss 19.0466 LearningRate 0.0969 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:07,396-Speed 9305.96 samples/sec Loss 18.9775 LearningRate 0.0969 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:08,490-Speed 9368.65 samples/sec Loss 19.2514 LearningRate 0.0969 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:09,578-Speed 9417.71 samples/sec Loss 19.1179 LearningRate 0.0969 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:10,689-Speed 9221.53 samples/sec Loss 18.9584 LearningRate 0.0968 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:11,781-Speed 9390.33 samples/sec Loss 19.0272 LearningRate 0.0968 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:12,854-Speed 9547.31 samples/sec Loss 19.0288 LearningRate 0.0968 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:13,990-Speed 9018.60 samples/sec Loss 18.8454 LearningRate 0.0968 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-11 11:45:15,109-Speed 9151.00 samples/sec Loss 18.9004 LearningRate 0.0968 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:16,156-Speed 9789.00 samples/sec Loss 18.9816 LearningRate 0.0968 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:17,243-Speed 9428.96 samples/sec Loss 18.9874 LearningRate 0.0968 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:18,355-Speed 9209.02 samples/sec Loss 18.9047 LearningRate 0.0968 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:19,423-Speed 9598.08 samples/sec Loss 18.7873 LearningRate 0.0968 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:20,507-Speed 9449.40 samples/sec Loss 18.7929 LearningRate 0.0968 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:21,609-Speed 9302.10 samples/sec Loss 18.8171 LearningRate 0.0968 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:22,678-Speed 9585.54 samples/sec Loss 18.8697 LearningRate 0.0968 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:23,719-Speed 9839.37 samples/sec Loss 18.7860 LearningRate 0.0968 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:24,812-Speed 9377.17 samples/sec Loss 18.7701 LearningRate 0.0968 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:25,892-Speed 9483.29 samples/sec Loss 18.7940 LearningRate 0.0968 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:26,996-Speed 9285.09 samples/sec Loss 18.5525 LearningRate 0.0968 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:28,061-Speed 9618.28 
samples/sec Loss 18.8522 LearningRate 0.0968 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:29,137-Speed 9522.48 samples/sec Loss 18.7236 LearningRate 0.0967 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:30,202-Speed 9625.34 samples/sec Loss 18.6760 LearningRate 0.0967 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:31,285-Speed 9459.82 samples/sec Loss 18.7610 LearningRate 0.0967 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:32,352-Speed 9608.46 samples/sec Loss 18.5223 LearningRate 0.0967 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:33,436-Speed 9445.83 samples/sec Loss 18.7246 LearningRate 0.0967 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:34,518-Speed 9471.39 samples/sec Loss 18.7294 LearningRate 0.0967 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:35,684-Speed 8788.38 samples/sec Loss 18.6238 LearningRate 0.0967 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:36,790-Speed 9264.41 samples/sec Loss 18.4025 LearningRate 0.0967 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:37,876-Speed 9433.10 samples/sec Loss 18.5196 LearningRate 0.0967 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:38,960-Speed 9448.17 samples/sec Loss 18.4728 LearningRate 0.0967 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:39,989-Speed 9958.66 samples/sec Loss 18.5642 LearningRate 0.0967 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:41,126-Speed 9010.03 samples/sec Loss 18.5164 LearningRate 
0.0967 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:42,184-Speed 9688.44 samples/sec Loss 18.3858 LearningRate 0.0967 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:43,248-Speed 9626.62 samples/sec Loss 18.3577 LearningRate 0.0967 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:44,354-Speed 9260.80 samples/sec Loss 18.5251 LearningRate 0.0967 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:45,410-Speed 9705.51 samples/sec Loss 18.4723 LearningRate 0.0967 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:46,462-Speed 9737.46 samples/sec Loss 18.2085 LearningRate 0.0967 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:47,542-Speed 9489.51 samples/sec Loss 18.2877 LearningRate 0.0966 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:48,604-Speed 9642.10 samples/sec Loss 18.2462 LearningRate 0.0966 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:49,660-Speed 9708.98 samples/sec Loss 18.5288 LearningRate 0.0966 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:50,742-Speed 9477.43 samples/sec Loss 18.4966 LearningRate 0.0966 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:51,817-Speed 9529.50 samples/sec Loss 18.1736 LearningRate 0.0966 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:52,908-Speed 9386.44 samples/sec Loss 18.2846 LearningRate 0.0966 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:53,985-Speed 9516.09 samples/sec Loss 18.3580 LearningRate 0.0966 Epoch: 0 Global Step: 5700 Fp16 Grad 
Scale: 262144 Required: 13 hours Training: 2022-04-11 11:45:55,071-Speed 9435.13 samples/sec Loss 18.2558 LearningRate 0.0966 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:56,158-Speed 9429.41 samples/sec Loss 17.9494 LearningRate 0.0966 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:57,258-Speed 9311.03 samples/sec Loss 18.2213 LearningRate 0.0966 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:58,325-Speed 9604.50 samples/sec Loss 18.2671 LearningRate 0.0966 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:45:59,451-Speed 9091.87 samples/sec Loss 18.0655 LearningRate 0.0966 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:46:00,549-Speed 9338.11 samples/sec Loss 18.1874 LearningRate 0.0966 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:46:01,652-Speed 9288.04 samples/sec Loss 18.1885 LearningRate 0.0966 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:02,749-Speed 9336.62 samples/sec Loss 18.2172 LearningRate 0.0966 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:03,842-Speed 9375.49 samples/sec Loss 18.1127 LearningRate 0.0966 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:04,941-Speed 9322.06 samples/sec Loss 18.0833 LearningRate 0.0966 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:06,006-Speed 9621.28 samples/sec Loss 18.2995 LearningRate 0.0965 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:07,075-Speed 9588.68 samples/sec Loss 18.0242 LearningRate 0.0965 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 11:46:08,120-Speed 9810.07 samples/sec Loss 18.0551 LearningRate 0.0965 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:09,250-Speed 9067.31 samples/sec Loss 17.9733 LearningRate 0.0965 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:10,353-Speed 9290.91 samples/sec Loss 17.9848 LearningRate 0.0965 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:11,419-Speed 9605.88 samples/sec Loss 18.1109 LearningRate 0.0965 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:12,474-Speed 9708.32 samples/sec Loss 17.9610 LearningRate 0.0965 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:13,514-Speed 9854.96 samples/sec Loss 17.9215 LearningRate 0.0965 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:14,587-Speed 9545.04 samples/sec Loss 18.1456 LearningRate 0.0965 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:15,677-Speed 9400.11 samples/sec Loss 18.1284 LearningRate 0.0965 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:46:16,792-Speed 9191.32 samples/sec Loss 18.0347 LearningRate 0.0965 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:17,860-Speed 9591.86 samples/sec Loss 17.8141 LearningRate 0.0965 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:18,942-Speed 9471.31 samples/sec Loss 17.7210 LearningRate 0.0965 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:20,021-Speed 9493.01 samples/sec Loss 17.7901 LearningRate 0.0965 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:21,064-Speed 9830.80 
samples/sec Loss 17.9212 LearningRate 0.0965 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:22,114-Speed 9759.04 samples/sec Loss 17.9150 LearningRate 0.0965 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:23,166-Speed 9736.27 samples/sec Loss 17.8616 LearningRate 0.0965 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:24,228-Speed 9648.05 samples/sec Loss 17.6481 LearningRate 0.0964 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:25,327-Speed 9323.36 samples/sec Loss 17.6401 LearningRate 0.0964 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:26,464-Speed 9014.60 samples/sec Loss 17.7010 LearningRate 0.0964 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:46:48,276-[lfw][6000]XNorm: 16.679557 Training: 2022-04-11 11:46:48,277-[lfw][6000]Accuracy-Flip: 0.98383+-0.00606 Training: 2022-04-11 11:46:48,277-[lfw][6000]Accuracy-Highest: 0.98383 Training: 2022-04-11 11:47:13,524-[cfp_fp][6000]XNorm: 14.667061 Training: 2022-04-11 11:47:13,525-[cfp_fp][6000]Accuracy-Flip: 0.84529+-0.01912 Training: 2022-04-11 11:47:13,525-[cfp_fp][6000]Accuracy-Highest: 0.84529 Training: 2022-04-11 11:47:35,305-[agedb_30][6000]XNorm: 16.206444 Training: 2022-04-11 11:47:35,306-[agedb_30][6000]Accuracy-Flip: 0.87917+-0.01664 Training: 2022-04-11 11:47:35,307-[agedb_30][6000]Accuracy-Highest: 0.87917 Training: 2022-04-11 11:47:36,385-Speed 146.45 samples/sec Loss 17.5008 LearningRate 0.0964 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:37,452-Speed 9598.74 samples/sec Loss 17.7172 LearningRate 0.0964 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:38,524-Speed 9556.98 samples/sec Loss 17.8318 LearningRate 
0.0964 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:39,582-Speed 9684.99 samples/sec Loss 17.7027 LearningRate 0.0964 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:40,686-Speed 9281.07 samples/sec Loss 17.6464 LearningRate 0.0964 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:41,746-Speed 9666.90 samples/sec Loss 17.6286 LearningRate 0.0964 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:42,799-Speed 9724.41 samples/sec Loss 17.6727 LearningRate 0.0964 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:43,873-Speed 9545.52 samples/sec Loss 17.6142 LearningRate 0.0964 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:44,999-Speed 9098.83 samples/sec Loss 17.6770 LearningRate 0.0964 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:46,080-Speed 9475.74 samples/sec Loss 17.5960 LearningRate 0.0964 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:47,175-Speed 9360.01 samples/sec Loss 17.5396 LearningRate 0.0964 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:47:48,270-Speed 9362.37 samples/sec Loss 17.5554 LearningRate 0.0964 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:49,351-Speed 9474.00 samples/sec Loss 17.5602 LearningRate 0.0964 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:50,431-Speed 9486.83 samples/sec Loss 17.4436 LearningRate 0.0964 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:51,538-Speed 9253.54 samples/sec Loss 17.4730 LearningRate 0.0963 Epoch: 0 Global Step: 6150 Fp16 Grad 
Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:52,692-Speed 8877.83 samples/sec Loss 17.4105 LearningRate 0.0963 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:53,753-Speed 9659.82 samples/sec Loss 17.3556 LearningRate 0.0963 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:54,839-Speed 9437.72 samples/sec Loss 17.4628 LearningRate 0.0963 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:55,936-Speed 9331.70 samples/sec Loss 17.3087 LearningRate 0.0963 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:57,003-Speed 9606.91 samples/sec Loss 17.4685 LearningRate 0.0963 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:58,082-Speed 9501.80 samples/sec Loss 17.5520 LearningRate 0.0963 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:47:59,150-Speed 9588.23 samples/sec Loss 17.4761 LearningRate 0.0963 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:00,274-Speed 9120.17 samples/sec Loss 17.4283 LearningRate 0.0963 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:01,357-Speed 9461.98 samples/sec Loss 17.1740 LearningRate 0.0963 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:02,436-Speed 9493.01 samples/sec Loss 17.4032 LearningRate 0.0963 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:03,534-Speed 9327.02 samples/sec Loss 17.1687 LearningRate 0.0963 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:04,615-Speed 9481.43 samples/sec Loss 17.2191 LearningRate 0.0963 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 
2022-04-11 11:48:05,733-Speed 9167.81 samples/sec Loss 17.2318 LearningRate 0.0963 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:06,824-Speed 9387.61 samples/sec Loss 17.3612 LearningRate 0.0963 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:07,931-Speed 9264.65 samples/sec Loss 17.2101 LearningRate 0.0963 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:09,039-Speed 9242.54 samples/sec Loss 17.1371 LearningRate 0.0963 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:10,142-Speed 9292.87 samples/sec Loss 17.1368 LearningRate 0.0962 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:11,264-Speed 9126.10 samples/sec Loss 17.3025 LearningRate 0.0962 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:12,388-Speed 9120.82 samples/sec Loss 17.2962 LearningRate 0.0962 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:13,521-Speed 9037.21 samples/sec Loss 17.1724 LearningRate 0.0962 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:14,577-Speed 9703.90 samples/sec Loss 17.0951 LearningRate 0.0962 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:15,655-Speed 9509.80 samples/sec Loss 17.2601 LearningRate 0.0962 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:16,709-Speed 9720.56 samples/sec Loss 17.1461 LearningRate 0.0962 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:17,796-Speed 9423.53 samples/sec Loss 17.1726 LearningRate 0.0962 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:18,886-Speed 9406.45 
samples/sec Loss 17.1888 LearningRate 0.0962 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:19,937-Speed 9747.20 samples/sec Loss 16.9550 LearningRate 0.0962 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:20,997-Speed 9671.61 samples/sec Loss 17.1542 LearningRate 0.0962 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:22,094-Speed 9338.55 samples/sec Loss 17.1342 LearningRate 0.0962 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:23,174-Speed 9489.02 samples/sec Loss 17.0609 LearningRate 0.0962 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:24,229-Speed 9711.99 samples/sec Loss 16.8454 LearningRate 0.0962 Epoch: 0 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:25,335-Speed 9263.48 samples/sec Loss 17.1135 LearningRate 0.0962 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:26,438-Speed 9289.38 samples/sec Loss 17.0617 LearningRate 0.0962 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:27,499-Speed 9654.36 samples/sec Loss 17.1604 LearningRate 0.0962 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:28,577-Speed 9510.55 samples/sec Loss 17.0028 LearningRate 0.0961 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:29,672-Speed 9353.09 samples/sec Loss 16.8402 LearningRate 0.0961 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:30,733-Speed 9658.50 samples/sec Loss 16.9588 LearningRate 0.0961 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:31,800-Speed 9596.09 samples/sec Loss 17.0587 LearningRate 
0.0961 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:32,854-Speed 9727.62 samples/sec Loss 16.8722 LearningRate 0.0961 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:33,899-Speed 9797.89 samples/sec Loss 16.7385 LearningRate 0.0961 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:34,945-Speed 9800.53 samples/sec Loss 16.9695 LearningRate 0.0961 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:36,066-Speed 9139.18 samples/sec Loss 16.9160 LearningRate 0.0961 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:37,147-Speed 9485.05 samples/sec Loss 16.8015 LearningRate 0.0961 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:38,226-Speed 9490.39 samples/sec Loss 16.9650 LearningRate 0.0961 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:39,343-Speed 9180.98 samples/sec Loss 16.7433 LearningRate 0.0961 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:40,428-Speed 9436.42 samples/sec Loss 16.7425 LearningRate 0.0961 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:41,508-Speed 9493.10 samples/sec Loss 16.9101 LearningRate 0.0961 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:42,624-Speed 9178.37 samples/sec Loss 16.7501 LearningRate 0.0961 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:43,708-Speed 9451.55 samples/sec Loss 16.7721 LearningRate 0.0961 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:48:44,827-Speed 9152.68 samples/sec Loss 16.7537 LearningRate 0.0961 Epoch: 0 Global Step: 6640 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:45,933-Speed 9267.26 samples/sec Loss 16.7479 LearningRate 0.0961 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:47,007-Speed 9534.24 samples/sec Loss 16.8425 LearningRate 0.0960 Epoch: 0 Global Step: 6660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:48,075-Speed 9603.63 samples/sec Loss 16.8204 LearningRate 0.0960 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:49,113-Speed 9864.46 samples/sec Loss 16.7974 LearningRate 0.0960 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:50,151-Speed 9875.94 samples/sec Loss 16.7632 LearningRate 0.0960 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:51,237-Speed 9431.68 samples/sec Loss 16.9259 LearningRate 0.0960 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:52,331-Speed 9367.61 samples/sec Loss 16.8108 LearningRate 0.0960 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:53,459-Speed 9085.17 samples/sec Loss 16.5744 LearningRate 0.0960 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:54,571-Speed 9216.09 samples/sec Loss 16.6875 LearningRate 0.0960 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:48:55,665-Speed 9368.37 samples/sec Loss 16.5989 LearningRate 0.0960 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:56,812-Speed 8931.26 samples/sec Loss 16.6209 LearningRate 0.0960 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:48:57,877-Speed 9617.80 samples/sec Loss 16.5377 LearningRate 0.0960 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:48:58,965-Speed 9421.55 samples/sec Loss 16.5793 LearningRate 0.0960 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:00,017-Speed 9737.60 samples/sec Loss 16.4705 LearningRate 0.0960 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:01,054-Speed 9881.72 samples/sec Loss 16.5168 LearningRate 0.0960 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:02,134-Speed 9489.14 samples/sec Loss 16.4327 LearningRate 0.0960 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:03,210-Speed 9516.38 samples/sec Loss 16.6456 LearningRate 0.0960 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:04,306-Speed 9353.23 samples/sec Loss 16.4754 LearningRate 0.0960 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:05,421-Speed 9183.16 samples/sec Loss 16.5452 LearningRate 0.0959 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:06,523-Speed 9299.63 samples/sec Loss 16.3983 LearningRate 0.0959 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:07,596-Speed 9553.34 samples/sec Loss 16.4772 LearningRate 0.0959 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:08,709-Speed 9207.90 samples/sec Loss 16.6114 LearningRate 0.0959 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:09,800-Speed 9391.37 samples/sec Loss 16.2884 LearningRate 0.0959 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:10,902-Speed 9294.47 samples/sec Loss 16.4442 LearningRate 0.0959 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:11,985-Speed 9460.57 samples/sec Loss 
16.4762 LearningRate 0.0959 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:13,076-Speed 9395.34 samples/sec Loss 16.6455 LearningRate 0.0959 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:14,172-Speed 9346.61 samples/sec Loss 16.4600 LearningRate 0.0959 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:15,238-Speed 9607.89 samples/sec Loss 16.5215 LearningRate 0.0959 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:16,359-Speed 9139.01 samples/sec Loss 16.1868 LearningRate 0.0959 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:17,448-Speed 9413.73 samples/sec Loss 16.3410 LearningRate 0.0959 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:18,549-Speed 9303.98 samples/sec Loss 16.2221 LearningRate 0.0959 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:19,657-Speed 9248.72 samples/sec Loss 16.5358 LearningRate 0.0959 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:20,763-Speed 9261.76 samples/sec Loss 16.3804 LearningRate 0.0959 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:21,837-Speed 9539.47 samples/sec Loss 16.4595 LearningRate 0.0959 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:22,936-Speed 9327.36 samples/sec Loss 16.4410 LearningRate 0.0959 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:24,019-Speed 9458.17 samples/sec Loss 16.4244 LearningRate 0.0959 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:25,083-Speed 9634.37 samples/sec Loss 16.2831 LearningRate 0.0958 Epoch: 0 Global 
Step: 7010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:26,158-Speed 9531.21 samples/sec Loss 16.2303 LearningRate 0.0958 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:27,239-Speed 9474.68 samples/sec Loss 16.4015 LearningRate 0.0958 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:28,339-Speed 9317.98 samples/sec Loss 16.3464 LearningRate 0.0958 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:29,407-Speed 9592.53 samples/sec Loss 16.4169 LearningRate 0.0958 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:30,486-Speed 9492.65 samples/sec Loss 16.4705 LearningRate 0.0958 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:31,580-Speed 9365.36 samples/sec Loss 16.2408 LearningRate 0.0958 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:32,676-Speed 9357.75 samples/sec Loss 16.1687 LearningRate 0.0958 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:33,790-Speed 9199.89 samples/sec Loss 16.3010 LearningRate 0.0958 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:34,866-Speed 9518.49 samples/sec Loss 16.3324 LearningRate 0.0958 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:35,968-Speed 9296.22 samples/sec Loss 16.1534 LearningRate 0.0958 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:37,062-Speed 9365.04 samples/sec Loss 16.1953 LearningRate 0.0958 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:38,172-Speed 9231.29 samples/sec Loss 16.2129 LearningRate 0.0958 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 262144 
Required: 13 hours Training: 2022-04-11 11:49:39,239-Speed 9617.85 samples/sec Loss 16.2317 LearningRate 0.0958 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:40,327-Speed 9416.80 samples/sec Loss 16.1564 LearningRate 0.0958 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:41,444-Speed 9171.52 samples/sec Loss 16.0538 LearningRate 0.0958 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:42,484-Speed 9845.65 samples/sec Loss 16.0014 LearningRate 0.0958 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:43,536-Speed 9743.55 samples/sec Loss 16.0453 LearningRate 0.0957 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:44,605-Speed 9578.30 samples/sec Loss 15.9771 LearningRate 0.0957 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:45,665-Speed 9667.93 samples/sec Loss 16.1116 LearningRate 0.0957 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:46,741-Speed 9527.50 samples/sec Loss 16.0942 LearningRate 0.0957 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:47,807-Speed 9610.49 samples/sec Loss 16.1601 LearningRate 0.0957 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:48,882-Speed 9531.76 samples/sec Loss 15.9627 LearningRate 0.0957 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:49,946-Speed 9631.79 samples/sec Loss 16.0249 LearningRate 0.0957 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:51,025-Speed 9495.93 samples/sec Loss 16.0340 LearningRate 0.0957 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 
11:49:52,141-Speed 9176.54 samples/sec Loss 15.9855 LearningRate 0.0957 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:53,242-Speed 9313.60 samples/sec Loss 15.9557 LearningRate 0.0957 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:54,333-Speed 9387.11 samples/sec Loss 15.8949 LearningRate 0.0957 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:55,424-Speed 9394.29 samples/sec Loss 16.1444 LearningRate 0.0957 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:49:56,499-Speed 9533.76 samples/sec Loss 15.9049 LearningRate 0.0957 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:57,573-Speed 9538.54 samples/sec Loss 15.8453 LearningRate 0.0957 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:58,681-Speed 9250.44 samples/sec Loss 16.0681 LearningRate 0.0957 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:49:59,764-Speed 9464.14 samples/sec Loss 15.9797 LearningRate 0.0957 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:00,848-Speed 9446.16 samples/sec Loss 16.0411 LearningRate 0.0957 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:01,941-Speed 9375.35 samples/sec Loss 15.9272 LearningRate 0.0956 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:03,040-Speed 9320.94 samples/sec Loss 15.9009 LearningRate 0.0956 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:04,090-Speed 9754.11 samples/sec Loss 15.9812 LearningRate 0.0956 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:05,146-Speed 9706.18 samples/sec Loss 
15.8851 LearningRate 0.0956 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:06,231-Speed 9439.52 samples/sec Loss 15.8638 LearningRate 0.0956 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:07,300-Speed 9583.58 samples/sec Loss 15.8123 LearningRate 0.0956 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:08,376-Speed 9533.33 samples/sec Loss 15.8232 LearningRate 0.0956 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:09,463-Speed 9421.76 samples/sec Loss 15.7331 LearningRate 0.0956 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:10,520-Speed 9695.65 samples/sec Loss 16.0093 LearningRate 0.0956 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:11,622-Speed 9298.26 samples/sec Loss 15.9563 LearningRate 0.0956 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:12,741-Speed 9150.99 samples/sec Loss 15.6756 LearningRate 0.0956 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:13,797-Speed 9708.70 samples/sec Loss 15.8312 LearningRate 0.0956 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:14,866-Speed 9585.44 samples/sec Loss 15.7785 LearningRate 0.0956 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:15,963-Speed 9341.07 samples/sec Loss 15.9013 LearningRate 0.0956 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:17,043-Speed 9486.86 samples/sec Loss 15.8060 LearningRate 0.0956 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:18,110-Speed 9603.13 samples/sec Loss 15.6741 LearningRate 0.0956 Epoch: 0 Global 
Step: 7500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:19,226-Speed 9180.30 samples/sec Loss 15.7917 LearningRate 0.0956 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:20,276-Speed 9755.61 samples/sec Loss 15.6208 LearningRate 0.0955 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:21,328-Speed 9744.90 samples/sec Loss 15.7513 LearningRate 0.0955 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:22,452-Speed 9110.82 samples/sec Loss 15.7954 LearningRate 0.0955 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:23,528-Speed 9523.47 samples/sec Loss 15.8151 LearningRate 0.0955 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:24,625-Speed 9345.43 samples/sec Loss 15.7733 LearningRate 0.0955 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:25,716-Speed 9384.33 samples/sec Loss 15.8369 LearningRate 0.0955 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:26,830-Speed 9201.25 samples/sec Loss 15.6917 LearningRate 0.0955 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:27,878-Speed 9780.39 samples/sec Loss 15.6704 LearningRate 0.0955 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:28,963-Speed 9444.44 samples/sec Loss 15.6117 LearningRate 0.0955 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:30,040-Speed 9506.13 samples/sec Loss 15.5319 LearningRate 0.0955 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:31,106-Speed 9609.15 samples/sec Loss 15.5709 LearningRate 0.0955 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 11:50:32,183-Speed 9513.84 samples/sec Loss 15.5791 LearningRate 0.0955 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:33,277-Speed 9367.40 samples/sec Loss 15.5446 LearningRate 0.0955 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:34,339-Speed 9651.03 samples/sec Loss 15.5619 LearningRate 0.0955 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:35,377-Speed 9870.05 samples/sec Loss 15.5141 LearningRate 0.0955 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:36,425-Speed 9781.15 samples/sec Loss 15.7960 LearningRate 0.0955 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:37,463-Speed 9871.33 samples/sec Loss 15.6440 LearningRate 0.0955 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:38,549-Speed 9431.66 samples/sec Loss 15.6327 LearningRate 0.0954 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:39,667-Speed 9167.39 samples/sec Loss 15.4566 LearningRate 0.0954 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:40,755-Speed 9419.30 samples/sec Loss 15.4343 LearningRate 0.0954 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:41,831-Speed 9514.78 samples/sec Loss 15.4710 LearningRate 0.0954 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:42,892-Speed 9663.60 samples/sec Loss 15.6013 LearningRate 0.0954 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:43,977-Speed 9442.05 samples/sec Loss 15.5167 LearningRate 0.0954 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:50:45,029-Speed 9732.59 samples/sec Loss 15.5458 LearningRate 0.0954 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:46,104-Speed 9533.02 samples/sec Loss 15.4970 LearningRate 0.0954 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:47,164-Speed 9669.63 samples/sec Loss 15.4526 LearningRate 0.0954 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:48,247-Speed 9463.67 samples/sec Loss 15.3404 LearningRate 0.0954 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:49,309-Speed 9647.45 samples/sec Loss 15.5775 LearningRate 0.0954 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:50:50,342-Speed 9916.82 samples/sec Loss 15.5025 LearningRate 0.0954 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:51,393-Speed 9744.84 samples/sec Loss 15.4618 LearningRate 0.0954 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:52,482-Speed 9408.95 samples/sec Loss 15.3603 LearningRate 0.0954 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:53,527-Speed 9814.07 samples/sec Loss 15.5566 LearningRate 0.0954 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:54,583-Speed 9708.58 samples/sec Loss 15.3929 LearningRate 0.0954 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:55,655-Speed 9560.23 samples/sec Loss 15.4494 LearningRate 0.0954 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:56,718-Speed 9636.05 samples/sec Loss 15.7313 LearningRate 0.0953 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:57,807-Speed 9408.56 samples/sec Loss 
15.3551 LearningRate 0.0953 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:50:58,944-Speed 9014.42 samples/sec Loss 15.3278 LearningRate 0.0953 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:51:00,032-Speed 9417.02 samples/sec Loss 15.4437 LearningRate 0.0953 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:51:01,083-Speed 9742.02 samples/sec Loss 15.3176 LearningRate 0.0953 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:51:02,164-Speed 9485.74 samples/sec Loss 15.2191 LearningRate 0.0953 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 11:51:03,195-Speed 9936.19 samples/sec Loss 15.3220 LearningRate 0.0953 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:04,247-Speed 9731.80 samples/sec Loss 15.3347 LearningRate 0.0953 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:05,342-Speed 9355.81 samples/sec Loss 15.2523 LearningRate 0.0953 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:06,416-Speed 9547.22 samples/sec Loss 15.2218 LearningRate 0.0953 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:07,476-Speed 9664.42 samples/sec Loss 15.1459 LearningRate 0.0953 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:08,582-Speed 9261.72 samples/sec Loss 15.3113 LearningRate 0.0953 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:09,656-Speed 9544.29 samples/sec Loss 15.2284 LearningRate 0.0953 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:10,709-Speed 9723.87 samples/sec Loss 15.1489 LearningRate 0.0953 Epoch: 0 Global 
Step: 7990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:11,826-Speed 9179.69 samples/sec Loss 15.2488 LearningRate 0.0953 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:51:33,638-[lfw][8000]XNorm: 15.520002 Training: 2022-04-11 11:51:33,639-[lfw][8000]Accuracy-Flip: 0.98833+-0.00563 Training: 2022-04-11 11:51:33,639-[lfw][8000]Accuracy-Highest: 0.98833 Training: 2022-04-11 11:51:58,837-[cfp_fp][8000]XNorm: 13.494573 Training: 2022-04-11 11:51:58,838-[cfp_fp][8000]Accuracy-Flip: 0.86786+-0.01484 Training: 2022-04-11 11:51:58,838-[cfp_fp][8000]Accuracy-Highest: 0.86786 Training: 2022-04-11 11:52:20,581-[agedb_30][8000]XNorm: 14.844895 Training: 2022-04-11 11:52:20,581-[agedb_30][8000]Accuracy-Flip: 0.89850+-0.01985 Training: 2022-04-11 11:52:20,582-[agedb_30][8000]Accuracy-Highest: 0.89850 Training: 2022-04-11 11:52:21,676-Speed 146.60 samples/sec Loss 15.0222 LearningRate 0.0953 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:22,766-Speed 9401.00 samples/sec Loss 15.2775 LearningRate 0.0953 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:52:23,838-Speed 9552.76 samples/sec Loss 15.2405 LearningRate 0.0952 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:52:24,880-Speed 9839.74 samples/sec Loss 15.0844 LearningRate 0.0952 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:25,999-Speed 9150.91 samples/sec Loss 15.1817 LearningRate 0.0952 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:27,121-Speed 9133.47 samples/sec Loss 15.1577 LearningRate 0.0952 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:28,224-Speed 9296.77 samples/sec Loss 15.3948 LearningRate 0.0952 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 11:52:29,307-Speed 9456.54 samples/sec Loss 15.3087 LearningRate 0.0952 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:30,446-Speed 8992.65 samples/sec Loss 15.0670 LearningRate 0.0952 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:31,532-Speed 9437.26 samples/sec Loss 15.2515 LearningRate 0.0952 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:32,645-Speed 9201.76 samples/sec Loss 15.2359 LearningRate 0.0952 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:33,727-Speed 9473.79 samples/sec Loss 15.0842 LearningRate 0.0952 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:34,798-Speed 9562.26 samples/sec Loss 15.1574 LearningRate 0.0952 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:35,875-Speed 9515.15 samples/sec Loss 15.0796 LearningRate 0.0952 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:52:36,935-Speed 9665.56 samples/sec Loss 15.0916 LearningRate 0.0952 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:52:37,987-Speed 9745.19 samples/sec Loss 15.1117 LearningRate 0.0952 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:52:39,056-Speed 9585.07 samples/sec Loss 15.0558 LearningRate 0.0952 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:40,143-Speed 9422.06 samples/sec Loss 15.1955 LearningRate 0.0952 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:41,251-Speed 9255.72 samples/sec Loss 15.1030 LearningRate 0.0952 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:52:42,314-Speed 9636.04 samples/sec Loss 15.2572 LearningRate 0.0951 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:43,407-Speed 9375.22 samples/sec Loss 15.0293 LearningRate 0.0951 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:44,476-Speed 9591.43 samples/sec Loss 15.1566 LearningRate 0.0951 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:45,567-Speed 9389.58 samples/sec Loss 15.1310 LearningRate 0.0951 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:46,635-Speed 9593.84 samples/sec Loss 15.2285 LearningRate 0.0951 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:47,749-Speed 9196.07 samples/sec Loss 15.0746 LearningRate 0.0951 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:48,804-Speed 9709.18 samples/sec Loss 15.1088 LearningRate 0.0951 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:49,876-Speed 9561.02 samples/sec Loss 14.9655 LearningRate 0.0951 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:52:50,994-Speed 9162.25 samples/sec Loss 15.1881 LearningRate 0.0951 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:52,092-Speed 9335.11 samples/sec Loss 14.9222 LearningRate 0.0951 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:53,150-Speed 9676.10 samples/sec Loss 14.9933 LearningRate 0.0951 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:54,220-Speed 9579.18 samples/sec Loss 15.0484 LearningRate 0.0951 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:55,302-Speed 9465.69 samples/sec Loss 
14.9019 LearningRate 0.0951 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:56,395-Speed 9381.76 samples/sec Loss 14.8229 LearningRate 0.0951 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:57,467-Speed 9562.90 samples/sec Loss 15.0150 LearningRate 0.0951 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:58,552-Speed 9449.39 samples/sec Loss 14.8976 LearningRate 0.0951 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:52:59,640-Speed 9410.52 samples/sec Loss 14.8662 LearningRate 0.0951 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:00,704-Speed 9630.88 samples/sec Loss 14.9685 LearningRate 0.0950 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:01,788-Speed 9458.40 samples/sec Loss 15.0106 LearningRate 0.0950 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:02,869-Speed 9475.54 samples/sec Loss 15.0083 LearningRate 0.0950 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:03,999-Speed 9068.47 samples/sec Loss 14.9393 LearningRate 0.0950 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:05,091-Speed 9383.46 samples/sec Loss 14.8000 LearningRate 0.0950 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:06,193-Speed 9294.56 samples/sec Loss 14.8640 LearningRate 0.0950 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:07,286-Speed 9369.61 samples/sec Loss 14.9343 LearningRate 0.0950 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:08,348-Speed 9661.85 samples/sec Loss 14.9694 LearningRate 0.0950 Epoch: 0 Global 
Step: 8440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:09,469-Speed 9142.36 samples/sec Loss 14.9183 LearningRate 0.0950 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:10,553-Speed 9443.82 samples/sec Loss 14.9703 LearningRate 0.0950 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:11,670-Speed 9172.85 samples/sec Loss 14.8755 LearningRate 0.0950 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:12,767-Speed 9346.77 samples/sec Loss 14.8454 LearningRate 0.0950 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:13,867-Speed 9314.09 samples/sec Loss 14.9873 LearningRate 0.0950 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:14,988-Speed 9141.80 samples/sec Loss 14.6725 LearningRate 0.0950 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:16,042-Speed 9722.15 samples/sec Loss 14.8946 LearningRate 0.0950 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:17,129-Speed 9426.21 samples/sec Loss 14.7996 LearningRate 0.0950 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:18,205-Speed 9518.84 samples/sec Loss 14.8122 LearningRate 0.0950 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:19,307-Speed 9300.91 samples/sec Loss 14.7667 LearningRate 0.0949 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:20,379-Speed 9559.78 samples/sec Loss 14.9513 LearningRate 0.0949 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:21,469-Speed 9399.89 samples/sec Loss 14.8031 LearningRate 0.0949 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 11:53:22,544-Speed 9528.73 samples/sec Loss 14.7933 LearningRate 0.0949 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:23,587-Speed 9830.32 samples/sec Loss 14.7071 LearningRate 0.0949 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:24,658-Speed 9560.58 samples/sec Loss 14.7208 LearningRate 0.0949 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:25,755-Speed 9342.46 samples/sec Loss 14.7633 LearningRate 0.0949 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:26,867-Speed 9213.22 samples/sec Loss 14.8649 LearningRate 0.0949 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:27,958-Speed 9388.58 samples/sec Loss 14.7308 LearningRate 0.0949 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:29,051-Speed 9379.20 samples/sec Loss 14.6444 LearningRate 0.0949 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:30,182-Speed 9063.92 samples/sec Loss 14.7099 LearningRate 0.0949 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:31,280-Speed 9329.78 samples/sec Loss 14.6920 LearningRate 0.0949 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:32,346-Speed 9606.95 samples/sec Loss 14.8013 LearningRate 0.0949 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:33,421-Speed 9530.63 samples/sec Loss 14.6737 LearningRate 0.0949 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:34,532-Speed 9220.80 samples/sec Loss 14.6479 LearningRate 0.0949 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:53:35,607-Speed 9536.95 samples/sec Loss 14.6136 LearningRate 0.0949 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:36,722-Speed 9188.57 samples/sec Loss 14.5834 LearningRate 0.0949 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:37,817-Speed 9360.19 samples/sec Loss 14.6875 LearningRate 0.0948 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:38,936-Speed 9153.79 samples/sec Loss 14.6546 LearningRate 0.0948 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:40,021-Speed 9443.02 samples/sec Loss 14.6563 LearningRate 0.0948 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:41,104-Speed 9461.59 samples/sec Loss 14.5757 LearningRate 0.0948 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:42,179-Speed 9531.81 samples/sec Loss 14.6187 LearningRate 0.0948 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:43,290-Speed 9222.39 samples/sec Loss 14.4769 LearningRate 0.0948 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:44,417-Speed 9092.22 samples/sec Loss 14.5844 LearningRate 0.0948 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:45,515-Speed 9335.70 samples/sec Loss 14.6638 LearningRate 0.0948 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:46,586-Speed 9570.90 samples/sec Loss 14.6435 LearningRate 0.0948 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:47,690-Speed 9275.92 samples/sec Loss 14.7833 LearningRate 0.0948 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:48,802-Speed 9214.44 samples/sec Loss 
14.6483 LearningRate 0.0948 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:49,886-Speed 9453.12 samples/sec Loss 14.5539 LearningRate 0.0948 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:50,980-Speed 9364.64 samples/sec Loss 14.6425 LearningRate 0.0948 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:52,046-Speed 9616.41 samples/sec Loss 14.5090 LearningRate 0.0948 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:53,165-Speed 9155.04 samples/sec Loss 14.5300 LearningRate 0.0948 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:54,206-Speed 9845.80 samples/sec Loss 14.6407 LearningRate 0.0948 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:55,299-Speed 9373.53 samples/sec Loss 14.4673 LearningRate 0.0948 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:56,387-Speed 9415.88 samples/sec Loss 14.4904 LearningRate 0.0948 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:57,462-Speed 9523.19 samples/sec Loss 14.5097 LearningRate 0.0947 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:53:58,537-Speed 9536.32 samples/sec Loss 14.4103 LearningRate 0.0947 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:53:59,657-Speed 9154.86 samples/sec Loss 14.5347 LearningRate 0.0947 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:00,757-Speed 9310.50 samples/sec Loss 14.5774 LearningRate 0.0947 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:01,873-Speed 9182.22 samples/sec Loss 14.6534 LearningRate 0.0947 Epoch: 0 Global 
Step: 8930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:02,971-Speed 9331.64 samples/sec Loss 14.6237 LearningRate 0.0947 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:04,058-Speed 9426.11 samples/sec Loss 14.6331 LearningRate 0.0947 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:05,125-Speed 9605.63 samples/sec Loss 14.4673 LearningRate 0.0947 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:06,186-Speed 9656.13 samples/sec Loss 14.3469 LearningRate 0.0947 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:07,295-Speed 9240.15 samples/sec Loss 14.3912 LearningRate 0.0947 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:08,402-Speed 9256.53 samples/sec Loss 14.5361 LearningRate 0.0947 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:09,489-Speed 9424.30 samples/sec Loss 14.5439 LearningRate 0.0947 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:10,568-Speed 9494.45 samples/sec Loss 14.5701 LearningRate 0.0947 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:11,642-Speed 9543.04 samples/sec Loss 14.3791 LearningRate 0.0947 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:12,748-Speed 9265.20 samples/sec Loss 14.4282 LearningRate 0.0947 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:13,827-Speed 9495.30 samples/sec Loss 14.4555 LearningRate 0.0947 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:14,911-Speed 9453.77 samples/sec Loss 14.3501 LearningRate 0.0947 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 11:54:15,997-Speed 9435.89 samples/sec Loss 14.4889 LearningRate 0.0946 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:17,033-Speed 9889.67 samples/sec Loss 14.5686 LearningRate 0.0946 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:18,104-Speed 9567.66 samples/sec Loss 14.4444 LearningRate 0.0946 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:19,155-Speed 9744.89 samples/sec Loss 14.4829 LearningRate 0.0946 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:20,255-Speed 9309.59 samples/sec Loss 14.5255 LearningRate 0.0946 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:21,320-Speed 9626.38 samples/sec Loss 14.3949 LearningRate 0.0946 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:22,427-Speed 9256.19 samples/sec Loss 14.3023 LearningRate 0.0946 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:23,529-Speed 9293.19 samples/sec Loss 14.3447 LearningRate 0.0946 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:24,591-Speed 9651.97 samples/sec Loss 14.2210 LearningRate 0.0946 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:25,701-Speed 9226.04 samples/sec Loss 14.3370 LearningRate 0.0946 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:26,758-Speed 9699.60 samples/sec Loss 14.5415 LearningRate 0.0946 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:27,848-Speed 9397.44 samples/sec Loss 14.3287 LearningRate 0.0946 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:54:28,903-Speed 9718.20 samples/sec Loss 14.4068 LearningRate 0.0946 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:29,984-Speed 9474.40 samples/sec Loss 14.3969 LearningRate 0.0946 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:31,069-Speed 9440.22 samples/sec Loss 14.4114 LearningRate 0.0946 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:32,144-Speed 9536.46 samples/sec Loss 14.3317 LearningRate 0.0946 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:33,261-Speed 9175.72 samples/sec Loss 14.3303 LearningRate 0.0946 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:34,347-Speed 9426.01 samples/sec Loss 14.4023 LearningRate 0.0945 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:35,398-Speed 9755.42 samples/sec Loss 14.1657 LearningRate 0.0945 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:36,434-Speed 9887.72 samples/sec Loss 14.3222 LearningRate 0.0945 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:37,493-Speed 9671.66 samples/sec Loss 14.4690 LearningRate 0.0945 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:38,605-Speed 9220.27 samples/sec Loss 14.2836 LearningRate 0.0945 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:39,697-Speed 9384.31 samples/sec Loss 14.1386 LearningRate 0.0945 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:40,800-Speed 9288.68 samples/sec Loss 14.2956 LearningRate 0.0945 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:41,915-Speed 9181.42 samples/sec Loss 
14.4021 LearningRate 0.0945 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:43,006-Speed 9392.29 samples/sec Loss 14.2632 LearningRate 0.0945 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:44,088-Speed 9476.21 samples/sec Loss 14.2763 LearningRate 0.0945 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:45,142-Speed 9730.09 samples/sec Loss 14.2442 LearningRate 0.0945 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:46,245-Speed 9289.55 samples/sec Loss 14.2962 LearningRate 0.0945 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:47,340-Speed 9355.74 samples/sec Loss 14.2791 LearningRate 0.0945 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:48,401-Speed 9656.23 samples/sec Loss 14.2043 LearningRate 0.0945 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:49,458-Speed 9695.89 samples/sec Loss 14.3050 LearningRate 0.0945 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:54:50,503-Speed 9799.35 samples/sec Loss 14.2533 LearningRate 0.0945 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:51,570-Speed 9607.63 samples/sec Loss 14.3734 LearningRate 0.0945 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:52,652-Speed 9465.35 samples/sec Loss 14.2330 LearningRate 0.0944 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:53,728-Speed 9525.93 samples/sec Loss 13.9418 LearningRate 0.0944 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:54,797-Speed 9578.90 samples/sec Loss 14.1254 LearningRate 0.0944 Epoch: 0 Global 
Step: 9420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:55,844-Speed 9789.29 samples/sec Loss 14.0854 LearningRate 0.0944 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:56,917-Speed 9545.91 samples/sec Loss 14.1763 LearningRate 0.0944 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:58,078-Speed 8829.16 samples/sec Loss 14.1632 LearningRate 0.0944 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:54:59,198-Speed 9148.15 samples/sec Loss 14.0741 LearningRate 0.0944 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:00,297-Speed 9322.03 samples/sec Loss 14.1996 LearningRate 0.0944 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:01,376-Speed 9490.74 samples/sec Loss 14.0939 LearningRate 0.0944 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:02,427-Speed 9752.23 samples/sec Loss 14.1523 LearningRate 0.0944 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:03,532-Speed 9269.35 samples/sec Loss 14.1423 LearningRate 0.0944 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:04,660-Speed 9089.13 samples/sec Loss 14.1426 LearningRate 0.0944 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:05,701-Speed 9842.96 samples/sec Loss 14.1207 LearningRate 0.0944 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:06,764-Speed 9639.88 samples/sec Loss 14.1092 LearningRate 0.0944 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:07,895-Speed 9051.68 samples/sec Loss 14.1582 LearningRate 0.0944 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 11:55:08,993-Speed 9342.41 samples/sec Loss 14.1210 LearningRate 0.0944 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:10,058-Speed 9618.02 samples/sec Loss 14.1277 LearningRate 0.0944 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:11,173-Speed 9184.54 samples/sec Loss 14.0517 LearningRate 0.0943 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:12,232-Speed 9676.86 samples/sec Loss 14.1337 LearningRate 0.0943 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:13,292-Speed 9667.56 samples/sec Loss 14.0319 LearningRate 0.0943 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:14,390-Speed 9327.88 samples/sec Loss 14.1079 LearningRate 0.0943 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:15,479-Speed 9417.91 samples/sec Loss 14.1678 LearningRate 0.0943 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:16,554-Speed 9531.68 samples/sec Loss 13.9427 LearningRate 0.0943 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:17,618-Speed 9625.78 samples/sec Loss 14.0555 LearningRate 0.0943 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:18,712-Speed 9366.87 samples/sec Loss 13.9973 LearningRate 0.0943 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:19,823-Speed 9215.93 samples/sec Loss 14.0377 LearningRate 0.0943 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:20,891-Speed 9604.18 samples/sec Loss 14.1020 LearningRate 0.0943 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:55:21,973-Speed 9467.93 samples/sec Loss 14.0979 LearningRate 0.0943 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:23,032-Speed 9674.30 samples/sec Loss 14.0212 LearningRate 0.0943 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:24,131-Speed 9324.70 samples/sec Loss 13.9637 LearningRate 0.0943 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:25,199-Speed 9593.92 samples/sec Loss 13.9928 LearningRate 0.0943 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:26,300-Speed 9300.15 samples/sec Loss 14.0360 LearningRate 0.0943 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:27,415-Speed 9188.37 samples/sec Loss 14.0094 LearningRate 0.0943 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:28,491-Speed 9532.66 samples/sec Loss 14.1481 LearningRate 0.0943 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:29,582-Speed 9385.12 samples/sec Loss 13.9599 LearningRate 0.0942 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:30,679-Speed 9345.94 samples/sec Loss 13.9150 LearningRate 0.0942 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:31,736-Speed 9688.63 samples/sec Loss 13.9560 LearningRate 0.0942 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:55:32,788-Speed 9739.73 samples/sec Loss 13.9750 LearningRate 0.0942 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:33,854-Speed 9610.63 samples/sec Loss 14.0218 LearningRate 0.0942 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:34,955-Speed 9308.82 samples/sec Loss 
13.9575 LearningRate 0.0942 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:36,035-Speed 9479.82 samples/sec Loss 13.8919 LearningRate 0.0942 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:37,106-Speed 9567.90 samples/sec Loss 13.9690 LearningRate 0.0942 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:38,219-Speed 9214.97 samples/sec Loss 13.8265 LearningRate 0.0942 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:39,334-Speed 9185.25 samples/sec Loss 13.9871 LearningRate 0.0942 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:40,433-Speed 9328.53 samples/sec Loss 13.8396 LearningRate 0.0942 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:41,496-Speed 9636.96 samples/sec Loss 13.9508 LearningRate 0.0942 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:42,567-Speed 9559.78 samples/sec Loss 13.8536 LearningRate 0.0942 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:43,685-Speed 9167.24 samples/sec Loss 13.8362 LearningRate 0.0942 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:44,760-Speed 9534.57 samples/sec Loss 13.9766 LearningRate 0.0942 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:45,803-Speed 9822.47 samples/sec Loss 13.9342 LearningRate 0.0942 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:46,904-Speed 9304.37 samples/sec Loss 13.9251 LearningRate 0.0942 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:48,021-Speed 9175.13 samples/sec Loss 13.8554 LearningRate 0.0942 Epoch: 0 Global Step: 
9910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:55:49,117-Speed 9349.10 samples/sec Loss 13.9319 LearningRate 0.0941 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:50,200-Speed 9460.83 samples/sec Loss 13.8180 LearningRate 0.0941 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:55:51,252-Speed 9738.72 samples/sec Loss 13.8256 LearningRate 0.0941 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:55:52,393-Speed 8978.88 samples/sec Loss 13.8198 LearningRate 0.0941 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:55:53,459-Speed 9612.11 samples/sec Loss 13.8622 LearningRate 0.0941 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 11:55:54,503-Speed 9806.27 samples/sec Loss 13.6730 LearningRate 0.0941 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 11:55:55,589-Speed 9437.68 samples/sec Loss 13.7765 LearningRate 0.0941 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 11:55:56,677-Speed 9414.97 samples/sec Loss 13.8868 LearningRate 0.0941 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 11:55:57,780-Speed 9297.00 samples/sec Loss 13.8260 LearningRate 0.0941 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 11:56:19,588-[lfw][10000]XNorm: 14.822650 Training: 2022-04-11 11:56:19,589-[lfw][10000]Accuracy-Flip: 0.98850+-0.00580 Training: 2022-04-11 11:56:19,589-[lfw][10000]Accuracy-Highest: 0.98850 Training: 2022-04-11 11:56:44,775-[cfp_fp][10000]XNorm: 12.692100 Training: 2022-04-11 11:56:44,776-[cfp_fp][10000]Accuracy-Flip: 0.89357+-0.01505 Training: 2022-04-11 11:56:44,776-[cfp_fp][10000]Accuracy-Highest: 0.89357 Training: 2022-04-11 
11:57:06,526-[agedb_30][10000]XNorm: 14.347011 Training: 2022-04-11 11:57:06,527-[agedb_30][10000]Accuracy-Flip: 0.91417+-0.01724 Training: 2022-04-11 11:57:06,527-[agedb_30][10000]Accuracy-Highest: 0.91417 Training: 2022-04-11 11:57:07,614-Speed 146.64 samples/sec Loss 13.8117 LearningRate 0.0941 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:57:08,661-Speed 9792.04 samples/sec Loss 13.6865 LearningRate 0.0941 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:57:09,691-Speed 9946.35 samples/sec Loss 13.9296 LearningRate 0.0941 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:57:10,735-Speed 9810.93 samples/sec Loss 13.7630 LearningRate 0.0941 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:57:11,806-Speed 9571.14 samples/sec Loss 13.6834 LearningRate 0.0941 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:57:12,873-Speed 9595.53 samples/sec Loss 13.9078 LearningRate 0.0941 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:57:13,976-Speed 9292.32 samples/sec Loss 13.7805 LearningRate 0.0941 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:15,030-Speed 9721.49 samples/sec Loss 13.7886 LearningRate 0.0941 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:16,104-Speed 9541.89 samples/sec Loss 13.8436 LearningRate 0.0940 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:17,232-Speed 9082.67 samples/sec Loss 13.7716 LearningRate 0.0940 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:18,303-Speed 9564.45 samples/sec Loss 13.8955 LearningRate 0.0940 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-04-11 11:57:19,404-Speed 9308.63 samples/sec Loss 13.5553 LearningRate 0.0940 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:20,494-Speed 9401.75 samples/sec Loss 13.7221 LearningRate 0.0940 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:21,576-Speed 9469.33 samples/sec Loss 13.8253 LearningRate 0.0940 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:22,675-Speed 9319.12 samples/sec Loss 13.8168 LearningRate 0.0940 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:23,747-Speed 9554.69 samples/sec Loss 13.6775 LearningRate 0.0940 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:24,810-Speed 9639.41 samples/sec Loss 13.7506 LearningRate 0.0940 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:25,902-Speed 9386.87 samples/sec Loss 13.6320 LearningRate 0.0940 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:27,010-Speed 9253.19 samples/sec Loss 13.6417 LearningRate 0.0940 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:28,082-Speed 9557.58 samples/sec Loss 13.7480 LearningRate 0.0940 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:29,155-Speed 9548.46 samples/sec Loss 13.8239 LearningRate 0.0940 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:30,229-Speed 9544.92 samples/sec Loss 13.8356 LearningRate 0.0940 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:31,323-Speed 9362.99 samples/sec Loss 13.5969 LearningRate 0.0940 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:57:32,388-Speed 9615.98 samples/sec Loss 13.6792 LearningRate 0.0940 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:33,440-Speed 9742.79 samples/sec Loss 13.7257 LearningRate 0.0940 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:34,481-Speed 9841.00 samples/sec Loss 13.6115 LearningRate 0.0939 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:35,582-Speed 9311.19 samples/sec Loss 13.6101 LearningRate 0.0939 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:36,674-Speed 9380.86 samples/sec Loss 13.7199 LearningRate 0.0939 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:37,764-Speed 9397.05 samples/sec Loss 13.8257 LearningRate 0.0939 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:38,868-Speed 9280.39 samples/sec Loss 13.6122 LearningRate 0.0939 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:39,947-Speed 9498.16 samples/sec Loss 13.6817 LearningRate 0.0939 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:41,028-Speed 9479.07 samples/sec Loss 13.6109 LearningRate 0.0939 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:42,085-Speed 9693.86 samples/sec Loss 13.8151 LearningRate 0.0939 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:43,178-Speed 9368.60 samples/sec Loss 13.7769 LearningRate 0.0939 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:44,265-Speed 9430.40 samples/sec Loss 13.6004 LearningRate 0.0939 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:45,304-Speed 9865.20 
samples/sec Loss 13.6354 LearningRate 0.0939 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:46,387-Speed 9464.54 samples/sec Loss 13.6196 LearningRate 0.0939 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:47,496-Speed 9240.61 samples/sec Loss 13.6462 LearningRate 0.0939 Epoch: 0 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:48,626-Speed 9061.92 samples/sec Loss 13.6245 LearningRate 0.0939 Epoch: 0 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:49,700-Speed 9540.01 samples/sec Loss 13.6304 LearningRate 0.0939 Epoch: 0 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:50,744-Speed 9817.20 samples/sec Loss 13.5567 LearningRate 0.0939 Epoch: 0 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:51,810-Speed 9612.87 samples/sec Loss 13.6514 LearningRate 0.0939 Epoch: 0 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:52,890-Speed 9484.26 samples/sec Loss 13.4888 LearningRate 0.0938 Epoch: 0 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:53,969-Speed 9498.50 samples/sec Loss 13.4899 LearningRate 0.0938 Epoch: 0 Global Step: 10440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:55,085-Speed 9182.58 samples/sec Loss 13.5914 LearningRate 0.0938 Epoch: 0 Global Step: 10450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:57:56,193-Speed 9244.70 samples/sec Loss 13.4860 LearningRate 0.0938 Epoch: 0 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:57,301-Speed 9242.74 samples/sec Loss 13.5170 LearningRate 0.0938 Epoch: 0 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:58,379-Speed 9506.60 samples/sec Loss 13.6101 
LearningRate 0.0938 Epoch: 0 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:57:59,455-Speed 9527.97 samples/sec Loss 13.4814 LearningRate 0.0938 Epoch: 0 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:00,570-Speed 9182.89 samples/sec Loss 13.5444 LearningRate 0.0938 Epoch: 0 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:01,637-Speed 9605.54 samples/sec Loss 13.6231 LearningRate 0.0938 Epoch: 0 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:02,737-Speed 9315.11 samples/sec Loss 13.4149 LearningRate 0.0938 Epoch: 0 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:03,835-Speed 9328.59 samples/sec Loss 13.4588 LearningRate 0.0938 Epoch: 0 Global Step: 10530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:04,927-Speed 9385.33 samples/sec Loss 13.5180 LearningRate 0.0938 Epoch: 0 Global Step: 10540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:05,992-Speed 9616.64 samples/sec Loss 13.6097 LearningRate 0.0938 Epoch: 0 Global Step: 10550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:07,131-Speed 8997.34 samples/sec Loss 13.4665 LearningRate 0.0938 Epoch: 0 Global Step: 10560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:08,228-Speed 9340.52 samples/sec Loss 13.5826 LearningRate 0.0938 Epoch: 0 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:09,305-Speed 9514.50 samples/sec Loss 13.3947 LearningRate 0.0938 Epoch: 0 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:10,384-Speed 9496.78 samples/sec Loss 13.5758 LearningRate 0.0938 Epoch: 0 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:11,478-Speed 9369.71 samples/sec Loss 13.4992 LearningRate 0.0938 Epoch: 0 Global 
Step: 10600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:12,584-Speed 9265.96 samples/sec Loss 13.5095 LearningRate 0.0937 Epoch: 0 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:13,649-Speed 9619.38 samples/sec Loss 13.3801 LearningRate 0.0937 Epoch: 0 Global Step: 10620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:14,746-Speed 9339.84 samples/sec Loss 13.4453 LearningRate 0.0937 Epoch: 0 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:15,842-Speed 9351.89 samples/sec Loss 13.2912 LearningRate 0.0937 Epoch: 0 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:16,931-Speed 9407.13 samples/sec Loss 13.3842 LearningRate 0.0937 Epoch: 0 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:18,052-Speed 9140.58 samples/sec Loss 13.4956 LearningRate 0.0937 Epoch: 0 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:19,151-Speed 9317.10 samples/sec Loss 13.4850 LearningRate 0.0937 Epoch: 0 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:20,205-Speed 9721.04 samples/sec Loss 13.4888 LearningRate 0.0937 Epoch: 0 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:21,285-Speed 9494.91 samples/sec Loss 13.4634 LearningRate 0.0937 Epoch: 0 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:22,363-Speed 9501.75 samples/sec Loss 13.5067 LearningRate 0.0937 Epoch: 0 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:23,471-Speed 9244.70 samples/sec Loss 13.4755 LearningRate 0.0937 Epoch: 0 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:24,571-Speed 9313.33 samples/sec Loss 13.3568 LearningRate 0.0937 Epoch: 0 Global Step: 10720 Fp16 Grad Scale: 
131072 Required: 13 hours Training: 2022-04-11 11:58:25,605-Speed 9910.73 samples/sec Loss 13.5001 LearningRate 0.0937 Epoch: 0 Global Step: 10730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:58:26,653-Speed 9776.31 samples/sec Loss 13.3838 LearningRate 0.0937 Epoch: 0 Global Step: 10740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:58:27,768-Speed 9192.10 samples/sec Loss 13.4533 LearningRate 0.0937 Epoch: 0 Global Step: 10750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:58:28,836-Speed 9597.33 samples/sec Loss 13.3535 LearningRate 0.0937 Epoch: 0 Global Step: 10760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:58:29,972-Speed 9019.85 samples/sec Loss 13.3199 LearningRate 0.0937 Epoch: 0 Global Step: 10770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:58:31,065-Speed 9369.86 samples/sec Loss 13.3470 LearningRate 0.0936 Epoch: 0 Global Step: 10780 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:58:32,158-Speed 9374.92 samples/sec Loss 13.4602 LearningRate 0.0936 Epoch: 0 Global Step: 10790 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:58:33,234-Speed 9530.16 samples/sec Loss 13.3897 LearningRate 0.0936 Epoch: 0 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:34,314-Speed 9487.29 samples/sec Loss 13.3459 LearningRate 0.0936 Epoch: 0 Global Step: 10810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:35,396-Speed 9462.40 samples/sec Loss 13.4320 LearningRate 0.0936 Epoch: 0 Global Step: 10820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:36,519-Speed 9122.12 samples/sec Loss 13.3965 LearningRate 0.0936 Epoch: 0 Global Step: 10830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:37,591-Speed 9564.64 samples/sec Loss 13.2727 LearningRate 0.0936 Epoch: 0 Global Step: 10840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 11:58:38,663-Speed 9561.60 samples/sec Loss 13.2606 LearningRate 0.0936 Epoch: 0 Global Step: 10850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:39,750-Speed 9424.23 samples/sec Loss 13.3826 LearningRate 0.0936 Epoch: 0 Global Step: 10860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:40,849-Speed 9321.18 samples/sec Loss 13.4596 LearningRate 0.0936 Epoch: 0 Global Step: 10870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:41,937-Speed 9420.36 samples/sec Loss 13.2379 LearningRate 0.0936 Epoch: 0 Global Step: 10880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:43,032-Speed 9348.58 samples/sec Loss 13.2307 LearningRate 0.0936 Epoch: 0 Global Step: 10890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:44,103-Speed 9574.31 samples/sec Loss 13.3384 LearningRate 0.0936 Epoch: 0 Global Step: 10900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:45,216-Speed 9209.52 samples/sec Loss 13.2638 LearningRate 0.0936 Epoch: 0 Global Step: 10910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:46,290-Speed 9535.49 samples/sec Loss 13.3883 LearningRate 0.0936 Epoch: 0 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:47,369-Speed 9498.24 samples/sec Loss 13.2771 LearningRate 0.0936 Epoch: 0 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:48,469-Speed 9317.48 samples/sec Loss 13.3046 LearningRate 0.0936 Epoch: 0 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:49,583-Speed 9199.58 samples/sec Loss 13.3710 LearningRate 0.0935 Epoch: 0 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:50,692-Speed 9235.20 samples/sec Loss 13.3177 LearningRate 0.0935 Epoch: 0 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:51,782-Speed 9396.22 
samples/sec Loss 13.2272 LearningRate 0.0935 Epoch: 0 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:52,908-Speed 9099.05 samples/sec Loss 13.3388 LearningRate 0.0935 Epoch: 0 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:54,025-Speed 9175.35 samples/sec Loss 13.3432 LearningRate 0.0935 Epoch: 0 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:55,095-Speed 9571.15 samples/sec Loss 13.1947 LearningRate 0.0935 Epoch: 0 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:56,189-Speed 9369.05 samples/sec Loss 13.3308 LearningRate 0.0935 Epoch: 0 Global Step: 11010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:58:57,274-Speed 9453.33 samples/sec Loss 13.4233 LearningRate 0.0935 Epoch: 0 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:58,352-Speed 9500.58 samples/sec Loss 13.3589 LearningRate 0.0935 Epoch: 0 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:58:59,471-Speed 9160.29 samples/sec Loss 13.3372 LearningRate 0.0935 Epoch: 0 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:00,562-Speed 9390.26 samples/sec Loss 13.4123 LearningRate 0.0935 Epoch: 0 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:01,639-Speed 9510.30 samples/sec Loss 13.2353 LearningRate 0.0935 Epoch: 0 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:02,724-Speed 9441.62 samples/sec Loss 13.1576 LearningRate 0.0935 Epoch: 0 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:03,807-Speed 9459.56 samples/sec Loss 13.2591 LearningRate 0.0935 Epoch: 0 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:04,922-Speed 9191.72 samples/sec Loss 13.1777 
LearningRate 0.0935 Epoch: 0 Global Step: 11090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:05,990-Speed 9598.36 samples/sec Loss 13.2163 LearningRate 0.0935 Epoch: 0 Global Step: 11100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:07,062-Speed 9556.80 samples/sec Loss 13.0921 LearningRate 0.0935 Epoch: 0 Global Step: 11110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:08,126-Speed 9626.43 samples/sec Loss 13.2595 LearningRate 0.0934 Epoch: 0 Global Step: 11120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:09,202-Speed 9521.30 samples/sec Loss 13.2789 LearningRate 0.0934 Epoch: 0 Global Step: 11130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:10,317-Speed 9194.02 samples/sec Loss 13.3091 LearningRate 0.0934 Epoch: 0 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:11,395-Speed 9503.99 samples/sec Loss 13.2040 LearningRate 0.0934 Epoch: 0 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:12,480-Speed 9441.01 samples/sec Loss 13.2387 LearningRate 0.0934 Epoch: 0 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:13,568-Speed 9416.32 samples/sec Loss 13.2121 LearningRate 0.0934 Epoch: 0 Global Step: 11170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:14,623-Speed 9711.85 samples/sec Loss 13.2449 LearningRate 0.0934 Epoch: 0 Global Step: 11180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:15,668-Speed 9809.95 samples/sec Loss 13.1522 LearningRate 0.0934 Epoch: 0 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:16,716-Speed 9777.29 samples/sec Loss 13.2581 LearningRate 0.0934 Epoch: 0 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:17,803-Speed 9420.12 samples/sec Loss 13.0787 LearningRate 0.0934 Epoch: 0 Global Step: 
11210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:18,873-Speed 9577.86 samples/sec Loss 13.2372 LearningRate 0.0934 Epoch: 0 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:19,897-Speed 10008.41 samples/sec Loss 13.1641 LearningRate 0.0934 Epoch: 0 Global Step: 11230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:20,976-Speed 9495.70 samples/sec Loss 13.3030 LearningRate 0.0934 Epoch: 0 Global Step: 11240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:22,037-Speed 9661.51 samples/sec Loss 13.1580 LearningRate 0.0934 Epoch: 0 Global Step: 11250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:23,121-Speed 9449.94 samples/sec Loss 13.0880 LearningRate 0.0934 Epoch: 0 Global Step: 11260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:24,217-Speed 9350.55 samples/sec Loss 13.1713 LearningRate 0.0934 Epoch: 0 Global Step: 11270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:25,303-Speed 9433.51 samples/sec Loss 13.1631 LearningRate 0.0934 Epoch: 0 Global Step: 11280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:26,418-Speed 9191.36 samples/sec Loss 13.1579 LearningRate 0.0934 Epoch: 0 Global Step: 11290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:27,507-Speed 9404.81 samples/sec Loss 13.2202 LearningRate 0.0933 Epoch: 0 Global Step: 11300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:28,566-Speed 9681.14 samples/sec Loss 13.1730 LearningRate 0.0933 Epoch: 0 Global Step: 11310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:29,692-Speed 9092.07 samples/sec Loss 13.1400 LearningRate 0.0933 Epoch: 0 Global Step: 11320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 11:59:30,773-Speed 9483.39 samples/sec Loss 13.2696 LearningRate 0.0933 Epoch: 0 Global Step: 11330 Fp16 Grad Scale: 65536 Required: 
13 hours Training: 2022-04-11 11:59:31,836-Speed 9635.38 samples/sec Loss 13.0583 LearningRate 0.0933 Epoch: 0 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:32,898-Speed 9651.54 samples/sec Loss 13.1067 LearningRate 0.0933 Epoch: 0 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:33,988-Speed 9399.40 samples/sec Loss 13.0934 LearningRate 0.0933 Epoch: 0 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:35,036-Speed 9774.23 samples/sec Loss 13.1968 LearningRate 0.0933 Epoch: 0 Global Step: 11370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:36,093-Speed 9694.05 samples/sec Loss 13.2346 LearningRate 0.0933 Epoch: 0 Global Step: 11380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:37,144-Speed 9748.37 samples/sec Loss 13.0503 LearningRate 0.0933 Epoch: 0 Global Step: 11390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:38,202-Speed 9684.53 samples/sec Loss 13.1405 LearningRate 0.0933 Epoch: 0 Global Step: 11400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:39,236-Speed 9907.88 samples/sec Loss 13.2056 LearningRate 0.0933 Epoch: 0 Global Step: 11410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:40,329-Speed 9373.17 samples/sec Loss 13.1356 LearningRate 0.0933 Epoch: 0 Global Step: 11420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 11:59:41,406-Speed 9510.54 samples/sec Loss 13.1380 LearningRate 0.0933 Epoch: 0 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:42,462-Speed 9706.65 samples/sec Loss 13.0074 LearningRate 0.0933 Epoch: 0 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:43,538-Speed 9523.76 samples/sec Loss 12.9947 LearningRate 0.0933 Epoch: 0 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
11:59:44,651-Speed 9203.28 samples/sec Loss 13.1376 LearningRate 0.0933 Epoch: 0 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:45,745-Speed 9370.67 samples/sec Loss 13.1798 LearningRate 0.0932 Epoch: 0 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:46,791-Speed 9801.14 samples/sec Loss 13.0685 LearningRate 0.0932 Epoch: 0 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:47,930-Speed 8989.10 samples/sec Loss 13.1962 LearningRate 0.0932 Epoch: 0 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:49,044-Speed 9202.26 samples/sec Loss 13.0952 LearningRate 0.0932 Epoch: 0 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:50,175-Speed 9060.94 samples/sec Loss 13.2808 LearningRate 0.0932 Epoch: 0 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:51,239-Speed 9625.89 samples/sec Loss 13.0085 LearningRate 0.0932 Epoch: 0 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:52,316-Speed 9512.02 samples/sec Loss 13.0292 LearningRate 0.0932 Epoch: 0 Global Step: 11530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:59:53,415-Speed 9328.41 samples/sec Loss 13.0972 LearningRate 0.0932 Epoch: 0 Global Step: 11540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 11:59:54,532-Speed 9164.96 samples/sec Loss 12.8728 LearningRate 0.0932 Epoch: 0 Global Step: 11550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:55,638-Speed 9266.00 samples/sec Loss 12.9844 LearningRate 0.0932 Epoch: 0 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:56,710-Speed 9559.35 samples/sec Loss 13.2074 LearningRate 0.0932 Epoch: 0 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:57,752-Speed 9832.97 
samples/sec Loss 13.1581 LearningRate 0.0932 Epoch: 0 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 11:59:58,887-Speed 9027.28 samples/sec Loss 13.0190 LearningRate 0.0932 Epoch: 0 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:00,005-Speed 9165.10 samples/sec Loss 13.0243 LearningRate 0.0932 Epoch: 0 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:01,096-Speed 9389.09 samples/sec Loss 12.9247 LearningRate 0.0932 Epoch: 0 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:02,213-Speed 9176.56 samples/sec Loss 13.0095 LearningRate 0.0932 Epoch: 0 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:03,302-Speed 9407.28 samples/sec Loss 12.9081 LearningRate 0.0932 Epoch: 0 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:04,400-Speed 9326.97 samples/sec Loss 13.1367 LearningRate 0.0931 Epoch: 0 Global Step: 11640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:05,472-Speed 9565.72 samples/sec Loss 12.9357 LearningRate 0.0931 Epoch: 0 Global Step: 11650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:06,524-Speed 9738.14 samples/sec Loss 13.0027 LearningRate 0.0931 Epoch: 0 Global Step: 11660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:07,585-Speed 9655.01 samples/sec Loss 13.0084 LearningRate 0.0931 Epoch: 0 Global Step: 11670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:08,648-Speed 9639.88 samples/sec Loss 13.0947 LearningRate 0.0931 Epoch: 0 Global Step: 11680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:09,779-Speed 9058.75 samples/sec Loss 13.0559 LearningRate 0.0931 Epoch: 0 Global Step: 11690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:10,832-Speed 9726.27 samples/sec Loss 12.8795 LearningRate 
0.0931 Epoch: 0 Global Step: 11700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:11,930-Speed 9330.14 samples/sec Loss 13.0638 LearningRate 0.0931 Epoch: 0 Global Step: 11710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:12,986-Speed 9702.07 samples/sec Loss 12.9762 LearningRate 0.0931 Epoch: 0 Global Step: 11720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:14,094-Speed 9255.38 samples/sec Loss 13.0909 LearningRate 0.0931 Epoch: 0 Global Step: 11730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:00:15,202-Speed 9242.21 samples/sec Loss 12.9561 LearningRate 0.0931 Epoch: 0 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:16,285-Speed 9464.03 samples/sec Loss 12.9493 LearningRate 0.0931 Epoch: 0 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:17,344-Speed 9674.00 samples/sec Loss 13.0619 LearningRate 0.0931 Epoch: 0 Global Step: 11760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:18,418-Speed 9539.54 samples/sec Loss 12.9873 LearningRate 0.0931 Epoch: 0 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:19,490-Speed 9556.51 samples/sec Loss 12.9322 LearningRate 0.0931 Epoch: 0 Global Step: 11780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:20,571-Speed 9484.36 samples/sec Loss 13.0992 LearningRate 0.0931 Epoch: 0 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:21,622-Speed 9744.22 samples/sec Loss 12.8867 LearningRate 0.0931 Epoch: 0 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:22,741-Speed 9154.60 samples/sec Loss 12.8418 LearningRate 0.0930 Epoch: 0 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:23,824-Speed 9465.83 samples/sec Loss 12.9154 LearningRate 0.0930 Epoch: 0 Global Step: 11820 
Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:24,898-Speed 9535.03 samples/sec Loss 12.9473 LearningRate 0.0930 Epoch: 0 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:00:26,013-Speed 9191.84 samples/sec Loss 12.7851 LearningRate 0.0930 Epoch: 0 Global Step: 11840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:00:27,098-Speed 9443.95 samples/sec Loss 13.0222 LearningRate 0.0930 Epoch: 0 Global Step: 11850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:00:28,168-Speed 9575.05 samples/sec Loss 12.7787 LearningRate 0.0930 Epoch: 0 Global Step: 11860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:00:29,255-Speed 9429.12 samples/sec Loss 12.9065 LearningRate 0.0930 Epoch: 0 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:30,347-Speed 9383.92 samples/sec Loss 12.9979 LearningRate 0.0930 Epoch: 0 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:31,426-Speed 9493.66 samples/sec Loss 12.9361 LearningRate 0.0930 Epoch: 0 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:32,480-Speed 9724.43 samples/sec Loss 12.8256 LearningRate 0.0930 Epoch: 0 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:33,563-Speed 9463.13 samples/sec Loss 12.8610 LearningRate 0.0930 Epoch: 0 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:34,659-Speed 9346.46 samples/sec Loss 12.7382 LearningRate 0.0930 Epoch: 0 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:35,748-Speed 9408.23 samples/sec Loss 12.9615 LearningRate 0.0930 Epoch: 0 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:36,846-Speed 9333.22 samples/sec Loss 12.8015 LearningRate 0.0930 Epoch: 0 Global Step: 11940 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 12:00:37,926-Speed 9480.63 samples/sec Loss 13.0450 LearningRate 0.0930 Epoch: 0 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:39,038-Speed 9215.17 samples/sec Loss 12.8651 LearningRate 0.0930 Epoch: 0 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:00:40,119-Speed 9480.64 samples/sec Loss 12.9885 LearningRate 0.0930 Epoch: 0 Global Step: 11970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:00:41,158-Speed 9855.73 samples/sec Loss 12.8115 LearningRate 0.0930 Epoch: 0 Global Step: 11980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:00:42,227-Speed 9583.40 samples/sec Loss 12.9106 LearningRate 0.0929 Epoch: 0 Global Step: 11990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:00:43,336-Speed 9237.69 samples/sec Loss 12.7730 LearningRate 0.0929 Epoch: 0 Global Step: 12000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:01:04,984-[lfw][12000]XNorm: 14.524520 Training: 2022-04-11 12:01:04,985-[lfw][12000]Accuracy-Flip: 0.99250+-0.00449 Training: 2022-04-11 12:01:04,985-[lfw][12000]Accuracy-Highest: 0.99250 Training: 2022-04-11 12:01:30,027-[cfp_fp][12000]XNorm: 12.549293 Training: 2022-04-11 12:01:30,027-[cfp_fp][12000]Accuracy-Flip: 0.90814+-0.01084 Training: 2022-04-11 12:01:30,028-[cfp_fp][12000]Accuracy-Highest: 0.90814 Training: 2022-04-11 12:01:51,618-[agedb_30][12000]XNorm: 14.055729 Training: 2022-04-11 12:01:51,619-[agedb_30][12000]Accuracy-Flip: 0.91983+-0.01552 Training: 2022-04-11 12:01:51,620-[agedb_30][12000]Accuracy-Highest: 0.91983 Training: 2022-04-11 12:01:52,711-Speed 147.61 samples/sec Loss 12.8340 LearningRate 0.0929 Epoch: 0 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:01:53,772-Speed 9651.64 samples/sec Loss 12.8630 LearningRate 0.0929 Epoch: 0 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 13 hours 
Training: 2022-04-11 12:01:54,852-Speed 9490.37 samples/sec Loss 12.8801 LearningRate 0.0929 Epoch: 0 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:01:55,954-Speed 9296.69 samples/sec Loss 12.8837 LearningRate 0.0929 Epoch: 0 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:01:57,024-Speed 9570.91 samples/sec Loss 12.8698 LearningRate 0.0929 Epoch: 0 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:01:58,082-Speed 9681.11 samples/sec Loss 12.8692 LearningRate 0.0929 Epoch: 0 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:01:59,141-Speed 9680.37 samples/sec Loss 12.9020 LearningRate 0.0929 Epoch: 0 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:00,216-Speed 9535.36 samples/sec Loss 12.6502 LearningRate 0.0929 Epoch: 0 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:01,314-Speed 9326.91 samples/sec Loss 12.8896 LearningRate 0.0929 Epoch: 0 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:02,378-Speed 9629.61 samples/sec Loss 12.8220 LearningRate 0.0929 Epoch: 0 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:03,502-Speed 9115.10 samples/sec Loss 12.7903 LearningRate 0.0929 Epoch: 0 Global Step: 12110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:04,578-Speed 9522.17 samples/sec Loss 12.8962 LearningRate 0.0929 Epoch: 0 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:05,657-Speed 9495.88 samples/sec Loss 12.7374 LearningRate 0.0929 Epoch: 0 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:06,700-Speed 9824.96 samples/sec Loss 12.6687 LearningRate 0.0929 Epoch: 0 Global Step: 12140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
12:02:07,779-Speed 9493.39 samples/sec Loss 12.7032 LearningRate 0.0929 Epoch: 0 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:08,878-Speed 9326.71 samples/sec Loss 12.7360 LearningRate 0.0928 Epoch: 0 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:09,937-Speed 9675.73 samples/sec Loss 12.7242 LearningRate 0.0928 Epoch: 0 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:10,974-Speed 9881.33 samples/sec Loss 12.7418 LearningRate 0.0928 Epoch: 0 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:12,056-Speed 9468.80 samples/sec Loss 12.8030 LearningRate 0.0928 Epoch: 0 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:13,099-Speed 9827.26 samples/sec Loss 12.7961 LearningRate 0.0928 Epoch: 0 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:14,187-Speed 9415.64 samples/sec Loss 12.8507 LearningRate 0.0928 Epoch: 0 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:15,283-Speed 9349.35 samples/sec Loss 12.8995 LearningRate 0.0928 Epoch: 0 Global Step: 12220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:16,334-Speed 9743.59 samples/sec Loss 12.6023 LearningRate 0.0928 Epoch: 0 Global Step: 12230 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:17,400-Speed 9612.69 samples/sec Loss 12.9008 LearningRate 0.0928 Epoch: 0 Global Step: 12240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:18,493-Speed 9375.45 samples/sec Loss 12.8437 LearningRate 0.0928 Epoch: 0 Global Step: 12250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:19,591-Speed 9332.63 samples/sec Loss 12.6981 LearningRate 0.0928 Epoch: 0 Global Step: 12260 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:20,678-Speed 9421.80 
samples/sec Loss 12.7391 LearningRate 0.0928 Epoch: 0 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:21,756-Speed 9511.00 samples/sec Loss 12.8582 LearningRate 0.0928 Epoch: 0 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:22,815-Speed 9668.45 samples/sec Loss 12.8479 LearningRate 0.0928 Epoch: 0 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:23,907-Speed 9381.71 samples/sec Loss 12.7283 LearningRate 0.0928 Epoch: 0 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:24,983-Speed 9523.25 samples/sec Loss 12.6446 LearningRate 0.0928 Epoch: 0 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:26,039-Speed 9709.94 samples/sec Loss 12.6791 LearningRate 0.0928 Epoch: 0 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:27,150-Speed 9218.81 samples/sec Loss 12.7706 LearningRate 0.0927 Epoch: 0 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:28,213-Speed 9635.32 samples/sec Loss 12.8033 LearningRate 0.0927 Epoch: 0 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:29,267-Speed 9727.40 samples/sec Loss 12.6068 LearningRate 0.0927 Epoch: 0 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:30,344-Speed 9516.66 samples/sec Loss 12.7052 LearningRate 0.0927 Epoch: 0 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:31,444-Speed 9311.20 samples/sec Loss 12.6010 LearningRate 0.0927 Epoch: 0 Global Step: 12370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:32,492-Speed 9773.81 samples/sec Loss 12.6863 LearningRate 0.0927 Epoch: 0 Global Step: 12380 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:33,589-Speed 9339.61 samples/sec Loss 12.6741 
LearningRate 0.0927 Epoch: 0 Global Step: 12390 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:34,633-Speed 9819.56 samples/sec Loss 12.7076 LearningRate 0.0927 Epoch: 0 Global Step: 12400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:35,692-Speed 9669.50 samples/sec Loss 12.7454 LearningRate 0.0927 Epoch: 0 Global Step: 12410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:36,712-Speed 10051.30 samples/sec Loss 12.6901 LearningRate 0.0927 Epoch: 0 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:37,797-Speed 9441.27 samples/sec Loss 12.6408 LearningRate 0.0927 Epoch: 0 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:38,905-Speed 9247.23 samples/sec Loss 12.6322 LearningRate 0.0927 Epoch: 0 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:40,010-Speed 9275.92 samples/sec Loss 12.5561 LearningRate 0.0927 Epoch: 0 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:41,104-Speed 9364.34 samples/sec Loss 12.7031 LearningRate 0.0927 Epoch: 0 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:42,150-Speed 9797.71 samples/sec Loss 12.7378 LearningRate 0.0927 Epoch: 0 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:43,239-Speed 9402.89 samples/sec Loss 12.8337 LearningRate 0.0927 Epoch: 0 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:44,290-Speed 9751.64 samples/sec Loss 12.6136 LearningRate 0.0927 Epoch: 0 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:45,349-Speed 9680.94 samples/sec Loss 12.5764 LearningRate 0.0927 Epoch: 0 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:46,437-Speed 9414.26 samples/sec Loss 12.5857 LearningRate 0.0926 Epoch: 0 
Global Step: 12510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:47,528-Speed 9389.55 samples/sec Loss 12.8426 LearningRate 0.0926 Epoch: 0 Global Step: 12520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:02:48,608-Speed 9482.00 samples/sec Loss 12.6352 LearningRate 0.0926 Epoch: 0 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:49,686-Speed 9511.43 samples/sec Loss 12.5440 LearningRate 0.0926 Epoch: 0 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:50,768-Speed 9461.66 samples/sec Loss 12.6327 LearningRate 0.0926 Epoch: 0 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:51,857-Speed 9408.28 samples/sec Loss 12.6484 LearningRate 0.0926 Epoch: 0 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:52,975-Speed 9170.16 samples/sec Loss 12.5944 LearningRate 0.0926 Epoch: 0 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:54,034-Speed 9673.76 samples/sec Loss 12.6673 LearningRate 0.0926 Epoch: 0 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:55,102-Speed 9592.64 samples/sec Loss 12.6545 LearningRate 0.0926 Epoch: 0 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:56,175-Speed 9553.68 samples/sec Loss 12.4322 LearningRate 0.0926 Epoch: 0 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:57,241-Speed 9613.22 samples/sec Loss 12.6872 LearningRate 0.0926 Epoch: 0 Global Step: 12610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:58,334-Speed 9374.16 samples/sec Loss 12.5953 LearningRate 0.0926 Epoch: 0 Global Step: 12620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:02:59,372-Speed 9867.45 samples/sec Loss 12.5456 LearningRate 0.0926 Epoch: 0 Global Step: 12630 Fp16 Grad 
Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:00,427-Speed 9716.97 samples/sec Loss 12.6085 LearningRate 0.0926 Epoch: 0 Global Step: 12640 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:01,514-Speed 9417.48 samples/sec Loss 12.6472 LearningRate 0.0926 Epoch: 0 Global Step: 12650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:02,544-Speed 9954.67 samples/sec Loss 12.6460 LearningRate 0.0926 Epoch: 0 Global Step: 12660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:03,575-Speed 9937.05 samples/sec Loss 12.6353 LearningRate 0.0926 Epoch: 0 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:04,720-Speed 8949.53 samples/sec Loss 12.7286 LearningRate 0.0925 Epoch: 0 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:05,778-Speed 9680.63 samples/sec Loss 12.6698 LearningRate 0.0925 Epoch: 0 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:06,850-Speed 9554.75 samples/sec Loss 12.4817 LearningRate 0.0925 Epoch: 0 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:07,951-Speed 9309.86 samples/sec Loss 12.6575 LearningRate 0.0925 Epoch: 0 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:09,012-Speed 9659.38 samples/sec Loss 12.6429 LearningRate 0.0925 Epoch: 0 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:10,075-Speed 9631.48 samples/sec Loss 12.5919 LearningRate 0.0925 Epoch: 0 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:11,171-Speed 9353.79 samples/sec Loss 12.5849 LearningRate 0.0925 Epoch: 0 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:12,257-Speed 9428.35 samples/sec Loss 12.6372 LearningRate 0.0925 Epoch: 0 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-04-11 12:03:13,366-Speed 9240.61 samples/sec Loss 12.6055 LearningRate 0.0925 Epoch: 0 Global Step: 12760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:14,419-Speed 9734.23 samples/sec Loss 12.7091 LearningRate 0.0925 Epoch: 0 Global Step: 12770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:15,464-Speed 9810.24 samples/sec Loss 12.6457 LearningRate 0.0925 Epoch: 0 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:16,508-Speed 9813.66 samples/sec Loss 12.5026 LearningRate 0.0925 Epoch: 0 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:17,563-Speed 9705.50 samples/sec Loss 12.6652 LearningRate 0.0925 Epoch: 0 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:18,653-Speed 9401.11 samples/sec Loss 12.6603 LearningRate 0.0925 Epoch: 0 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:19,725-Speed 9558.01 samples/sec Loss 12.5191 LearningRate 0.0925 Epoch: 0 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:20,810-Speed 9449.85 samples/sec Loss 12.5762 LearningRate 0.0925 Epoch: 0 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:21,892-Speed 9461.22 samples/sec Loss 12.6681 LearningRate 0.0925 Epoch: 0 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:22,996-Speed 9279.60 samples/sec Loss 12.5955 LearningRate 0.0924 Epoch: 0 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:24,086-Speed 9398.97 samples/sec Loss 12.5997 LearningRate 0.0924 Epoch: 0 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:25,218-Speed 9054.20 samples/sec Loss 12.6456 LearningRate 0.0924 Epoch: 0 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
12:03:26,266-Speed 9776.85 samples/sec Loss 12.5949 LearningRate 0.0924 Epoch: 0 Global Step: 12880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:27,336-Speed 9582.01 samples/sec Loss 12.4659 LearningRate 0.0924 Epoch: 0 Global Step: 12890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:28,430-Speed 9361.11 samples/sec Loss 12.4890 LearningRate 0.0924 Epoch: 0 Global Step: 12900 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:29,512-Speed 9468.99 samples/sec Loss 12.4189 LearningRate 0.0924 Epoch: 0 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:30,558-Speed 9794.71 samples/sec Loss 12.5972 LearningRate 0.0924 Epoch: 0 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:31,653-Speed 9359.44 samples/sec Loss 12.6682 LearningRate 0.0924 Epoch: 0 Global Step: 12930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:32,748-Speed 9358.70 samples/sec Loss 12.4769 LearningRate 0.0924 Epoch: 0 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:33,874-Speed 9100.48 samples/sec Loss 12.5518 LearningRate 0.0924 Epoch: 0 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:34,998-Speed 9117.73 samples/sec Loss 12.5567 LearningRate 0.0924 Epoch: 0 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:36,087-Speed 9402.58 samples/sec Loss 12.5242 LearningRate 0.0924 Epoch: 0 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:37,150-Speed 9639.61 samples/sec Loss 12.5538 LearningRate 0.0924 Epoch: 0 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:38,215-Speed 9621.35 samples/sec Loss 12.5736 LearningRate 0.0924 Epoch: 0 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:39,319-Speed 9283.11 
samples/sec Loss 12.4390 LearningRate 0.0924 Epoch: 0 Global Step: 13000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:40,429-Speed 9228.65 samples/sec Loss 12.5126 LearningRate 0.0924 Epoch: 0 Global Step: 13010 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:41,490-Speed 9659.46 samples/sec Loss 12.3492 LearningRate 0.0924 Epoch: 0 Global Step: 13020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:42,547-Speed 9698.05 samples/sec Loss 12.3923 LearningRate 0.0923 Epoch: 0 Global Step: 13030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:03:43,630-Speed 9454.40 samples/sec Loss 12.4301 LearningRate 0.0923 Epoch: 0 Global Step: 13040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:44,700-Speed 9575.96 samples/sec Loss 12.4297 LearningRate 0.0923 Epoch: 0 Global Step: 13050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:45,767-Speed 9611.30 samples/sec Loss 12.5151 LearningRate 0.0923 Epoch: 0 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:46,824-Speed 9689.85 samples/sec Loss 12.3869 LearningRate 0.0923 Epoch: 0 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:47,957-Speed 9041.46 samples/sec Loss 12.3746 LearningRate 0.0923 Epoch: 0 Global Step: 13080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:49,046-Speed 9415.46 samples/sec Loss 12.3226 LearningRate 0.0923 Epoch: 0 Global Step: 13090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:50,131-Speed 9438.48 samples/sec Loss 12.3378 LearningRate 0.0923 Epoch: 0 Global Step: 13100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:51,203-Speed 9557.71 samples/sec Loss 12.4092 LearningRate 0.0923 Epoch: 0 Global Step: 13110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:52,265-Speed 9651.65 samples/sec Loss 12.4583 
LearningRate 0.0923 Epoch: 0 Global Step: 13120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:53,358-Speed 9370.66 samples/sec Loss 12.4302 LearningRate 0.0923 Epoch: 0 Global Step: 13130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:54,423-Speed 9622.44 samples/sec Loss 12.4398 LearningRate 0.0923 Epoch: 0 Global Step: 13140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:55,507-Speed 9448.85 samples/sec Loss 12.3969 LearningRate 0.0923 Epoch: 0 Global Step: 13150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:56,605-Speed 9332.71 samples/sec Loss 12.2608 LearningRate 0.0923 Epoch: 0 Global Step: 13160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:03:57,701-Speed 9348.55 samples/sec Loss 12.3879 LearningRate 0.0923 Epoch: 0 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:58,805-Speed 9280.93 samples/sec Loss 12.5216 LearningRate 0.0923 Epoch: 0 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:03:59,909-Speed 9286.64 samples/sec Loss 12.4154 LearningRate 0.0923 Epoch: 0 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:01,013-Speed 9279.19 samples/sec Loss 12.2881 LearningRate 0.0922 Epoch: 0 Global Step: 13200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:02,141-Speed 9084.82 samples/sec Loss 12.5593 LearningRate 0.0922 Epoch: 0 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:03,223-Speed 9469.12 samples/sec Loss 12.4628 LearningRate 0.0922 Epoch: 0 Global Step: 13220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:04,284-Speed 9664.17 samples/sec Loss 12.4251 LearningRate 0.0922 Epoch: 0 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:05,333-Speed 9762.41 samples/sec Loss 12.4314 LearningRate 0.0922 Epoch: 0 Global 
Step: 13240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:06,410-Speed 9510.47 samples/sec Loss 12.4674 LearningRate 0.0922 Epoch: 0 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:07,527-Speed 9174.68 samples/sec Loss 12.3817 LearningRate 0.0922 Epoch: 0 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:08,614-Speed 9426.97 samples/sec Loss 12.4942 LearningRate 0.0922 Epoch: 0 Global Step: 13270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:04:09,674-Speed 9664.94 samples/sec Loss 12.4095 LearningRate 0.0922 Epoch: 0 Global Step: 13280 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:04:10,770-Speed 9347.39 samples/sec Loss 12.4539 LearningRate 0.0922 Epoch: 0 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:11,870-Speed 9312.27 samples/sec Loss 12.3728 LearningRate 0.0922 Epoch: 0 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:12,985-Speed 9194.80 samples/sec Loss 12.5353 LearningRate 0.0922 Epoch: 0 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:14,067-Speed 9463.58 samples/sec Loss 12.3878 LearningRate 0.0922 Epoch: 0 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:15,119-Speed 9746.22 samples/sec Loss 12.4562 LearningRate 0.0922 Epoch: 0 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:16,198-Speed 9499.43 samples/sec Loss 12.3794 LearningRate 0.0922 Epoch: 0 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:17,294-Speed 9348.68 samples/sec Loss 12.3280 LearningRate 0.0922 Epoch: 0 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:18,406-Speed 9206.21 samples/sec Loss 12.2635 LearningRate 0.0922 Epoch: 0 Global Step: 13360 Fp16 Grad Scale: 
131072 Required: 13 hours Training: 2022-04-11 12:04:19,449-Speed 9832.62 samples/sec Loss 12.2532 LearningRate 0.0922 Epoch: 0 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:20,538-Speed 9404.46 samples/sec Loss 12.5140 LearningRate 0.0921 Epoch: 0 Global Step: 13380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:21,642-Speed 9282.71 samples/sec Loss 12.3176 LearningRate 0.0921 Epoch: 0 Global Step: 13390 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:04:22,731-Speed 9405.27 samples/sec Loss 12.2675 LearningRate 0.0921 Epoch: 0 Global Step: 13400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:04:23,792-Speed 9655.80 samples/sec Loss 12.3865 LearningRate 0.0921 Epoch: 0 Global Step: 13410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:04:24,884-Speed 9383.47 samples/sec Loss 12.2444 LearningRate 0.0921 Epoch: 0 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:25,964-Speed 9494.42 samples/sec Loss 12.2592 LearningRate 0.0921 Epoch: 0 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:27,041-Speed 9512.86 samples/sec Loss 12.3091 LearningRate 0.0921 Epoch: 0 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:28,114-Speed 9543.64 samples/sec Loss 12.3033 LearningRate 0.0921 Epoch: 0 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:29,170-Speed 9704.74 samples/sec Loss 12.3247 LearningRate 0.0921 Epoch: 0 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:30,228-Speed 9686.80 samples/sec Loss 12.3368 LearningRate 0.0921 Epoch: 0 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:31,277-Speed 9769.81 samples/sec Loss 12.1808 LearningRate 0.0921 Epoch: 0 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 13 hours 
Training: 2022-04-11 12:04:32,351-Speed 9540.58 samples/sec Loss 12.2779 LearningRate 0.0921 Epoch: 0 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:33,465-Speed 9192.21 samples/sec Loss 12.1764 LearningRate 0.0921 Epoch: 0 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:34,563-Speed 9337.22 samples/sec Loss 12.2099 LearningRate 0.0921 Epoch: 0 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:35,653-Speed 9398.97 samples/sec Loss 12.2138 LearningRate 0.0921 Epoch: 0 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:36,699-Speed 9796.70 samples/sec Loss 12.2397 LearningRate 0.0921 Epoch: 0 Global Step: 13530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:37,760-Speed 9650.02 samples/sec Loss 12.3388 LearningRate 0.0921 Epoch: 0 Global Step: 13540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:38,836-Speed 9526.57 samples/sec Loss 12.2845 LearningRate 0.0920 Epoch: 0 Global Step: 13550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:39,908-Speed 9560.48 samples/sec Loss 12.3474 LearningRate 0.0920 Epoch: 0 Global Step: 13560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:40,954-Speed 9790.28 samples/sec Loss 12.4895 LearningRate 0.0920 Epoch: 0 Global Step: 13570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:42,065-Speed 9223.94 samples/sec Loss 12.3225 LearningRate 0.0920 Epoch: 0 Global Step: 13580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:43,150-Speed 9441.76 samples/sec Loss 12.2434 LearningRate 0.0920 Epoch: 0 Global Step: 13590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:44,234-Speed 9456.35 samples/sec Loss 12.2950 LearningRate 0.0920 Epoch: 0 Global Step: 13600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:45,277-Speed 
9827.79 samples/sec Loss 12.3657 LearningRate 0.0920 Epoch: 0 Global Step: 13610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:46,361-Speed 9447.55 samples/sec Loss 12.2772 LearningRate 0.0920 Epoch: 0 Global Step: 13620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:04:47,492-Speed 9065.78 samples/sec Loss 12.1833 LearningRate 0.0920 Epoch: 0 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:48,585-Speed 9373.29 samples/sec Loss 12.1516 LearningRate 0.0920 Epoch: 0 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:49,630-Speed 9802.40 samples/sec Loss 12.2556 LearningRate 0.0920 Epoch: 0 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:50,659-Speed 9951.83 samples/sec Loss 12.2326 LearningRate 0.0920 Epoch: 0 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:51,740-Speed 9480.17 samples/sec Loss 12.2588 LearningRate 0.0920 Epoch: 0 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:04:52,812-Speed 9558.90 samples/sec Loss 12.2050 LearningRate 0.0920 Epoch: 0 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:04:53,912-Speed 9315.85 samples/sec Loss 12.4608 LearningRate 0.0920 Epoch: 0 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:04:54,946-Speed 9908.13 samples/sec Loss 12.2377 LearningRate 0.0920 Epoch: 0 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:04:55,988-Speed 9835.85 samples/sec Loss 12.2080 LearningRate 0.0920 Epoch: 0 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:04:57,125-Speed 9010.09 samples/sec Loss 12.3106 LearningRate 0.0919 Epoch: 0 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:04:58,252-Speed 9093.45 samples/sec Loss 12.2522 
LearningRate 0.0919 Epoch: 0 Global Step: 13730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:04:59,311-Speed 9674.27 samples/sec Loss 12.3416 LearningRate 0.0919 Epoch: 0 Global Step: 13740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:05:00,437-Speed 9105.72 samples/sec Loss 12.2248 LearningRate 0.0919 Epoch: 0 Global Step: 13750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:05:01,501-Speed 9625.03 samples/sec Loss 12.1997 LearningRate 0.0919 Epoch: 0 Global Step: 13760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:05:02,603-Speed 9303.68 samples/sec Loss 12.1260 LearningRate 0.0919 Epoch: 0 Global Step: 13770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:05:03,693-Speed 9398.00 samples/sec Loss 12.3195 LearningRate 0.0919 Epoch: 0 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:04,795-Speed 9301.63 samples/sec Loss 12.2425 LearningRate 0.0919 Epoch: 0 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:05,897-Speed 9294.73 samples/sec Loss 12.2130 LearningRate 0.0919 Epoch: 0 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:06,953-Speed 9699.81 samples/sec Loss 12.2542 LearningRate 0.0919 Epoch: 0 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:08,050-Speed 9337.38 samples/sec Loss 12.1573 LearningRate 0.0919 Epoch: 0 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:09,112-Speed 9650.57 samples/sec Loss 12.2083 LearningRate 0.0919 Epoch: 0 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:10,191-Speed 9499.17 samples/sec Loss 12.3091 LearningRate 0.0919 Epoch: 0 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:11,265-Speed 9539.54 samples/sec Loss 12.2635 LearningRate 0.0919 Epoch: 0 
Global Step: 13850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:12,357-Speed 9380.56 samples/sec Loss 12.1053 LearningRate 0.0919 Epoch: 0 Global Step: 13860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:13,451-Speed 9366.60 samples/sec Loss 12.2629 LearningRate 0.0919 Epoch: 0 Global Step: 13870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:14,547-Speed 9349.19 samples/sec Loss 12.2397 LearningRate 0.0919 Epoch: 0 Global Step: 13880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:15,617-Speed 9575.21 samples/sec Loss 12.2242 LearningRate 0.0919 Epoch: 0 Global Step: 13890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:16,689-Speed 9566.63 samples/sec Loss 12.4194 LearningRate 0.0918 Epoch: 0 Global Step: 13900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:17,781-Speed 9377.92 samples/sec Loss 12.1494 LearningRate 0.0918 Epoch: 0 Global Step: 13910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:18,867-Speed 9432.33 samples/sec Loss 12.2492 LearningRate 0.0918 Epoch: 0 Global Step: 13920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:19,945-Speed 9512.05 samples/sec Loss 12.2088 LearningRate 0.0918 Epoch: 0 Global Step: 13930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:21,019-Speed 9536.10 samples/sec Loss 12.3089 LearningRate 0.0918 Epoch: 0 Global Step: 13940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:05:22,068-Speed 9768.38 samples/sec Loss 12.2258 LearningRate 0.0918 Epoch: 0 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:23,153-Speed 9444.85 samples/sec Loss 12.0991 LearningRate 0.0918 Epoch: 0 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:24,234-Speed 9473.66 samples/sec Loss 12.0774 LearningRate 0.0918 Epoch: 0 Global Step: 13970 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-04-11 12:05:25,300-Speed 9618.44 samples/sec Loss 12.1688 LearningRate 0.0918 Epoch: 0 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:26,427-Speed 9093.56 samples/sec Loss 12.2206 LearningRate 0.0918 Epoch: 0 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:27,540-Speed 9203.34 samples/sec Loss 11.9706 LearningRate 0.0918 Epoch: 0 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:05:49,451-[lfw][14000]XNorm: 14.270774 Training: 2022-04-11 12:05:49,452-[lfw][14000]Accuracy-Flip: 0.99317+-0.00369 Training: 2022-04-11 12:05:49,452-[lfw][14000]Accuracy-Highest: 0.99317 Training: 2022-04-11 12:06:14,824-[cfp_fp][14000]XNorm: 12.275331 Training: 2022-04-11 12:06:14,824-[cfp_fp][14000]Accuracy-Flip: 0.91457+-0.01183 Training: 2022-04-11 12:06:14,825-[cfp_fp][14000]Accuracy-Highest: 0.91457 Training: 2022-04-11 12:06:36,719-[agedb_30][14000]XNorm: 13.830484 Training: 2022-04-11 12:06:36,719-[agedb_30][14000]Accuracy-Flip: 0.93083+-0.01480 Training: 2022-04-11 12:06:36,720-[agedb_30][14000]Accuracy-Highest: 0.93083 Training: 2022-04-11 12:06:37,792-Speed 145.76 samples/sec Loss 12.1851 LearningRate 0.0918 Epoch: 0 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:38,878-Speed 9439.85 samples/sec Loss 12.0886 LearningRate 0.0918 Epoch: 0 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:39,999-Speed 9136.15 samples/sec Loss 12.1327 LearningRate 0.0918 Epoch: 0 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:41,096-Speed 9341.66 samples/sec Loss 12.2180 LearningRate 0.0918 Epoch: 0 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:42,185-Speed 9414.59 samples/sec Loss 12.1815 LearningRate 0.0918 Epoch: 0 Global Step: 14050 Fp16 Grad Scale: 262144 Required: 13 hours 
Training: 2022-04-11 12:06:43,278-Speed 9374.53 samples/sec Loss 12.1006 LearningRate 0.0918 Epoch: 0 Global Step: 14060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:06:44,379-Speed 9298.68 samples/sec Loss 12.0662 LearningRate 0.0917 Epoch: 0 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:45,443-Speed 9631.75 samples/sec Loss 12.1563 LearningRate 0.0917 Epoch: 0 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:46,538-Speed 9357.25 samples/sec Loss 12.1743 LearningRate 0.0917 Epoch: 0 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:47,637-Speed 9325.88 samples/sec Loss 12.1596 LearningRate 0.0917 Epoch: 0 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:48,729-Speed 9377.14 samples/sec Loss 12.2693 LearningRate 0.0917 Epoch: 0 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:49,813-Speed 9456.04 samples/sec Loss 12.1877 LearningRate 0.0917 Epoch: 0 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:50,852-Speed 9864.61 samples/sec Loss 12.1650 LearningRate 0.0917 Epoch: 0 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:06:51,919-Speed 9603.92 samples/sec Loss 12.1866 LearningRate 0.0917 Epoch: 0 Global Step: 14140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:06:52,991-Speed 9550.23 samples/sec Loss 12.1826 LearningRate 0.0917 Epoch: 0 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:06:54,083-Speed 9389.14 samples/sec Loss 12.2008 LearningRate 0.0917 Epoch: 0 Global Step: 14160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:06:55,209-Speed 9092.22 samples/sec Loss 12.0511 LearningRate 0.0917 Epoch: 0 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
12:06:56,266-Speed 9702.17 samples/sec Loss 12.1455 LearningRate 0.0917 Epoch: 0 Global Step: 14180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:06:57,363-Speed 9337.49 samples/sec Loss 12.2032 LearningRate 0.0917 Epoch: 0 Global Step: 14190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:06:58,427-Speed 9628.37 samples/sec Loss 12.1471 LearningRate 0.0917 Epoch: 0 Global Step: 14200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:06:59,468-Speed 9846.12 samples/sec Loss 12.1746 LearningRate 0.0917 Epoch: 0 Global Step: 14210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:00,575-Speed 9255.10 samples/sec Loss 12.0544 LearningRate 0.0917 Epoch: 0 Global Step: 14220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:01,656-Speed 9483.47 samples/sec Loss 12.0907 LearningRate 0.0917 Epoch: 0 Global Step: 14230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:02,723-Speed 9599.99 samples/sec Loss 12.0712 LearningRate 0.0917 Epoch: 0 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:03,776-Speed 9733.64 samples/sec Loss 12.0033 LearningRate 0.0916 Epoch: 0 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:04,846-Speed 9571.68 samples/sec Loss 12.1213 LearningRate 0.0916 Epoch: 0 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:05,909-Speed 9636.49 samples/sec Loss 11.9063 LearningRate 0.0916 Epoch: 0 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:06,951-Speed 9833.98 samples/sec Loss 12.0486 LearningRate 0.0916 Epoch: 0 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:08,021-Speed 9577.69 samples/sec Loss 12.0914 LearningRate 0.0916 Epoch: 0 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:09,092-Speed 9568.65 
samples/sec Loss 12.0334 LearningRate 0.0916 Epoch: 0 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:10,191-Speed 9326.13 samples/sec Loss 12.1311 LearningRate 0.0916 Epoch: 0 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:11,298-Speed 9252.35 samples/sec Loss 11.8749 LearningRate 0.0916 Epoch: 0 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:12,382-Speed 9455.57 samples/sec Loss 11.9627 LearningRate 0.0916 Epoch: 0 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:13,517-Speed 9024.39 samples/sec Loss 12.1669 LearningRate 0.0916 Epoch: 0 Global Step: 14340 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:07:14,652-Speed 9027.00 samples/sec Loss 11.9712 LearningRate 0.0916 Epoch: 0 Global Step: 14350 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:07:15,762-Speed 9225.02 samples/sec Loss 12.1143 LearningRate 0.0916 Epoch: 0 Global Step: 14360 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:07:16,841-Speed 9498.70 samples/sec Loss 12.1133 LearningRate 0.0916 Epoch: 0 Global Step: 14370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:07:17,918-Speed 9519.40 samples/sec Loss 12.1661 LearningRate 0.0916 Epoch: 0 Global Step: 14380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:19,025-Speed 9259.16 samples/sec Loss 11.9875 LearningRate 0.0916 Epoch: 0 Global Step: 14390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:20,116-Speed 9392.08 samples/sec Loss 12.1295 LearningRate 0.0916 Epoch: 0 Global Step: 14400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:21,227-Speed 9224.15 samples/sec Loss 12.0643 LearningRate 0.0916 Epoch: 0 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:22,270-Speed 9824.41 samples/sec Loss 12.0729 
LearningRate 0.0915 Epoch: 0 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:23,364-Speed 9360.01 samples/sec Loss 12.0137 LearningRate 0.0915 Epoch: 0 Global Step: 14430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:24,424-Speed 9664.40 samples/sec Loss 12.0708 LearningRate 0.0915 Epoch: 0 Global Step: 14440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:25,580-Speed 8868.37 samples/sec Loss 12.0676 LearningRate 0.0915 Epoch: 0 Global Step: 14450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:26,642-Speed 9650.71 samples/sec Loss 12.1367 LearningRate 0.0915 Epoch: 0 Global Step: 14460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:27,710-Speed 9592.83 samples/sec Loss 12.1241 LearningRate 0.0915 Epoch: 0 Global Step: 14470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:28,766-Speed 9701.38 samples/sec Loss 12.0971 LearningRate 0.0915 Epoch: 0 Global Step: 14480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:29,825-Speed 9669.07 samples/sec Loss 12.1021 LearningRate 0.0915 Epoch: 0 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:30,862-Speed 9878.44 samples/sec Loss 12.1350 LearningRate 0.0915 Epoch: 0 Global Step: 14500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:31,991-Speed 9081.52 samples/sec Loss 12.1066 LearningRate 0.0915 Epoch: 0 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:33,090-Speed 9320.31 samples/sec Loss 12.1014 LearningRate 0.0915 Epoch: 0 Global Step: 14520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:34,123-Speed 9916.79 samples/sec Loss 11.9311 LearningRate 0.0915 Epoch: 0 Global Step: 14530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:35,270-Speed 8930.63 samples/sec Loss 12.0781 LearningRate 0.0915 Epoch: 0 Global Step: 
14540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:36,373-Speed 9290.52 samples/sec Loss 12.1560 LearningRate 0.0915 Epoch: 0 Global Step: 14550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:37,471-Speed 9333.95 samples/sec Loss 11.9866 LearningRate 0.0915 Epoch: 0 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:38,556-Speed 9491.90 samples/sec Loss 11.9248 LearningRate 0.0915 Epoch: 0 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:39,636-Speed 9482.42 samples/sec Loss 12.0374 LearningRate 0.0915 Epoch: 0 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:40,699-Speed 9640.81 samples/sec Loss 12.0464 LearningRate 0.0914 Epoch: 0 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:41,800-Speed 9309.43 samples/sec Loss 11.9838 LearningRate 0.0914 Epoch: 0 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:42,873-Speed 9549.12 samples/sec Loss 11.9571 LearningRate 0.0914 Epoch: 0 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:43,974-Speed 9298.11 samples/sec Loss 11.8206 LearningRate 0.0914 Epoch: 0 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:45,055-Speed 9482.97 samples/sec Loss 11.9890 LearningRate 0.0914 Epoch: 0 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:46,108-Speed 9727.07 samples/sec Loss 12.0158 LearningRate 0.0914 Epoch: 0 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:47,185-Speed 9513.05 samples/sec Loss 12.0857 LearningRate 0.0914 Epoch: 0 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:07:48,244-Speed 9680.53 samples/sec Loss 11.9804 LearningRate 0.0914 Epoch: 0 Global Step: 14660 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 12:07:49,313-Speed 9583.33 samples/sec Loss 11.9114 LearningRate 0.0914 Epoch: 0 Global Step: 14670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:50,417-Speed 9283.98 samples/sec Loss 11.9844 LearningRate 0.0914 Epoch: 0 Global Step: 14680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:51,498-Speed 9478.06 samples/sec Loss 12.0201 LearningRate 0.0914 Epoch: 0 Global Step: 14690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:52,552-Speed 9714.60 samples/sec Loss 12.0135 LearningRate 0.0914 Epoch: 0 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:53,618-Speed 9611.91 samples/sec Loss 11.9690 LearningRate 0.0914 Epoch: 0 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:54,706-Speed 9420.67 samples/sec Loss 12.0423 LearningRate 0.0914 Epoch: 0 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:55,778-Speed 9567.49 samples/sec Loss 12.0172 LearningRate 0.0914 Epoch: 0 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:56,850-Speed 9553.11 samples/sec Loss 11.8900 LearningRate 0.0914 Epoch: 0 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:57,970-Speed 9150.76 samples/sec Loss 12.0095 LearningRate 0.0914 Epoch: 0 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:07:59,075-Speed 9271.83 samples/sec Loss 11.9094 LearningRate 0.0914 Epoch: 0 Global Step: 14760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:00,174-Speed 9320.80 samples/sec Loss 11.9992 LearningRate 0.0913 Epoch: 0 Global Step: 14770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:01,259-Speed 9441.26 samples/sec Loss 11.8873 LearningRate 0.0913 Epoch: 0 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
12:08:02,335-Speed 9520.54 samples/sec Loss 11.9921 LearningRate 0.0913 Epoch: 0 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:03,396-Speed 9664.13 samples/sec Loss 12.0396 LearningRate 0.0913 Epoch: 0 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:04,475-Speed 9491.78 samples/sec Loss 11.9169 LearningRate 0.0913 Epoch: 0 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:05,539-Speed 9627.74 samples/sec Loss 11.8847 LearningRate 0.0913 Epoch: 0 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:06,627-Speed 9421.50 samples/sec Loss 11.9643 LearningRate 0.0913 Epoch: 0 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:07,677-Speed 9757.95 samples/sec Loss 11.9326 LearningRate 0.0913 Epoch: 0 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:08,765-Speed 9414.88 samples/sec Loss 12.0292 LearningRate 0.0913 Epoch: 0 Global Step: 14850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:09,864-Speed 9327.84 samples/sec Loss 11.8849 LearningRate 0.0913 Epoch: 0 Global Step: 14860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:10,971-Speed 9252.86 samples/sec Loss 11.8783 LearningRate 0.0913 Epoch: 0 Global Step: 14870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:12,097-Speed 9096.05 samples/sec Loss 11.9811 LearningRate 0.0913 Epoch: 0 Global Step: 14880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:13,172-Speed 9531.82 samples/sec Loss 12.0979 LearningRate 0.0913 Epoch: 0 Global Step: 14890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:14,240-Speed 9595.62 samples/sec Loss 11.9962 LearningRate 0.0913 Epoch: 0 Global Step: 14900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:15,308-Speed 9590.79 
samples/sec Loss 11.8617 LearningRate 0.0913 Epoch: 0 Global Step: 14910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:16,407-Speed 9327.54 samples/sec Loss 11.9993 LearningRate 0.0913 Epoch: 0 Global Step: 14920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:17,501-Speed 9361.25 samples/sec Loss 11.7845 LearningRate 0.0913 Epoch: 0 Global Step: 14930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:18,550-Speed 9771.96 samples/sec Loss 11.9204 LearningRate 0.0912 Epoch: 0 Global Step: 14940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:19,585-Speed 9896.14 samples/sec Loss 11.9193 LearningRate 0.0912 Epoch: 0 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:20,667-Speed 9476.07 samples/sec Loss 12.0000 LearningRate 0.0912 Epoch: 0 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:21,773-Speed 9263.56 samples/sec Loss 11.9422 LearningRate 0.0912 Epoch: 0 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:22,834-Speed 9655.99 samples/sec Loss 11.7482 LearningRate 0.0912 Epoch: 0 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:23,880-Speed 9794.66 samples/sec Loss 11.9768 LearningRate 0.0912 Epoch: 0 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:24,975-Speed 9353.23 samples/sec Loss 11.8133 LearningRate 0.0912 Epoch: 0 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:26,043-Speed 9602.16 samples/sec Loss 11.8807 LearningRate 0.0912 Epoch: 0 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:27,144-Speed 9306.10 samples/sec Loss 11.9748 LearningRate 0.0912 Epoch: 0 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:28,186-Speed 9829.62 samples/sec Loss 11.8927 
LearningRate 0.0912 Epoch: 0 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:29,355-Speed 8762.29 samples/sec Loss 12.0114 LearningRate 0.0912 Epoch: 0 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:30,434-Speed 9498.33 samples/sec Loss 11.8390 LearningRate 0.0912 Epoch: 0 Global Step: 15050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:08:31,531-Speed 9344.76 samples/sec Loss 11.9419 LearningRate 0.0912 Epoch: 0 Global Step: 15060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:08:32,573-Speed 9829.69 samples/sec Loss 11.9942 LearningRate 0.0912 Epoch: 0 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:33,666-Speed 9371.63 samples/sec Loss 11.8170 LearningRate 0.0912 Epoch: 0 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:34,776-Speed 9236.08 samples/sec Loss 11.8859 LearningRate 0.0912 Epoch: 0 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:35,839-Speed 9640.65 samples/sec Loss 11.7871 LearningRate 0.0912 Epoch: 0 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:36,935-Speed 9350.39 samples/sec Loss 11.8138 LearningRate 0.0912 Epoch: 0 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:38,018-Speed 9454.04 samples/sec Loss 11.7433 LearningRate 0.0911 Epoch: 0 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:39,105-Speed 9436.33 samples/sec Loss 11.9279 LearningRate 0.0911 Epoch: 0 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:40,158-Speed 9727.93 samples/sec Loss 11.9021 LearningRate 0.0911 Epoch: 0 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:41,178-Speed 10043.41 samples/sec Loss 11.7625 LearningRate 0.0911 Epoch: 0 
Global Step: 15150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:42,246-Speed 9591.58 samples/sec Loss 11.8297 LearningRate 0.0911 Epoch: 0 Global Step: 15160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:43,346-Speed 9310.99 samples/sec Loss 11.6465 LearningRate 0.0911 Epoch: 0 Global Step: 15170 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:08:44,432-Speed 9440.04 samples/sec Loss 11.7528 LearningRate 0.0911 Epoch: 0 Global Step: 15180 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:08:45,518-Speed 9435.08 samples/sec Loss 11.7531 LearningRate 0.0911 Epoch: 0 Global Step: 15190 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:08:46,606-Speed 9414.81 samples/sec Loss 11.8311 LearningRate 0.0911 Epoch: 0 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:47,707-Speed 9307.07 samples/sec Loss 11.7752 LearningRate 0.0911 Epoch: 0 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:48,765-Speed 9681.07 samples/sec Loss 11.8102 LearningRate 0.0911 Epoch: 0 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:08:49,889-Speed 9117.20 samples/sec Loss 11.6840 LearningRate 0.0911 Epoch: 0 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:50,964-Speed 9533.02 samples/sec Loss 11.9313 LearningRate 0.0911 Epoch: 0 Global Step: 15240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:52,037-Speed 9551.80 samples/sec Loss 11.7891 LearningRate 0.0911 Epoch: 0 Global Step: 15250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:53,114-Speed 9505.80 samples/sec Loss 11.9617 LearningRate 0.0911 Epoch: 0 Global Step: 15260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:54,188-Speed 9546.35 samples/sec Loss 11.6927 LearningRate 0.0911 Epoch: 0 Global Step: 15270 Fp16 Grad Scale: 
65536 Required: 13 hours Training: 2022-04-11 12:08:55,261-Speed 9542.66 samples/sec Loss 11.8284 LearningRate 0.0911 Epoch: 0 Global Step: 15280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:56,405-Speed 8967.02 samples/sec Loss 11.7464 LearningRate 0.0910 Epoch: 0 Global Step: 15290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:57,546-Speed 8974.64 samples/sec Loss 11.9094 LearningRate 0.0910 Epoch: 0 Global Step: 15300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:58,618-Speed 9558.43 samples/sec Loss 11.7717 LearningRate 0.0910 Epoch: 0 Global Step: 15310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:08:59,693-Speed 9529.37 samples/sec Loss 11.8900 LearningRate 0.0910 Epoch: 0 Global Step: 15320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:09:00,777-Speed 9457.06 samples/sec Loss 11.9040 LearningRate 0.0910 Epoch: 0 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:01,852-Speed 9533.45 samples/sec Loss 11.7788 LearningRate 0.0910 Epoch: 0 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:02,955-Speed 9287.06 samples/sec Loss 11.7600 LearningRate 0.0910 Epoch: 0 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:04,056-Speed 9299.72 samples/sec Loss 11.8806 LearningRate 0.0910 Epoch: 0 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:05,174-Speed 9164.59 samples/sec Loss 11.8045 LearningRate 0.0910 Epoch: 0 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:06,232-Speed 9684.43 samples/sec Loss 11.7220 LearningRate 0.0910 Epoch: 0 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:07,273-Speed 9846.24 samples/sec Loss 11.7334 LearningRate 0.0910 Epoch: 0 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-11 12:09:08,315-Speed 9837.85 samples/sec Loss 11.7859 LearningRate 0.0910 Epoch: 0 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:09,401-Speed 9432.66 samples/sec Loss 11.8469 LearningRate 0.0910 Epoch: 0 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:10,482-Speed 9481.51 samples/sec Loss 11.7805 LearningRate 0.0910 Epoch: 0 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:09:11,547-Speed 9612.60 samples/sec Loss 11.7125 LearningRate 0.0910 Epoch: 0 Global Step: 15430 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:09:12,601-Speed 9725.84 samples/sec Loss 11.7514 LearningRate 0.0910 Epoch: 0 Global Step: 15440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:09:13,646-Speed 9803.51 samples/sec Loss 11.8582 LearningRate 0.0910 Epoch: 0 Global Step: 15450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:09:14,745-Speed 9319.55 samples/sec Loss 11.7563 LearningRate 0.0910 Epoch: 0 Global Step: 15460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:09:15,837-Speed 9390.55 samples/sec Loss 11.8585 LearningRate 0.0909 Epoch: 0 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:16,907-Speed 9576.72 samples/sec Loss 11.8731 LearningRate 0.0909 Epoch: 0 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:17,957-Speed 9750.18 samples/sec Loss 11.6198 LearningRate 0.0909 Epoch: 0 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:19,026-Speed 9586.69 samples/sec Loss 11.6308 LearningRate 0.0909 Epoch: 0 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:20,147-Speed 9144.96 samples/sec Loss 11.7820 LearningRate 0.0909 Epoch: 0 Global Step: 15510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:21,237-Speed 
9402.97 samples/sec Loss 11.7360 LearningRate 0.0909 Epoch: 0 Global Step: 15520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:22,285-Speed 9774.21 samples/sec Loss 11.7495 LearningRate 0.0909 Epoch: 0 Global Step: 15530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:23,395-Speed 9230.41 samples/sec Loss 11.6290 LearningRate 0.0909 Epoch: 0 Global Step: 15540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:24,471-Speed 9518.94 samples/sec Loss 11.5500 LearningRate 0.0909 Epoch: 0 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:25,559-Speed 9422.05 samples/sec Loss 11.6793 LearningRate 0.0909 Epoch: 0 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:26,641-Speed 9472.87 samples/sec Loss 11.7836 LearningRate 0.0909 Epoch: 0 Global Step: 15570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:09:27,713-Speed 9551.93 samples/sec Loss 11.5937 LearningRate 0.0909 Epoch: 0 Global Step: 15580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:09:28,800-Speed 9425.22 samples/sec Loss 11.7034 LearningRate 0.0909 Epoch: 0 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:29,863-Speed 9638.29 samples/sec Loss 11.8234 LearningRate 0.0909 Epoch: 0 Global Step: 15600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:30,952-Speed 9409.65 samples/sec Loss 11.6990 LearningRate 0.0909 Epoch: 0 Global Step: 15610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:32,029-Speed 9513.59 samples/sec Loss 11.6444 LearningRate 0.0909 Epoch: 0 Global Step: 15620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:33,088-Speed 9677.93 samples/sec Loss 11.8032 LearningRate 0.0909 Epoch: 0 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:34,150-Speed 9640.80 samples/sec Loss 11.6995 
LearningRate 0.0908 Epoch: 0 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:35,314-Speed 8809.60 samples/sec Loss 11.7043 LearningRate 0.0908 Epoch: 0 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:36,408-Speed 9363.95 samples/sec Loss 11.7226 LearningRate 0.0908 Epoch: 0 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:37,530-Speed 9135.31 samples/sec Loss 11.8005 LearningRate 0.0908 Epoch: 0 Global Step: 15670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:38,616-Speed 9433.07 samples/sec Loss 11.7666 LearningRate 0.0908 Epoch: 0 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:39,695-Speed 9500.62 samples/sec Loss 11.6816 LearningRate 0.0908 Epoch: 0 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:09:40,755-Speed 9667.19 samples/sec Loss 11.7433 LearningRate 0.0908 Epoch: 0 Global Step: 15700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:41,806-Speed 9748.69 samples/sec Loss 11.6522 LearningRate 0.0908 Epoch: 0 Global Step: 15710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:42,885-Speed 9497.56 samples/sec Loss 11.5866 LearningRate 0.0908 Epoch: 0 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:43,975-Speed 9399.40 samples/sec Loss 11.6595 LearningRate 0.0908 Epoch: 0 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:45,053-Speed 9504.05 samples/sec Loss 11.7896 LearningRate 0.0908 Epoch: 0 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:46,121-Speed 9593.60 samples/sec Loss 11.6122 LearningRate 0.0908 Epoch: 0 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:47,190-Speed 9585.49 samples/sec Loss 11.6318 LearningRate 0.0908 Epoch: 0 Global 
Step: 15760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:48,286-Speed 9344.19 samples/sec Loss 11.5868 LearningRate 0.0908 Epoch: 0 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:49,417-Speed 9064.90 samples/sec Loss 11.7618 LearningRate 0.0908 Epoch: 0 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:50,483-Speed 9607.79 samples/sec Loss 11.7001 LearningRate 0.0908 Epoch: 0 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:51,575-Speed 9383.82 samples/sec Loss 11.7093 LearningRate 0.0908 Epoch: 0 Global Step: 15800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:09:52,630-Speed 9706.38 samples/sec Loss 11.6994 LearningRate 0.0908 Epoch: 0 Global Step: 15810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:09:53,772-Speed 8978.41 samples/sec Loss 11.5868 LearningRate 0.0907 Epoch: 0 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:54,895-Speed 9120.93 samples/sec Loss 11.6608 LearningRate 0.0907 Epoch: 0 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:55,946-Speed 9755.28 samples/sec Loss 11.7203 LearningRate 0.0907 Epoch: 0 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:57,035-Speed 9408.76 samples/sec Loss 11.5887 LearningRate 0.0907 Epoch: 0 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:58,097-Speed 9643.99 samples/sec Loss 11.5577 LearningRate 0.0907 Epoch: 0 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:09:59,180-Speed 9459.69 samples/sec Loss 11.5677 LearningRate 0.0907 Epoch: 0 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:10:00,255-Speed 9533.60 samples/sec Loss 11.6617 LearningRate 0.0907 Epoch: 0 Global Step: 15880 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-04-11 12:10:01,304-Speed 9769.46 samples/sec Loss 11.7082 LearningRate 0.0907 Epoch: 0 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:10:02,367-Speed 9634.32 samples/sec Loss 11.7516 LearningRate 0.0907 Epoch: 0 Global Step: 15900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:03,450-Speed 9467.17 samples/sec Loss 11.5323 LearningRate 0.0907 Epoch: 0 Global Step: 15910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:04,523-Speed 9542.61 samples/sec Loss 11.6315 LearningRate 0.0907 Epoch: 0 Global Step: 15920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:05,592-Speed 9586.49 samples/sec Loss 11.5522 LearningRate 0.0907 Epoch: 0 Global Step: 15930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:06,671-Speed 9490.80 samples/sec Loss 11.6365 LearningRate 0.0907 Epoch: 0 Global Step: 15940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:07,738-Speed 9602.56 samples/sec Loss 11.6783 LearningRate 0.0907 Epoch: 0 Global Step: 15950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:08,816-Speed 9511.37 samples/sec Loss 11.5670 LearningRate 0.0907 Epoch: 0 Global Step: 15960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:09,920-Speed 9278.36 samples/sec Loss 11.5745 LearningRate 0.0907 Epoch: 0 Global Step: 15970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:10,961-Speed 9850.14 samples/sec Loss 11.7599 LearningRate 0.0907 Epoch: 0 Global Step: 15980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:12,047-Speed 9428.13 samples/sec Loss 11.6846 LearningRate 0.0906 Epoch: 0 Global Step: 15990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:10:13,183-Speed 9024.81 samples/sec Loss 11.4858 LearningRate 0.0906 Epoch: 0 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 12:10:35,454-[lfw][16000]XNorm: 14.390814 Training: 2022-04-11 12:10:35,455-[lfw][16000]Accuracy-Flip: 0.99267+-0.00448 Training: 2022-04-11 12:10:35,455-[lfw][16000]Accuracy-Highest: 0.99317 Training: 2022-04-11 12:11:01,059-[cfp_fp][16000]XNorm: 12.355993 Training: 2022-04-11 12:11:01,060-[cfp_fp][16000]Accuracy-Flip: 0.91700+-0.01248 Training: 2022-04-11 12:11:01,060-[cfp_fp][16000]Accuracy-Highest: 0.91700 Training: 2022-04-11 12:11:23,035-[agedb_30][16000]XNorm: 13.844527 Training: 2022-04-11 12:11:23,036-[agedb_30][16000]Accuracy-Flip: 0.92967+-0.01641 Training: 2022-04-11 12:11:23,037-[agedb_30][16000]Accuracy-Highest: 0.93083 Training: 2022-04-11 12:11:24,106-Speed 144.38 samples/sec Loss 11.5254 LearningRate 0.0906 Epoch: 0 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:25,182-Speed 9518.83 samples/sec Loss 11.5666 LearningRate 0.0906 Epoch: 0 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:26,258-Speed 9522.16 samples/sec Loss 11.6676 LearningRate 0.0906 Epoch: 0 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:27,360-Speed 9297.44 samples/sec Loss 11.5352 LearningRate 0.0906 Epoch: 0 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:28,436-Speed 9520.91 samples/sec Loss 11.6476 LearningRate 0.0906 Epoch: 0 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:29,487-Speed 9748.89 samples/sec Loss 11.6504 LearningRate 0.0906 Epoch: 0 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:30,600-Speed 9206.23 samples/sec Loss 11.6341 LearningRate 0.0906 Epoch: 0 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:31,677-Speed 9516.10 samples/sec Loss 11.6039 LearningRate 0.0906 Epoch: 0 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
12:11:32,718-Speed 9842.04 samples/sec Loss 11.5709 LearningRate 0.0906 Epoch: 0 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:33,785-Speed 9605.92 samples/sec Loss 11.7156 LearningRate 0.0906 Epoch: 0 Global Step: 16100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:11:34,864-Speed 9490.71 samples/sec Loss 11.5102 LearningRate 0.0906 Epoch: 0 Global Step: 16110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:35,958-Speed 9363.99 samples/sec Loss 11.6546 LearningRate 0.0906 Epoch: 0 Global Step: 16120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:37,016-Speed 9684.65 samples/sec Loss 11.7099 LearningRate 0.0906 Epoch: 0 Global Step: 16130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:38,100-Speed 9455.04 samples/sec Loss 11.6172 LearningRate 0.0906 Epoch: 0 Global Step: 16140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:39,186-Speed 9431.37 samples/sec Loss 11.7497 LearningRate 0.0906 Epoch: 0 Global Step: 16150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:40,270-Speed 9463.06 samples/sec Loss 11.5953 LearningRate 0.0906 Epoch: 0 Global Step: 16160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:41,355-Speed 9439.49 samples/sec Loss 11.6466 LearningRate 0.0905 Epoch: 0 Global Step: 16170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:42,475-Speed 9145.86 samples/sec Loss 11.5153 LearningRate 0.0905 Epoch: 0 Global Step: 16180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:43,557-Speed 9478.36 samples/sec Loss 11.5628 LearningRate 0.0905 Epoch: 0 Global Step: 16190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:44,608-Speed 9745.53 samples/sec Loss 11.5774 LearningRate 0.0905 Epoch: 0 Global Step: 16200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:11:45,655-Speed 9782.54 samples/sec 
Loss 11.5766 LearningRate 0.0905 Epoch: 0 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:46,733-Speed 9515.31 samples/sec Loss 11.6465 LearningRate 0.0905 Epoch: 0 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:47,821-Speed 9412.28 samples/sec Loss 11.5418 LearningRate 0.0905 Epoch: 0 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:48,886-Speed 9621.17 samples/sec Loss 11.5928 LearningRate 0.0905 Epoch: 0 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:49,978-Speed 9383.35 samples/sec Loss 11.6317 LearningRate 0.0905 Epoch: 0 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:51,065-Speed 9421.16 samples/sec Loss 11.5658 LearningRate 0.0905 Epoch: 0 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:52,132-Speed 9605.23 samples/sec Loss 11.6592 LearningRate 0.0905 Epoch: 0 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:53,195-Speed 9636.53 samples/sec Loss 11.5533 LearningRate 0.0905 Epoch: 0 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:54,319-Speed 9119.30 samples/sec Loss 11.5611 LearningRate 0.0905 Epoch: 0 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:55,363-Speed 9813.63 samples/sec Loss 11.4909 LearningRate 0.0905 Epoch: 0 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:56,421-Speed 9683.71 samples/sec Loss 11.5292 LearningRate 0.0905 Epoch: 0 Global Step: 16310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:11:57,502-Speed 9473.01 samples/sec Loss 11.4926 LearningRate 0.0905 Epoch: 0 Global Step: 16320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:58,614-Speed 9220.50 samples/sec Loss 11.6181 LearningRate 
0.0905 Epoch: 0 Global Step: 16330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:11:59,700-Speed 9433.08 samples/sec Loss 11.5920 LearningRate 0.0904 Epoch: 0 Global Step: 16340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:00,778-Speed 9502.50 samples/sec Loss 11.4260 LearningRate 0.0904 Epoch: 0 Global Step: 16350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:01,862-Speed 9455.71 samples/sec Loss 11.4515 LearningRate 0.0904 Epoch: 0 Global Step: 16360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:02,955-Speed 9370.22 samples/sec Loss 11.4889 LearningRate 0.0904 Epoch: 0 Global Step: 16370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:04,060-Speed 9274.30 samples/sec Loss 11.5285 LearningRate 0.0904 Epoch: 0 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:05,133-Speed 9553.77 samples/sec Loss 11.4439 LearningRate 0.0904 Epoch: 0 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:06,215-Speed 9468.09 samples/sec Loss 11.5055 LearningRate 0.0904 Epoch: 0 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:07,280-Speed 9619.67 samples/sec Loss 11.5291 LearningRate 0.0904 Epoch: 0 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:08,384-Speed 9278.28 samples/sec Loss 11.3905 LearningRate 0.0904 Epoch: 0 Global Step: 16420 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:12:09,481-Speed 9341.17 samples/sec Loss 11.5184 LearningRate 0.0904 Epoch: 0 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:10,569-Speed 9419.88 samples/sec Loss 11.5696 LearningRate 0.0904 Epoch: 0 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:11,652-Speed 9463.88 samples/sec Loss 11.6175 LearningRate 0.0904 Epoch: 0 Global Step: 
16450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:12,766-Speed 9197.16 samples/sec Loss 11.5146 LearningRate 0.0904 Epoch: 0 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:13,861-Speed 9359.92 samples/sec Loss 11.5617 LearningRate 0.0904 Epoch: 0 Global Step: 16470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:14,977-Speed 9176.77 samples/sec Loss 11.5747 LearningRate 0.0904 Epoch: 0 Global Step: 16480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:16,006-Speed 9956.12 samples/sec Loss 11.5791 LearningRate 0.0904 Epoch: 0 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:17,059-Speed 9738.63 samples/sec Loss 11.4911 LearningRate 0.0904 Epoch: 0 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:18,120-Speed 9657.56 samples/sec Loss 11.5522 LearningRate 0.0904 Epoch: 0 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:19,189-Speed 9581.12 samples/sec Loss 11.4091 LearningRate 0.0903 Epoch: 0 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:20,288-Speed 9322.55 samples/sec Loss 11.6741 LearningRate 0.0903 Epoch: 0 Global Step: 16530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:12:21,339-Speed 9748.22 samples/sec Loss 11.5850 LearningRate 0.0903 Epoch: 0 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:22,375-Speed 9889.97 samples/sec Loss 11.5221 LearningRate 0.0903 Epoch: 0 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:23,418-Speed 9827.46 samples/sec Loss 11.4709 LearningRate 0.0903 Epoch: 0 Global Step: 16560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:24,453-Speed 9906.41 samples/sec Loss 11.4413 LearningRate 0.0903 Epoch: 0 Global Step: 16570 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 12:12:25,527-Speed 9537.41 samples/sec Loss 11.4402 LearningRate 0.0903 Epoch: 0 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:26,602-Speed 9531.01 samples/sec Loss 11.5501 LearningRate 0.0903 Epoch: 0 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:27,715-Speed 9199.18 samples/sec Loss 11.4448 LearningRate 0.0903 Epoch: 0 Global Step: 16600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:28,824-Speed 9240.87 samples/sec Loss 11.5353 LearningRate 0.0903 Epoch: 0 Global Step: 16610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:29,894-Speed 9580.61 samples/sec Loss 11.3909 LearningRate 0.0903 Epoch: 0 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:30,969-Speed 9533.53 samples/sec Loss 11.4310 LearningRate 0.0903 Epoch: 0 Global Step: 16630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:32,050-Speed 9476.80 samples/sec Loss 11.5372 LearningRate 0.0903 Epoch: 0 Global Step: 16640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:33,136-Speed 9428.01 samples/sec Loss 11.4440 LearningRate 0.0903 Epoch: 0 Global Step: 16650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:12:34,202-Speed 9612.94 samples/sec Loss 11.5497 LearningRate 0.0903 Epoch: 0 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:35,315-Speed 9205.15 samples/sec Loss 11.3278 LearningRate 0.0903 Epoch: 0 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:36,741-Speed 7182.54 samples/sec Loss 11.5450 LearningRate 0.0903 Epoch: 0 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:12:37,790-Speed 9766.55 samples/sec Loss 11.4412 LearningRate 0.0903 Epoch: 0 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
12:13:10,677-Speed 311.38 samples/sec Loss 10.7104 LearningRate 0.0902 Epoch: 1 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:11,934-Speed 8155.30 samples/sec Loss 10.5454 LearningRate 0.0902 Epoch: 1 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:13,080-Speed 8938.96 samples/sec Loss 10.7086 LearningRate 0.0902 Epoch: 1 Global Step: 16720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:14,158-Speed 9505.45 samples/sec Loss 10.5646 LearningRate 0.0902 Epoch: 1 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:15,304-Speed 8937.37 samples/sec Loss 10.5241 LearningRate 0.0902 Epoch: 1 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:16,406-Speed 9303.24 samples/sec Loss 10.6329 LearningRate 0.0902 Epoch: 1 Global Step: 16750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:18,516-Speed 4855.49 samples/sec Loss 10.6392 LearningRate 0.0902 Epoch: 1 Global Step: 16760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:19,587-Speed 9572.08 samples/sec Loss 10.5079 LearningRate 0.0902 Epoch: 1 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:20,693-Speed 9257.85 samples/sec Loss 10.5909 LearningRate 0.0902 Epoch: 1 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:21,871-Speed 8697.68 samples/sec Loss 10.5529 LearningRate 0.0902 Epoch: 1 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:23,040-Speed 8765.59 samples/sec Loss 10.6659 LearningRate 0.0902 Epoch: 1 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:24,146-Speed 9260.82 samples/sec Loss 10.6209 LearningRate 0.0902 Epoch: 1 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:25,223-Speed 9518.41 
samples/sec Loss 10.6567 LearningRate 0.0902 Epoch: 1 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:26,312-Speed 9407.30 samples/sec Loss 10.5656 LearningRate 0.0902 Epoch: 1 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:27,379-Speed 9603.00 samples/sec Loss 10.5475 LearningRate 0.0902 Epoch: 1 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:28,467-Speed 9414.81 samples/sec Loss 10.5731 LearningRate 0.0902 Epoch: 1 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:29,528-Speed 9652.90 samples/sec Loss 10.5232 LearningRate 0.0902 Epoch: 1 Global Step: 16860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:13:30,615-Speed 9433.15 samples/sec Loss 10.7433 LearningRate 0.0901 Epoch: 1 Global Step: 16870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:13:31,740-Speed 9108.31 samples/sec Loss 10.6933 LearningRate 0.0901 Epoch: 1 Global Step: 16880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:13:32,829-Speed 9407.49 samples/sec Loss 10.6025 LearningRate 0.0901 Epoch: 1 Global Step: 16890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:13:33,881-Speed 9742.54 samples/sec Loss 10.6044 LearningRate 0.0901 Epoch: 1 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:34,978-Speed 9334.45 samples/sec Loss 10.5977 LearningRate 0.0901 Epoch: 1 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:36,094-Speed 9178.73 samples/sec Loss 10.7549 LearningRate 0.0901 Epoch: 1 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:37,172-Speed 9511.84 samples/sec Loss 10.6553 LearningRate 0.0901 Epoch: 1 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:38,590-Speed 7226.61 samples/sec Loss 10.7400 
LearningRate 0.0901 Epoch: 1 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:39,690-Speed 9313.23 samples/sec Loss 10.6604 LearningRate 0.0901 Epoch: 1 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:40,774-Speed 9454.82 samples/sec Loss 10.6586 LearningRate 0.0901 Epoch: 1 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:41,877-Speed 9286.12 samples/sec Loss 10.6468 LearningRate 0.0901 Epoch: 1 Global Step: 16970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:42,966-Speed 9414.68 samples/sec Loss 10.6722 LearningRate 0.0901 Epoch: 1 Global Step: 16980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:44,038-Speed 9555.80 samples/sec Loss 10.7272 LearningRate 0.0901 Epoch: 1 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:45,152-Speed 9190.58 samples/sec Loss 10.6701 LearningRate 0.0901 Epoch: 1 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:46,288-Speed 9025.11 samples/sec Loss 10.5134 LearningRate 0.0901 Epoch: 1 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:47,367-Speed 9500.17 samples/sec Loss 10.6492 LearningRate 0.0901 Epoch: 1 Global Step: 17020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:48,486-Speed 9149.29 samples/sec Loss 10.5671 LearningRate 0.0901 Epoch: 1 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:49,571-Speed 9451.55 samples/sec Loss 10.6251 LearningRate 0.0901 Epoch: 1 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:50,676-Speed 9264.72 samples/sec Loss 10.5883 LearningRate 0.0900 Epoch: 1 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:13:51,786-Speed 9234.64 samples/sec Loss 10.6533 LearningRate 0.0900 Epoch: 1 Global Step: 
17060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:52,881-Speed 9352.49 samples/sec Loss 10.5445 LearningRate 0.0900 Epoch: 1 Global Step: 17070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:53,952-Speed 9568.99 samples/sec Loss 10.5837 LearningRate 0.0900 Epoch: 1 Global Step: 17080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:55,048-Speed 9354.99 samples/sec Loss 10.7921 LearningRate 0.0900 Epoch: 1 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:56,108-Speed 9666.44 samples/sec Loss 10.6724 LearningRate 0.0900 Epoch: 1 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:57,222-Speed 9197.89 samples/sec Loss 10.6069 LearningRate 0.0900 Epoch: 1 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:58,306-Speed 9451.26 samples/sec Loss 10.7484 LearningRate 0.0900 Epoch: 1 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:13:59,357-Speed 9747.28 samples/sec Loss 10.7546 LearningRate 0.0900 Epoch: 1 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:00,488-Speed 9060.31 samples/sec Loss 10.6446 LearningRate 0.0900 Epoch: 1 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:01,592-Speed 9275.17 samples/sec Loss 10.8212 LearningRate 0.0900 Epoch: 1 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:02,703-Speed 9225.85 samples/sec Loss 10.6174 LearningRate 0.0900 Epoch: 1 Global Step: 17160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:03,765-Speed 9648.47 samples/sec Loss 10.7752 LearningRate 0.0900 Epoch: 1 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:04,904-Speed 8993.65 samples/sec Loss 10.7181 LearningRate 0.0900 Epoch: 1 Global Step: 17180 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 12:14:05,985-Speed 9482.25 samples/sec Loss 10.5637 LearningRate 0.0900 Epoch: 1 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:07,098-Speed 9203.62 samples/sec Loss 10.7173 LearningRate 0.0900 Epoch: 1 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:08,213-Speed 9188.77 samples/sec Loss 10.6381 LearningRate 0.0900 Epoch: 1 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:09,320-Speed 9260.93 samples/sec Loss 10.6878 LearningRate 0.0899 Epoch: 1 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:10,427-Speed 9262.61 samples/sec Loss 10.7537 LearningRate 0.0899 Epoch: 1 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:11,513-Speed 9436.93 samples/sec Loss 10.6699 LearningRate 0.0899 Epoch: 1 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:12,613-Speed 9311.40 samples/sec Loss 10.6071 LearningRate 0.0899 Epoch: 1 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:13,663-Speed 9757.75 samples/sec Loss 10.7437 LearningRate 0.0899 Epoch: 1 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:14,766-Speed 9296.07 samples/sec Loss 10.8205 LearningRate 0.0899 Epoch: 1 Global Step: 17270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:15,817-Speed 9748.34 samples/sec Loss 10.6715 LearningRate 0.0899 Epoch: 1 Global Step: 17280 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:16,862-Speed 9803.00 samples/sec Loss 10.7199 LearningRate 0.0899 Epoch: 1 Global Step: 17290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:17,934-Speed 9553.88 samples/sec Loss 10.6814 LearningRate 0.0899 Epoch: 1 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-11 12:14:18,980-Speed 9795.96 samples/sec Loss 10.7940 LearningRate 0.0899 Epoch: 1 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:20,037-Speed 9694.42 samples/sec Loss 10.7808 LearningRate 0.0899 Epoch: 1 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:21,123-Speed 9432.31 samples/sec Loss 10.8118 LearningRate 0.0899 Epoch: 1 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:22,201-Speed 9508.23 samples/sec Loss 10.6500 LearningRate 0.0899 Epoch: 1 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:23,288-Speed 9534.57 samples/sec Loss 10.7649 LearningRate 0.0899 Epoch: 1 Global Step: 17350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:24,351-Speed 9637.41 samples/sec Loss 10.7126 LearningRate 0.0899 Epoch: 1 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:25,459-Speed 9246.96 samples/sec Loss 10.7393 LearningRate 0.0899 Epoch: 1 Global Step: 17370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:26,575-Speed 9186.47 samples/sec Loss 10.7498 LearningRate 0.0899 Epoch: 1 Global Step: 17380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:27,681-Speed 9263.19 samples/sec Loss 10.7579 LearningRate 0.0899 Epoch: 1 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:28,755-Speed 9534.22 samples/sec Loss 10.7184 LearningRate 0.0898 Epoch: 1 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:29,828-Speed 9566.20 samples/sec Loss 10.7045 LearningRate 0.0898 Epoch: 1 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:30,908-Speed 9490.93 samples/sec Loss 10.7346 LearningRate 0.0898 Epoch: 1 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:31,982-Speed 
9531.87 samples/sec Loss 10.7781 LearningRate 0.0898 Epoch: 1 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:33,048-Speed 9618.58 samples/sec Loss 10.6559 LearningRate 0.0898 Epoch: 1 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:34,090-Speed 9826.33 samples/sec Loss 10.6481 LearningRate 0.0898 Epoch: 1 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:35,210-Speed 9150.81 samples/sec Loss 10.7002 LearningRate 0.0898 Epoch: 1 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:36,323-Speed 9202.09 samples/sec Loss 10.9602 LearningRate 0.0898 Epoch: 1 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:37,416-Speed 9377.68 samples/sec Loss 10.7387 LearningRate 0.0898 Epoch: 1 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:38,531-Speed 9185.26 samples/sec Loss 10.8021 LearningRate 0.0898 Epoch: 1 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:39,601-Speed 9580.40 samples/sec Loss 10.6833 LearningRate 0.0898 Epoch: 1 Global Step: 17500 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:40,694-Speed 9372.12 samples/sec Loss 10.6411 LearningRate 0.0898 Epoch: 1 Global Step: 17510 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:41,773-Speed 9495.18 samples/sec Loss 10.7294 LearningRate 0.0898 Epoch: 1 Global Step: 17520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:42,869-Speed 9348.42 samples/sec Loss 10.7815 LearningRate 0.0898 Epoch: 1 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:43,946-Speed 9514.92 samples/sec Loss 10.7162 LearningRate 0.0898 Epoch: 1 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:45,038-Speed 9383.89 samples/sec Loss 
10.8865 LearningRate 0.0898 Epoch: 1 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:46,086-Speed 9777.29 samples/sec Loss 10.6806 LearningRate 0.0898 Epoch: 1 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:47,185-Speed 9330.00 samples/sec Loss 10.7117 LearningRate 0.0898 Epoch: 1 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:48,262-Speed 9512.26 samples/sec Loss 10.7749 LearningRate 0.0897 Epoch: 1 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:49,347-Speed 9446.43 samples/sec Loss 10.6617 LearningRate 0.0897 Epoch: 1 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:50,439-Speed 9378.70 samples/sec Loss 10.8839 LearningRate 0.0897 Epoch: 1 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:51,524-Speed 9439.05 samples/sec Loss 10.7569 LearningRate 0.0897 Epoch: 1 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:52,638-Speed 9201.56 samples/sec Loss 10.7667 LearningRate 0.0897 Epoch: 1 Global Step: 17620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:14:53,691-Speed 9744.60 samples/sec Loss 10.8386 LearningRate 0.0897 Epoch: 1 Global Step: 17630 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:54,745-Speed 9726.03 samples/sec Loss 10.6433 LearningRate 0.0897 Epoch: 1 Global Step: 17640 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:55,839-Speed 9363.82 samples/sec Loss 10.7146 LearningRate 0.0897 Epoch: 1 Global Step: 17650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:56,919-Speed 9483.05 samples/sec Loss 10.7236 LearningRate 0.0897 Epoch: 1 Global Step: 17660 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:58,001-Speed 9474.31 samples/sec Loss 10.7369 LearningRate 0.0897 
Epoch: 1 Global Step: 17670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:14:59,129-Speed 9083.52 samples/sec Loss 10.7287 LearningRate 0.0897 Epoch: 1 Global Step: 17680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:15:00,230-Speed 9306.58 samples/sec Loss 10.7028 LearningRate 0.0897 Epoch: 1 Global Step: 17690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:15:01,335-Speed 9271.02 samples/sec Loss 10.7478 LearningRate 0.0897 Epoch: 1 Global Step: 17700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:15:02,403-Speed 9599.96 samples/sec Loss 10.7061 LearningRate 0.0897 Epoch: 1 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:03,549-Speed 8933.24 samples/sec Loss 10.6568 LearningRate 0.0897 Epoch: 1 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:04,625-Speed 9527.01 samples/sec Loss 10.7015 LearningRate 0.0897 Epoch: 1 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:05,684-Speed 9675.92 samples/sec Loss 10.6739 LearningRate 0.0897 Epoch: 1 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:06,778-Speed 9366.17 samples/sec Loss 10.7983 LearningRate 0.0896 Epoch: 1 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:07,897-Speed 9158.73 samples/sec Loss 10.7613 LearningRate 0.0896 Epoch: 1 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:09,045-Speed 8919.99 samples/sec Loss 10.8561 LearningRate 0.0896 Epoch: 1 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:10,139-Speed 9370.41 samples/sec Loss 10.6237 LearningRate 0.0896 Epoch: 1 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:11,230-Speed 9389.87 samples/sec Loss 10.8128 LearningRate 0.0896 Epoch: 1 Global Step: 17790 
Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:12,298-Speed 9596.19 samples/sec Loss 10.6789 LearningRate 0.0896 Epoch: 1 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:13,377-Speed 9499.78 samples/sec Loss 10.7996 LearningRate 0.0896 Epoch: 1 Global Step: 17810 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:15:14,450-Speed 9544.12 samples/sec Loss 10.7511 LearningRate 0.0896 Epoch: 1 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:15,575-Speed 9113.34 samples/sec Loss 10.9599 LearningRate 0.0896 Epoch: 1 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:16,643-Speed 9596.66 samples/sec Loss 10.8447 LearningRate 0.0896 Epoch: 1 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:17,720-Speed 9514.75 samples/sec Loss 10.7658 LearningRate 0.0896 Epoch: 1 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:18,799-Speed 9494.49 samples/sec Loss 10.7086 LearningRate 0.0896 Epoch: 1 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:19,927-Speed 9078.99 samples/sec Loss 10.7801 LearningRate 0.0896 Epoch: 1 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:21,023-Speed 9349.13 samples/sec Loss 10.7843 LearningRate 0.0896 Epoch: 1 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:22,385-Speed 7523.84 samples/sec Loss 10.8288 LearningRate 0.0896 Epoch: 1 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:23,438-Speed 9733.05 samples/sec Loss 10.6581 LearningRate 0.0896 Epoch: 1 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:24,558-Speed 9150.61 samples/sec Loss 10.6091 LearningRate 0.0896 Epoch: 1 Global Step: 17910 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 12:15:25,655-Speed 9337.21 samples/sec Loss 10.7650 LearningRate 0.0896 Epoch: 1 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:26,760-Speed 9275.58 samples/sec Loss 10.7901 LearningRate 0.0895 Epoch: 1 Global Step: 17930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:27,854-Speed 9361.36 samples/sec Loss 10.6588 LearningRate 0.0895 Epoch: 1 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:29,333-Speed 6932.18 samples/sec Loss 10.7497 LearningRate 0.0895 Epoch: 1 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:30,559-Speed 8355.71 samples/sec Loss 10.7167 LearningRate 0.0895 Epoch: 1 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:31,638-Speed 9489.84 samples/sec Loss 10.7602 LearningRate 0.0895 Epoch: 1 Global Step: 17970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:32,735-Speed 9337.43 samples/sec Loss 10.5699 LearningRate 0.0895 Epoch: 1 Global Step: 17980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:33,821-Speed 9441.66 samples/sec Loss 10.7921 LearningRate 0.0895 Epoch: 1 Global Step: 17990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:15:34,912-Speed 9386.98 samples/sec Loss 10.8397 LearningRate 0.0895 Epoch: 1 Global Step: 18000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:15:57,021-[lfw][18000]XNorm: 13.881175 Training: 2022-04-11 12:15:57,022-[lfw][18000]Accuracy-Flip: 0.99167+-0.00408 Training: 2022-04-11 12:15:57,022-[lfw][18000]Accuracy-Highest: 0.99317 Training: 2022-04-11 12:16:22,537-[cfp_fp][18000]XNorm: 11.811630 Training: 2022-04-11 12:16:22,538-[cfp_fp][18000]Accuracy-Flip: 0.92114+-0.01359 Training: 2022-04-11 12:16:22,538-[cfp_fp][18000]Accuracy-Highest: 0.92114 Training: 2022-04-11 12:16:44,568-[agedb_30][18000]XNorm: 13.412701 
Training: 2022-04-11 12:16:44,569-[agedb_30][18000]Accuracy-Flip: 0.93433+-0.01104 Training: 2022-04-11 12:16:44,570-[agedb_30][18000]Accuracy-Highest: 0.93433 Training: 2022-04-11 12:16:45,690-Speed 144.68 samples/sec Loss 10.7648 LearningRate 0.0895 Epoch: 1 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:46,752-Speed 9642.72 samples/sec Loss 10.8985 LearningRate 0.0895 Epoch: 1 Global Step: 18020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:16:47,812-Speed 9667.42 samples/sec Loss 10.8695 LearningRate 0.0895 Epoch: 1 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:48,873-Speed 9653.67 samples/sec Loss 10.7784 LearningRate 0.0895 Epoch: 1 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:49,943-Speed 9578.69 samples/sec Loss 10.7272 LearningRate 0.0895 Epoch: 1 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:51,020-Speed 9512.72 samples/sec Loss 10.7445 LearningRate 0.0895 Epoch: 1 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:52,118-Speed 9332.90 samples/sec Loss 10.7755 LearningRate 0.0895 Epoch: 1 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:53,223-Speed 9269.41 samples/sec Loss 10.8234 LearningRate 0.0895 Epoch: 1 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:54,280-Speed 9694.02 samples/sec Loss 10.7611 LearningRate 0.0895 Epoch: 1 Global Step: 18090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:55,392-Speed 9212.14 samples/sec Loss 10.7489 LearningRate 0.0894 Epoch: 1 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:56,507-Speed 9197.05 samples/sec Loss 10.7562 LearningRate 0.0894 Epoch: 1 Global Step: 18110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
12:16:57,613-Speed 9263.45 samples/sec Loss 10.8251 LearningRate 0.0894 Epoch: 1 Global Step: 18120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:16:58,736-Speed 9119.76 samples/sec Loss 10.7783 LearningRate 0.0894 Epoch: 1 Global Step: 18130 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:16:59,831-Speed 9356.81 samples/sec Loss 10.7964 LearningRate 0.0894 Epoch: 1 Global Step: 18140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:17:00,891-Speed 9671.11 samples/sec Loss 10.7908 LearningRate 0.0894 Epoch: 1 Global Step: 18150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:01,999-Speed 9258.08 samples/sec Loss 10.6389 LearningRate 0.0894 Epoch: 1 Global Step: 18160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:03,090-Speed 9392.59 samples/sec Loss 10.7936 LearningRate 0.0894 Epoch: 1 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:04,179-Speed 9409.84 samples/sec Loss 10.5934 LearningRate 0.0894 Epoch: 1 Global Step: 18180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:05,293-Speed 9198.77 samples/sec Loss 10.7449 LearningRate 0.0894 Epoch: 1 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:06,375-Speed 9473.07 samples/sec Loss 10.7884 LearningRate 0.0894 Epoch: 1 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:07,475-Speed 9315.12 samples/sec Loss 10.6524 LearningRate 0.0894 Epoch: 1 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:08,563-Speed 9412.50 samples/sec Loss 10.7620 LearningRate 0.0894 Epoch: 1 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:09,705-Speed 8971.56 samples/sec Loss 10.8025 LearningRate 0.0894 Epoch: 1 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:10,800-Speed 9361.99 
samples/sec Loss 10.8515 LearningRate 0.0894 Epoch: 1 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:11,849-Speed 9762.22 samples/sec Loss 10.8577 LearningRate 0.0894 Epoch: 1 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:12,957-Speed 9250.06 samples/sec Loss 10.7544 LearningRate 0.0894 Epoch: 1 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:14,053-Speed 9353.83 samples/sec Loss 10.7826 LearningRate 0.0894 Epoch: 1 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:15,105-Speed 9737.11 samples/sec Loss 10.8343 LearningRate 0.0893 Epoch: 1 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:16,216-Speed 9221.64 samples/sec Loss 10.7500 LearningRate 0.0893 Epoch: 1 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:17,347-Speed 9066.66 samples/sec Loss 10.7348 LearningRate 0.0893 Epoch: 1 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:18,419-Speed 9551.62 samples/sec Loss 10.7707 LearningRate 0.0893 Epoch: 1 Global Step: 18310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:19,535-Speed 9181.65 samples/sec Loss 10.8129 LearningRate 0.0893 Epoch: 1 Global Step: 18320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:20,620-Speed 9446.53 samples/sec Loss 10.7385 LearningRate 0.0893 Epoch: 1 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:21,741-Speed 9144.01 samples/sec Loss 10.8042 LearningRate 0.0893 Epoch: 1 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:22,785-Speed 9813.36 samples/sec Loss 10.6808 LearningRate 0.0893 Epoch: 1 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:23,958-Speed 8733.83 samples/sec Loss 10.8229 
LearningRate 0.0893 Epoch: 1 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:25,042-Speed 9451.75 samples/sec Loss 10.7219 LearningRate 0.0893 Epoch: 1 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:26,117-Speed 9531.08 samples/sec Loss 10.8298 LearningRate 0.0893 Epoch: 1 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:27,206-Speed 9403.56 samples/sec Loss 10.7938 LearningRate 0.0893 Epoch: 1 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:28,281-Speed 9535.85 samples/sec Loss 10.7535 LearningRate 0.0893 Epoch: 1 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:17:29,347-Speed 9609.71 samples/sec Loss 10.7682 LearningRate 0.0893 Epoch: 1 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:30,443-Speed 9353.85 samples/sec Loss 10.6428 LearningRate 0.0893 Epoch: 1 Global Step: 18420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:31,533-Speed 9403.17 samples/sec Loss 10.7851 LearningRate 0.0893 Epoch: 1 Global Step: 18430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:32,619-Speed 9430.98 samples/sec Loss 10.6875 LearningRate 0.0893 Epoch: 1 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:33,702-Speed 9465.98 samples/sec Loss 10.9015 LearningRate 0.0893 Epoch: 1 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:34,781-Speed 9492.79 samples/sec Loss 10.6635 LearningRate 0.0892 Epoch: 1 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:35,844-Speed 9641.69 samples/sec Loss 10.8039 LearningRate 0.0892 Epoch: 1 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:36,905-Speed 9654.10 samples/sec Loss 10.7454 LearningRate 0.0892 Epoch: 1 Global 
Step: 18480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:38,012-Speed 9258.60 samples/sec Loss 10.7924 LearningRate 0.0892 Epoch: 1 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:39,064-Speed 9739.35 samples/sec Loss 10.7296 LearningRate 0.0892 Epoch: 1 Global Step: 18500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:40,135-Speed 9566.90 samples/sec Loss 10.7681 LearningRate 0.0892 Epoch: 1 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:41,201-Speed 9620.74 samples/sec Loss 10.7388 LearningRate 0.0892 Epoch: 1 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:42,262-Speed 9652.89 samples/sec Loss 10.7238 LearningRate 0.0892 Epoch: 1 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:43,337-Speed 9529.06 samples/sec Loss 10.7184 LearningRate 0.0892 Epoch: 1 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:44,419-Speed 9475.13 samples/sec Loss 10.8271 LearningRate 0.0892 Epoch: 1 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:45,492-Speed 9549.15 samples/sec Loss 10.8114 LearningRate 0.0892 Epoch: 1 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:46,561-Speed 9580.89 samples/sec Loss 10.7836 LearningRate 0.0892 Epoch: 1 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:47,610-Speed 9768.96 samples/sec Loss 10.8046 LearningRate 0.0892 Epoch: 1 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:48,689-Speed 9501.80 samples/sec Loss 10.8640 LearningRate 0.0892 Epoch: 1 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:49,781-Speed 9376.63 samples/sec Loss 10.7711 LearningRate 0.0892 Epoch: 1 Global Step: 18600 Fp16 Grad Scale: 
131072 Required: 13 hours Training: 2022-04-11 12:17:50,846-Speed 9625.76 samples/sec Loss 10.8198 LearningRate 0.0892 Epoch: 1 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:51,951-Speed 9268.25 samples/sec Loss 10.7694 LearningRate 0.0892 Epoch: 1 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:53,018-Speed 9611.06 samples/sec Loss 10.8065 LearningRate 0.0891 Epoch: 1 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:54,110-Speed 9381.23 samples/sec Loss 10.8746 LearningRate 0.0891 Epoch: 1 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:55,213-Speed 9285.69 samples/sec Loss 10.7582 LearningRate 0.0891 Epoch: 1 Global Step: 18650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:56,305-Speed 9384.27 samples/sec Loss 10.7459 LearningRate 0.0891 Epoch: 1 Global Step: 18660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:57,396-Speed 9390.48 samples/sec Loss 10.8122 LearningRate 0.0891 Epoch: 1 Global Step: 18670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:58,485-Speed 9406.17 samples/sec Loss 10.7963 LearningRate 0.0891 Epoch: 1 Global Step: 18680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:17:59,584-Speed 9322.98 samples/sec Loss 10.7727 LearningRate 0.0891 Epoch: 1 Global Step: 18690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:00,665-Speed 9483.59 samples/sec Loss 10.7413 LearningRate 0.0891 Epoch: 1 Global Step: 18700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:01,746-Speed 9476.19 samples/sec Loss 10.7090 LearningRate 0.0891 Epoch: 1 Global Step: 18710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:18:02,824-Speed 9509.37 samples/sec Loss 10.6695 LearningRate 0.0891 Epoch: 1 Global Step: 18720 Fp16 Grad Scale: 262144 Required: 13 hours 
Training: 2022-04-11 12:18:03,858-Speed 9909.10 samples/sec Loss 10.7877 LearningRate 0.0891 Epoch: 1 Global Step: 18730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:18:04,969-Speed 9222.56 samples/sec Loss 10.6971 LearningRate 0.0891 Epoch: 1 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:06,046-Speed 9505.25 samples/sec Loss 10.7900 LearningRate 0.0891 Epoch: 1 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:07,147-Speed 9312.80 samples/sec Loss 10.6809 LearningRate 0.0891 Epoch: 1 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:08,254-Speed 9251.25 samples/sec Loss 10.6463 LearningRate 0.0891 Epoch: 1 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:09,309-Speed 9708.03 samples/sec Loss 10.8416 LearningRate 0.0891 Epoch: 1 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:10,379-Speed 9589.76 samples/sec Loss 10.8635 LearningRate 0.0891 Epoch: 1 Global Step: 18790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:11,458-Speed 9497.63 samples/sec Loss 10.7948 LearningRate 0.0891 Epoch: 1 Global Step: 18800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:12,565-Speed 9254.55 samples/sec Loss 10.7881 LearningRate 0.0890 Epoch: 1 Global Step: 18810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:13,666-Speed 9303.34 samples/sec Loss 10.7646 LearningRate 0.0890 Epoch: 1 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:14,766-Speed 9311.31 samples/sec Loss 10.8125 LearningRate 0.0890 Epoch: 1 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:15,881-Speed 9188.43 samples/sec Loss 10.7233 LearningRate 0.0890 Epoch: 1 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
12:18:16,971-Speed 9408.06 samples/sec Loss 10.6236 LearningRate 0.0890 Epoch: 1 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:18,076-Speed 9269.75 samples/sec Loss 10.7893 LearningRate 0.0890 Epoch: 1 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:19,229-Speed 8888.72 samples/sec Loss 10.7349 LearningRate 0.0890 Epoch: 1 Global Step: 18870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:20,312-Speed 9459.49 samples/sec Loss 10.8793 LearningRate 0.0890 Epoch: 1 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:21,384-Speed 9559.35 samples/sec Loss 10.8265 LearningRate 0.0890 Epoch: 1 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:22,462-Speed 9504.39 samples/sec Loss 10.7562 LearningRate 0.0890 Epoch: 1 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:23,532-Speed 9576.88 samples/sec Loss 10.8779 LearningRate 0.0890 Epoch: 1 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:24,624-Speed 9379.79 samples/sec Loss 10.8022 LearningRate 0.0890 Epoch: 1 Global Step: 18920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:25,694-Speed 9572.47 samples/sec Loss 10.7339 LearningRate 0.0890 Epoch: 1 Global Step: 18930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:26,821-Speed 9093.94 samples/sec Loss 10.7065 LearningRate 0.0890 Epoch: 1 Global Step: 18940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:27,914-Speed 9377.87 samples/sec Loss 10.7215 LearningRate 0.0890 Epoch: 1 Global Step: 18950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:28,976-Speed 9651.07 samples/sec Loss 10.7712 LearningRate 0.0890 Epoch: 1 Global Step: 18960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:30,074-Speed 9333.24 samples/sec 
Loss 10.7761 LearningRate 0.0890 Epoch: 1 Global Step: 18970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:31,162-Speed 9415.38 samples/sec Loss 10.7703 LearningRate 0.0890 Epoch: 1 Global Step: 18980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:18:32,285-Speed 9127.37 samples/sec Loss 10.6911 LearningRate 0.0889 Epoch: 1 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:33,362-Speed 9508.06 samples/sec Loss 10.7797 LearningRate 0.0889 Epoch: 1 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:34,415-Speed 9730.67 samples/sec Loss 10.8568 LearningRate 0.0889 Epoch: 1 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:35,490-Speed 9534.97 samples/sec Loss 10.7805 LearningRate 0.0889 Epoch: 1 Global Step: 19020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:36,558-Speed 9587.58 samples/sec Loss 10.6699 LearningRate 0.0889 Epoch: 1 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:37,652-Speed 9366.95 samples/sec Loss 10.7952 LearningRate 0.0889 Epoch: 1 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:38,766-Speed 9199.69 samples/sec Loss 10.7535 LearningRate 0.0889 Epoch: 1 Global Step: 19050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:39,852-Speed 9434.61 samples/sec Loss 10.7693 LearningRate 0.0889 Epoch: 1 Global Step: 19060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:40,962-Speed 9228.08 samples/sec Loss 10.6477 LearningRate 0.0889 Epoch: 1 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:42,080-Speed 9170.62 samples/sec Loss 10.7889 LearningRate 0.0889 Epoch: 1 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:43,161-Speed 9474.60 samples/sec Loss 10.7729 LearningRate 0.0889 
Epoch: 1 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:44,244-Speed 9463.19 samples/sec Loss 10.7642 LearningRate 0.0889 Epoch: 1 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:45,340-Speed 9352.10 samples/sec Loss 10.6663 LearningRate 0.0889 Epoch: 1 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:46,412-Speed 9554.73 samples/sec Loss 10.7662 LearningRate 0.0889 Epoch: 1 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:47,486-Speed 9545.79 samples/sec Loss 10.6726 LearningRate 0.0889 Epoch: 1 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:48,571-Speed 9441.63 samples/sec Loss 10.7560 LearningRate 0.0889 Epoch: 1 Global Step: 19140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:49,623-Speed 9737.26 samples/sec Loss 10.7658 LearningRate 0.0889 Epoch: 1 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:50,677-Speed 9727.73 samples/sec Loss 10.8267 LearningRate 0.0889 Epoch: 1 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:51,727-Speed 9755.99 samples/sec Loss 10.6676 LearningRate 0.0888 Epoch: 1 Global Step: 19170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:52,815-Speed 9419.16 samples/sec Loss 10.7561 LearningRate 0.0888 Epoch: 1 Global Step: 19180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:53,874-Speed 9676.54 samples/sec Loss 10.9518 LearningRate 0.0888 Epoch: 1 Global Step: 19190 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:18:54,930-Speed 9697.64 samples/sec Loss 10.7417 LearningRate 0.0888 Epoch: 1 Global Step: 19200 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:18:56,018-Speed 9420.84 samples/sec Loss 10.8675 LearningRate 0.0888 Epoch: 1 Global Step: 19210 
Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:18:57,125-Speed 9254.24 samples/sec Loss 10.7876 LearningRate 0.0888 Epoch: 1 Global Step: 19220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:18:58,186-Speed 9654.83 samples/sec Loss 10.8671 LearningRate 0.0888 Epoch: 1 Global Step: 19230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:18:59,243-Speed 9699.74 samples/sec Loss 10.7873 LearningRate 0.0888 Epoch: 1 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:00,335-Speed 9386.29 samples/sec Loss 10.7893 LearningRate 0.0888 Epoch: 1 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:01,458-Speed 9115.78 samples/sec Loss 10.7994 LearningRate 0.0888 Epoch: 1 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:02,511-Speed 9737.40 samples/sec Loss 10.7695 LearningRate 0.0888 Epoch: 1 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:03,632-Speed 9133.88 samples/sec Loss 10.7937 LearningRate 0.0888 Epoch: 1 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:04,689-Speed 9692.75 samples/sec Loss 10.8030 LearningRate 0.0888 Epoch: 1 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:05,750-Speed 9659.94 samples/sec Loss 10.7087 LearningRate 0.0888 Epoch: 1 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:06,815-Speed 9622.66 samples/sec Loss 10.7739 LearningRate 0.0888 Epoch: 1 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:07,889-Speed 9540.84 samples/sec Loss 10.7178 LearningRate 0.0888 Epoch: 1 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:08,983-Speed 9360.54 samples/sec Loss 10.7433 LearningRate 0.0888 Epoch: 1 Global Step: 19330 Fp16 Grad Scale: 262144 
Required: 13 hours Training: 2022-04-11 12:19:10,080-Speed 9342.22 samples/sec Loss 10.7484 LearningRate 0.0887 Epoch: 1 Global Step: 19340 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:19:11,183-Speed 9295.43 samples/sec Loss 10.8462 LearningRate 0.0887 Epoch: 1 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:12,265-Speed 9463.07 samples/sec Loss 10.7893 LearningRate 0.0887 Epoch: 1 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:13,346-Speed 9482.73 samples/sec Loss 10.8337 LearningRate 0.0887 Epoch: 1 Global Step: 19370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:14,495-Speed 8916.55 samples/sec Loss 10.7566 LearningRate 0.0887 Epoch: 1 Global Step: 19380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:15,621-Speed 9096.58 samples/sec Loss 10.6387 LearningRate 0.0887 Epoch: 1 Global Step: 19390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:16,687-Speed 9613.27 samples/sec Loss 10.7331 LearningRate 0.0887 Epoch: 1 Global Step: 19400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:17,800-Speed 9219.50 samples/sec Loss 10.7802 LearningRate 0.0887 Epoch: 1 Global Step: 19410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:18,900-Speed 9312.90 samples/sec Loss 10.6722 LearningRate 0.0887 Epoch: 1 Global Step: 19420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:20,026-Speed 9097.30 samples/sec Loss 10.6167 LearningRate 0.0887 Epoch: 1 Global Step: 19430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:21,135-Speed 9240.02 samples/sec Loss 10.7549 LearningRate 0.0887 Epoch: 1 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:22,264-Speed 9075.82 samples/sec Loss 10.6715 LearningRate 0.0887 Epoch: 1 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
12:19:23,389-Speed 9100.59 samples/sec Loss 10.8036 LearningRate 0.0887 Epoch: 1 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:19:24,488-Speed 9325.09 samples/sec Loss 10.7606 LearningRate 0.0887 Epoch: 1 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:25,554-Speed 9615.37 samples/sec Loss 10.7314 LearningRate 0.0887 Epoch: 1 Global Step: 19480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:26,623-Speed 9583.86 samples/sec Loss 10.7554 LearningRate 0.0887 Epoch: 1 Global Step: 19490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:27,676-Speed 9729.78 samples/sec Loss 10.7534 LearningRate 0.0887 Epoch: 1 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:28,768-Speed 9389.42 samples/sec Loss 10.7790 LearningRate 0.0887 Epoch: 1 Global Step: 19510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:29,867-Speed 9324.82 samples/sec Loss 10.7053 LearningRate 0.0886 Epoch: 1 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:30,978-Speed 9216.68 samples/sec Loss 10.7521 LearningRate 0.0886 Epoch: 1 Global Step: 19530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:32,058-Speed 9490.41 samples/sec Loss 10.8390 LearningRate 0.0886 Epoch: 1 Global Step: 19540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:33,167-Speed 9241.25 samples/sec Loss 10.6409 LearningRate 0.0886 Epoch: 1 Global Step: 19550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:34,282-Speed 9188.14 samples/sec Loss 10.6438 LearningRate 0.0886 Epoch: 1 Global Step: 19560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:35,383-Speed 9305.96 samples/sec Loss 10.7333 LearningRate 0.0886 Epoch: 1 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:36,495-Speed 9213.83 
samples/sec Loss 10.7123 LearningRate 0.0886 Epoch: 1 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:37,613-Speed 9164.81 samples/sec Loss 10.8527 LearningRate 0.0886 Epoch: 1 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:38,711-Speed 9330.78 samples/sec Loss 10.6695 LearningRate 0.0886 Epoch: 1 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:39,791-Speed 9479.95 samples/sec Loss 10.7843 LearningRate 0.0886 Epoch: 1 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:40,907-Speed 9190.87 samples/sec Loss 10.7466 LearningRate 0.0886 Epoch: 1 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:42,000-Speed 9373.64 samples/sec Loss 10.7201 LearningRate 0.0886 Epoch: 1 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:19:43,109-Speed 9238.22 samples/sec Loss 10.7362 LearningRate 0.0886 Epoch: 1 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:44,189-Speed 9484.83 samples/sec Loss 10.5849 LearningRate 0.0886 Epoch: 1 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:45,314-Speed 9106.59 samples/sec Loss 10.7296 LearningRate 0.0886 Epoch: 1 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:46,403-Speed 9414.63 samples/sec Loss 10.8111 LearningRate 0.0886 Epoch: 1 Global Step: 19670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:19:47,479-Speed 9519.55 samples/sec Loss 10.8104 LearningRate 0.0886 Epoch: 1 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:48,569-Speed 9403.96 samples/sec Loss 10.7312 LearningRate 0.0886 Epoch: 1 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:49,657-Speed 9419.49 samples/sec Loss 10.7839 
LearningRate 0.0885 Epoch: 1 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:50,743-Speed 9434.24 samples/sec Loss 10.7490 LearningRate 0.0885 Epoch: 1 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:51,793-Speed 9755.70 samples/sec Loss 10.7593 LearningRate 0.0885 Epoch: 1 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:52,880-Speed 9428.24 samples/sec Loss 10.7427 LearningRate 0.0885 Epoch: 1 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:54,008-Speed 9082.30 samples/sec Loss 10.7352 LearningRate 0.0885 Epoch: 1 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:55,092-Speed 9450.35 samples/sec Loss 10.6681 LearningRate 0.0885 Epoch: 1 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:56,185-Speed 9379.32 samples/sec Loss 10.7835 LearningRate 0.0885 Epoch: 1 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:57,304-Speed 9153.65 samples/sec Loss 10.8730 LearningRate 0.0885 Epoch: 1 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:58,413-Speed 9235.73 samples/sec Loss 10.7490 LearningRate 0.0885 Epoch: 1 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:19:59,512-Speed 9324.19 samples/sec Loss 10.7979 LearningRate 0.0885 Epoch: 1 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:00,632-Speed 9142.59 samples/sec Loss 10.6528 LearningRate 0.0885 Epoch: 1 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:01,706-Speed 9543.68 samples/sec Loss 10.8222 LearningRate 0.0885 Epoch: 1 Global Step: 19810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:02,799-Speed 9375.99 samples/sec Loss 10.6647 LearningRate 0.0885 Epoch: 1 
Global Step: 19820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:03,859-Speed 9667.53 samples/sec Loss 10.8203 LearningRate 0.0885 Epoch: 1 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:04,927-Speed 9599.74 samples/sec Loss 10.6521 LearningRate 0.0885 Epoch: 1 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:06,095-Speed 8770.15 samples/sec Loss 10.7133 LearningRate 0.0885 Epoch: 1 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:07,181-Speed 9435.54 samples/sec Loss 10.7909 LearningRate 0.0885 Epoch: 1 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:08,313-Speed 9050.20 samples/sec Loss 10.7242 LearningRate 0.0884 Epoch: 1 Global Step: 19870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:09,385-Speed 9560.06 samples/sec Loss 10.5222 LearningRate 0.0884 Epoch: 1 Global Step: 19880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:20:10,462-Speed 9507.28 samples/sec Loss 10.8281 LearningRate 0.0884 Epoch: 1 Global Step: 19890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:20:11,611-Speed 8915.04 samples/sec Loss 10.6846 LearningRate 0.0884 Epoch: 1 Global Step: 19900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:20:12,713-Speed 9300.04 samples/sec Loss 10.6810 LearningRate 0.0884 Epoch: 1 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:13,830-Speed 9171.29 samples/sec Loss 10.6147 LearningRate 0.0884 Epoch: 1 Global Step: 19920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:14,923-Speed 9378.19 samples/sec Loss 10.6906 LearningRate 0.0884 Epoch: 1 Global Step: 19930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:16,020-Speed 9337.40 samples/sec Loss 10.6311 LearningRate 0.0884 Epoch: 1 Global Step: 19940 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:17,096-Speed 9521.39 samples/sec Loss 10.6934 LearningRate 0.0884 Epoch: 1 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:18,153-Speed 9699.96 samples/sec Loss 10.6639 LearningRate 0.0884 Epoch: 1 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:19,255-Speed 9293.17 samples/sec Loss 10.7168 LearningRate 0.0884 Epoch: 1 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:20,375-Speed 9151.86 samples/sec Loss 10.7195 LearningRate 0.0884 Epoch: 1 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:21,455-Speed 9486.40 samples/sec Loss 10.6113 LearningRate 0.0884 Epoch: 1 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:22,568-Speed 9211.05 samples/sec Loss 10.5440 LearningRate 0.0884 Epoch: 1 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:20:44,654-[lfw][20000]XNorm: 13.832026 Training: 2022-04-11 12:20:44,655-[lfw][20000]Accuracy-Flip: 0.99383+-0.00373 Training: 2022-04-11 12:20:44,655-[lfw][20000]Accuracy-Highest: 0.99383 Training: 2022-04-11 12:21:10,148-[cfp_fp][20000]XNorm: 11.748980 Training: 2022-04-11 12:21:10,149-[cfp_fp][20000]Accuracy-Flip: 0.92257+-0.01751 Training: 2022-04-11 12:21:10,150-[cfp_fp][20000]Accuracy-Highest: 0.92257 Training: 2022-04-11 12:21:32,199-[agedb_30][20000]XNorm: 13.340283 Training: 2022-04-11 12:21:32,200-[agedb_30][20000]Accuracy-Flip: 0.93417+-0.01375 Training: 2022-04-11 12:21:32,201-[agedb_30][20000]Accuracy-Highest: 0.93433 Training: 2022-04-11 12:21:33,294-Speed 144.79 samples/sec Loss 10.7104 LearningRate 0.0884 Epoch: 1 Global Step: 20010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:34,389-Speed 9358.17 samples/sec Loss 10.5733 LearningRate 0.0884 Epoch: 1 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 
13 hours Training: 2022-04-11 12:21:35,464-Speed 9526.09 samples/sec Loss 10.7952 LearningRate 0.0884 Epoch: 1 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:36,535-Speed 9567.56 samples/sec Loss 10.5917 LearningRate 0.0884 Epoch: 1 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:37,625-Speed 9402.65 samples/sec Loss 10.8281 LearningRate 0.0883 Epoch: 1 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:38,711-Speed 9432.96 samples/sec Loss 10.7512 LearningRate 0.0883 Epoch: 1 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:39,779-Speed 9600.10 samples/sec Loss 10.6674 LearningRate 0.0883 Epoch: 1 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:40,822-Speed 9820.31 samples/sec Loss 10.6337 LearningRate 0.0883 Epoch: 1 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:41,889-Speed 9600.71 samples/sec Loss 10.7269 LearningRate 0.0883 Epoch: 1 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:42,953-Speed 9632.33 samples/sec Loss 10.6887 LearningRate 0.0883 Epoch: 1 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:44,020-Speed 9602.11 samples/sec Loss 10.6978 LearningRate 0.0883 Epoch: 1 Global Step: 20110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:21:45,124-Speed 9276.55 samples/sec Loss 10.6848 LearningRate 0.0883 Epoch: 1 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:46,209-Speed 9444.66 samples/sec Loss 10.7738 LearningRate 0.0883 Epoch: 1 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:47,294-Speed 9449.90 samples/sec Loss 10.7653 LearningRate 0.0883 Epoch: 1 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 
12:21:48,402-Speed 9241.77 samples/sec Loss 10.7595 LearningRate 0.0883 Epoch: 1 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:49,543-Speed 8981.81 samples/sec Loss 10.8360 LearningRate 0.0883 Epoch: 1 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:50,608-Speed 9620.90 samples/sec Loss 10.7603 LearningRate 0.0883 Epoch: 1 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:51,707-Speed 9320.06 samples/sec Loss 10.8175 LearningRate 0.0883 Epoch: 1 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:52,778-Speed 9569.44 samples/sec Loss 10.5257 LearningRate 0.0883 Epoch: 1 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:53,873-Speed 9361.08 samples/sec Loss 10.7392 LearningRate 0.0883 Epoch: 1 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:21:54,953-Speed 9491.20 samples/sec Loss 10.7620 LearningRate 0.0883 Epoch: 1 Global Step: 20210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:21:56,038-Speed 9440.48 samples/sec Loss 10.7555 LearningRate 0.0883 Epoch: 1 Global Step: 20220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:21:57,161-Speed 9121.05 samples/sec Loss 10.5662 LearningRate 0.0882 Epoch: 1 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:21:58,299-Speed 9002.25 samples/sec Loss 10.7779 LearningRate 0.0882 Epoch: 1 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:21:59,402-Speed 9292.56 samples/sec Loss 10.7917 LearningRate 0.0882 Epoch: 1 Global Step: 20250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:00,455-Speed 9730.85 samples/sec Loss 10.7439 LearningRate 0.0882 Epoch: 1 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:01,506-Speed 9742.01 
samples/sec Loss 10.6818 LearningRate 0.0882 Epoch: 1 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:02,560-Speed 9725.40 samples/sec Loss 10.6588 LearningRate 0.0882 Epoch: 1 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:03,606-Speed 9789.53 samples/sec Loss 10.8553 LearningRate 0.0882 Epoch: 1 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:04,647-Speed 9848.81 samples/sec Loss 10.7270 LearningRate 0.0882 Epoch: 1 Global Step: 20300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:05,708-Speed 9655.86 samples/sec Loss 10.7101 LearningRate 0.0882 Epoch: 1 Global Step: 20310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:06,770-Speed 9646.87 samples/sec Loss 10.6262 LearningRate 0.0882 Epoch: 1 Global Step: 20320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:07,824-Speed 9721.94 samples/sec Loss 10.5392 LearningRate 0.0882 Epoch: 1 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:08,921-Speed 9339.14 samples/sec Loss 10.7871 LearningRate 0.0882 Epoch: 1 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:10,026-Speed 9272.90 samples/sec Loss 10.8169 LearningRate 0.0882 Epoch: 1 Global Step: 20350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:11,133-Speed 9257.92 samples/sec Loss 10.7157 LearningRate 0.0882 Epoch: 1 Global Step: 20360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:12,204-Speed 9571.56 samples/sec Loss 10.6630 LearningRate 0.0882 Epoch: 1 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:13,259-Speed 9708.01 samples/sec Loss 10.6168 LearningRate 0.0882 Epoch: 1 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:14,319-Speed 9670.34 samples/sec Loss 10.6506 LearningRate 
0.0882 Epoch: 1 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:15,379-Speed 9663.27 samples/sec Loss 10.7491 LearningRate 0.0882 Epoch: 1 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:16,478-Speed 9330.35 samples/sec Loss 10.7160 LearningRate 0.0881 Epoch: 1 Global Step: 20410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:17,543-Speed 9613.39 samples/sec Loss 10.6581 LearningRate 0.0881 Epoch: 1 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:18,649-Speed 9269.98 samples/sec Loss 10.7002 LearningRate 0.0881 Epoch: 1 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:19,739-Speed 9402.25 samples/sec Loss 10.5835 LearningRate 0.0881 Epoch: 1 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:20,819-Speed 9479.57 samples/sec Loss 10.7084 LearningRate 0.0881 Epoch: 1 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:21,910-Speed 9395.98 samples/sec Loss 10.7959 LearningRate 0.0881 Epoch: 1 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:22,985-Speed 9537.59 samples/sec Loss 10.8569 LearningRate 0.0881 Epoch: 1 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:24,090-Speed 9268.49 samples/sec Loss 10.7308 LearningRate 0.0881 Epoch: 1 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:25,214-Speed 9117.21 samples/sec Loss 10.7536 LearningRate 0.0881 Epoch: 1 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:26,285-Speed 9567.95 samples/sec Loss 10.6756 LearningRate 0.0881 Epoch: 1 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:27,385-Speed 9313.38 samples/sec Loss 10.8099 LearningRate 0.0881 Epoch: 1 Global Step: 20510 Fp16 
Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:22:28,457-Speed 9557.76 samples/sec Loss 10.5507 LearningRate 0.0881 Epoch: 1 Global Step: 20520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:29,551-Speed 9362.64 samples/sec Loss 10.5772 LearningRate 0.0881 Epoch: 1 Global Step: 20530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:30,624-Speed 9556.66 samples/sec Loss 10.6399 LearningRate 0.0881 Epoch: 1 Global Step: 20540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:31,673-Speed 9762.39 samples/sec Loss 10.7334 LearningRate 0.0881 Epoch: 1 Global Step: 20550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:32,783-Speed 9231.15 samples/sec Loss 10.6409 LearningRate 0.0881 Epoch: 1 Global Step: 20560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:33,865-Speed 9469.29 samples/sec Loss 10.7275 LearningRate 0.0881 Epoch: 1 Global Step: 20570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:34,937-Speed 9563.17 samples/sec Loss 10.7162 LearningRate 0.0881 Epoch: 1 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:36,031-Speed 9369.80 samples/sec Loss 10.7910 LearningRate 0.0880 Epoch: 1 Global Step: 20590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:37,099-Speed 9590.96 samples/sec Loss 10.7013 LearningRate 0.0880 Epoch: 1 Global Step: 20600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:38,242-Speed 8964.08 samples/sec Loss 10.6827 LearningRate 0.0880 Epoch: 1 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:39,328-Speed 9431.18 samples/sec Loss 10.8137 LearningRate 0.0880 Epoch: 1 Global Step: 20620 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:22:40,430-Speed 9298.94 samples/sec Loss 10.6835 LearningRate 0.0880 Epoch: 1 Global Step: 20630 Fp16 Grad Scale: 262144 Required: 13 
hours Training: 2022-04-11 12:22:41,499-Speed 9589.58 samples/sec Loss 10.6843 LearningRate 0.0880 Epoch: 1 Global Step: 20640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:42,598-Speed 9321.82 samples/sec Loss 10.6728 LearningRate 0.0880 Epoch: 1 Global Step: 20650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:43,684-Speed 9430.33 samples/sec Loss 10.7580 LearningRate 0.0880 Epoch: 1 Global Step: 20660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:44,744-Speed 9667.88 samples/sec Loss 10.5834 LearningRate 0.0880 Epoch: 1 Global Step: 20670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:45,829-Speed 9448.97 samples/sec Loss 10.7169 LearningRate 0.0880 Epoch: 1 Global Step: 20680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:46,910-Speed 9473.43 samples/sec Loss 10.5097 LearningRate 0.0880 Epoch: 1 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:47,995-Speed 9443.55 samples/sec Loss 10.6973 LearningRate 0.0880 Epoch: 1 Global Step: 20700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:49,088-Speed 9372.09 samples/sec Loss 10.6618 LearningRate 0.0880 Epoch: 1 Global Step: 20710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:50,181-Speed 9381.33 samples/sec Loss 10.6414 LearningRate 0.0880 Epoch: 1 Global Step: 20720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:51,248-Speed 9595.21 samples/sec Loss 10.6237 LearningRate 0.0880 Epoch: 1 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:52,291-Speed 9825.96 samples/sec Loss 10.7041 LearningRate 0.0880 Epoch: 1 Global Step: 20740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:22:53,364-Speed 9556.88 samples/sec Loss 10.6172 LearningRate 0.0880 Epoch: 1 Global Step: 20750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 
12:22:54,435-Speed 9562.41 samples/sec Loss 10.5922 LearningRate 0.0879 Epoch: 1 Global Step: 20760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:22:55,492-Speed 9694.21 samples/sec Loss 10.6505 LearningRate 0.0879 Epoch: 1 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:56,591-Speed 9325.25 samples/sec Loss 10.7077 LearningRate 0.0879 Epoch: 1 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:57,668-Speed 9514.31 samples/sec Loss 10.5843 LearningRate 0.0879 Epoch: 1 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:58,785-Speed 9169.09 samples/sec Loss 10.7184 LearningRate 0.0879 Epoch: 1 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:22:59,851-Speed 9615.20 samples/sec Loss 10.6391 LearningRate 0.0879 Epoch: 1 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:00,920-Speed 9581.43 samples/sec Loss 10.7262 LearningRate 0.0879 Epoch: 1 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:02,010-Speed 9403.65 samples/sec Loss 10.6754 LearningRate 0.0879 Epoch: 1 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:03,110-Speed 9318.71 samples/sec Loss 10.5945 LearningRate 0.0879 Epoch: 1 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:04,245-Speed 9028.10 samples/sec Loss 10.7485 LearningRate 0.0879 Epoch: 1 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:05,290-Speed 9806.63 samples/sec Loss 10.6739 LearningRate 0.0879 Epoch: 1 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:06,353-Speed 9636.90 samples/sec Loss 10.7512 LearningRate 0.0879 Epoch: 1 Global Step: 20870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:23:07,411-Speed 9681.46 
samples/sec Loss 10.6768 LearningRate 0.0879 Epoch: 1 Global Step: 20880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:23:08,546-Speed 9031.45 samples/sec Loss 10.5162 LearningRate 0.0879 Epoch: 1 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:09,671-Speed 9104.98 samples/sec Loss 10.6642 LearningRate 0.0879 Epoch: 1 Global Step: 20900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:10,740-Speed 9586.05 samples/sec Loss 10.7696 LearningRate 0.0879 Epoch: 1 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:11,828-Speed 9416.76 samples/sec Loss 10.5781 LearningRate 0.0879 Epoch: 1 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:12,990-Speed 8812.44 samples/sec Loss 10.5929 LearningRate 0.0879 Epoch: 1 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:14,095-Speed 9278.26 samples/sec Loss 10.6150 LearningRate 0.0878 Epoch: 1 Global Step: 20940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:15,161-Speed 9613.75 samples/sec Loss 10.7179 LearningRate 0.0878 Epoch: 1 Global Step: 20950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:16,266-Speed 9276.92 samples/sec Loss 10.6442 LearningRate 0.0878 Epoch: 1 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:17,379-Speed 9201.86 samples/sec Loss 10.6734 LearningRate 0.0878 Epoch: 1 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:18,450-Speed 9569.85 samples/sec Loss 10.6120 LearningRate 0.0878 Epoch: 1 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:19,528-Speed 9502.53 samples/sec Loss 10.6470 LearningRate 0.0878 Epoch: 1 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:20,614-Speed 9430.52 samples/sec Loss 10.5393 
LearningRate 0.0878 Epoch: 1 Global Step: 21000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:21,669-Speed 9721.09 samples/sec Loss 10.5581 LearningRate 0.0878 Epoch: 1 Global Step: 21010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:22,764-Speed 9358.34 samples/sec Loss 10.6982 LearningRate 0.0878 Epoch: 1 Global Step: 21020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:23,889-Speed 9105.20 samples/sec Loss 10.7087 LearningRate 0.0878 Epoch: 1 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:24,983-Speed 9366.11 samples/sec Loss 10.6099 LearningRate 0.0878 Epoch: 1 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:26,071-Speed 9419.56 samples/sec Loss 10.5759 LearningRate 0.0878 Epoch: 1 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:27,149-Speed 9498.01 samples/sec Loss 10.6313 LearningRate 0.0878 Epoch: 1 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:28,281-Speed 9053.32 samples/sec Loss 10.5412 LearningRate 0.0878 Epoch: 1 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:29,433-Speed 8890.88 samples/sec Loss 10.6015 LearningRate 0.0878 Epoch: 1 Global Step: 21080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:30,504-Speed 9566.08 samples/sec Loss 10.5520 LearningRate 0.0878 Epoch: 1 Global Step: 21090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:23:31,562-Speed 9687.39 samples/sec Loss 10.4695 LearningRate 0.0878 Epoch: 1 Global Step: 21100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:23:32,700-Speed 9009.23 samples/sec Loss 10.6797 LearningRate 0.0878 Epoch: 1 Global Step: 21110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:23:33,804-Speed 9274.61 samples/sec Loss 10.6909 LearningRate 0.0877 Epoch: 1 
Global Step: 21120 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:23:34,857-Speed 9738.28 samples/sec Loss 10.5583 LearningRate 0.0877 Epoch: 1 Global Step: 21130 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:23:35,936-Speed 9491.43 samples/sec Loss 10.4924 LearningRate 0.0877 Epoch: 1 Global Step: 21140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:37,018-Speed 9474.62 samples/sec Loss 10.4969 LearningRate 0.0877 Epoch: 1 Global Step: 21150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:38,095-Speed 9512.89 samples/sec Loss 10.6115 LearningRate 0.0877 Epoch: 1 Global Step: 21160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:39,188-Speed 9372.97 samples/sec Loss 10.5438 LearningRate 0.0877 Epoch: 1 Global Step: 21170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:40,285-Speed 9335.68 samples/sec Loss 10.5456 LearningRate 0.0877 Epoch: 1 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:41,354-Speed 9587.28 samples/sec Loss 10.5583 LearningRate 0.0877 Epoch: 1 Global Step: 21190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:42,428-Speed 9541.07 samples/sec Loss 10.6985 LearningRate 0.0877 Epoch: 1 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:43,482-Speed 9714.79 samples/sec Loss 10.4817 LearningRate 0.0877 Epoch: 1 Global Step: 21210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:23:44,570-Speed 9422.49 samples/sec Loss 10.6604 LearningRate 0.0877 Epoch: 1 Global Step: 21220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:45,666-Speed 9343.48 samples/sec Loss 10.5686 LearningRate 0.0877 Epoch: 1 Global Step: 21230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:46,757-Speed 9399.27 samples/sec Loss 10.6045 LearningRate 0.0877 Epoch: 1 Global Step: 21240 Fp16 Grad 
Scale: 262144 Required: 12 hours Training: 2022-04-11 12:23:47,823-Speed 9610.25 samples/sec Loss 10.5822 LearningRate 0.0877 Epoch: 1 Global Step: 21250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:48,895-Speed 9559.15 samples/sec Loss 10.6647 LearningRate 0.0877 Epoch: 1 Global Step: 21260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:49,964-Speed 9583.09 samples/sec Loss 10.6536 LearningRate 0.0877 Epoch: 1 Global Step: 21270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:51,078-Speed 9196.98 samples/sec Loss 10.7121 LearningRate 0.0877 Epoch: 1 Global Step: 21280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:52,199-Speed 9141.61 samples/sec Loss 10.6063 LearningRate 0.0877 Epoch: 1 Global Step: 21290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:53,265-Speed 9611.22 samples/sec Loss 10.7077 LearningRate 0.0876 Epoch: 1 Global Step: 21300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:54,333-Speed 9600.37 samples/sec Loss 10.5154 LearningRate 0.0876 Epoch: 1 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:55,392-Speed 9670.93 samples/sec Loss 10.5617 LearningRate 0.0876 Epoch: 1 Global Step: 21320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:56,450-Speed 9684.39 samples/sec Loss 10.5770 LearningRate 0.0876 Epoch: 1 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:57,519-Speed 9580.98 samples/sec Loss 10.6684 LearningRate 0.0876 Epoch: 1 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:58,600-Speed 9477.51 samples/sec Loss 10.6837 LearningRate 0.0876 Epoch: 1 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:23:59,762-Speed 8822.39 samples/sec Loss 10.5150 LearningRate 0.0876 Epoch: 1 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 12:24:00,861-Speed 9325.45 samples/sec Loss 10.5159 LearningRate 0.0876 Epoch: 1 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:01,969-Speed 9246.54 samples/sec Loss 10.7236 LearningRate 0.0876 Epoch: 1 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:03,008-Speed 9856.70 samples/sec Loss 10.6637 LearningRate 0.0876 Epoch: 1 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:04,096-Speed 9417.27 samples/sec Loss 10.5238 LearningRate 0.0876 Epoch: 1 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:05,199-Speed 9290.82 samples/sec Loss 10.5995 LearningRate 0.0876 Epoch: 1 Global Step: 21410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:06,268-Speed 9588.75 samples/sec Loss 10.5176 LearningRate 0.0876 Epoch: 1 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:07,366-Speed 9328.53 samples/sec Loss 10.6418 LearningRate 0.0876 Epoch: 1 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:08,443-Speed 9520.21 samples/sec Loss 10.4559 LearningRate 0.0876 Epoch: 1 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:09,555-Speed 9211.80 samples/sec Loss 10.5420 LearningRate 0.0876 Epoch: 1 Global Step: 21450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:24:10,626-Speed 9567.96 samples/sec Loss 10.5625 LearningRate 0.0876 Epoch: 1 Global Step: 21460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:24:11,685-Speed 9673.95 samples/sec Loss 10.6341 LearningRate 0.0876 Epoch: 1 Global Step: 21470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:12,744-Speed 9680.05 samples/sec Loss 10.7319 LearningRate 0.0875 Epoch: 1 Global Step: 21480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:24:13,838-Speed 9362.33 samples/sec Loss 10.5825 LearningRate 0.0875 Epoch: 1 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:14,925-Speed 9425.06 samples/sec Loss 10.5363 LearningRate 0.0875 Epoch: 1 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:15,978-Speed 9729.06 samples/sec Loss 10.3902 LearningRate 0.0875 Epoch: 1 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:17,060-Speed 9469.24 samples/sec Loss 10.5600 LearningRate 0.0875 Epoch: 1 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:18,144-Speed 9453.41 samples/sec Loss 10.5266 LearningRate 0.0875 Epoch: 1 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:19,196-Speed 9739.49 samples/sec Loss 10.6076 LearningRate 0.0875 Epoch: 1 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:20,285-Speed 9407.20 samples/sec Loss 10.6040 LearningRate 0.0875 Epoch: 1 Global Step: 21550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:21,344-Speed 9678.54 samples/sec Loss 10.4400 LearningRate 0.0875 Epoch: 1 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:22,457-Speed 9204.99 samples/sec Loss 10.5324 LearningRate 0.0875 Epoch: 1 Global Step: 21570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:24:23,540-Speed 9465.15 samples/sec Loss 10.5145 LearningRate 0.0875 Epoch: 1 Global Step: 21580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:24,622-Speed 9472.46 samples/sec Loss 10.6145 LearningRate 0.0875 Epoch: 1 Global Step: 21590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:25,664-Speed 9833.83 samples/sec Loss 10.5065 LearningRate 0.0875 Epoch: 1 Global Step: 21600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:26,758-Speed 9366.00 
samples/sec Loss 10.5579 LearningRate 0.0875 Epoch: 1 Global Step: 21610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:27,822-Speed 9621.68 samples/sec Loss 10.5528 LearningRate 0.0875 Epoch: 1 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:28,922-Speed 9315.80 samples/sec Loss 10.6590 LearningRate 0.0875 Epoch: 1 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:30,038-Speed 9183.14 samples/sec Loss 10.5925 LearningRate 0.0875 Epoch: 1 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:31,151-Speed 9206.13 samples/sec Loss 10.6071 LearningRate 0.0874 Epoch: 1 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:32,221-Speed 9580.89 samples/sec Loss 10.5458 LearningRate 0.0874 Epoch: 1 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:33,323-Speed 9292.69 samples/sec Loss 10.5553 LearningRate 0.0874 Epoch: 1 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:34,408-Speed 9441.62 samples/sec Loss 10.5002 LearningRate 0.0874 Epoch: 1 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:35,493-Speed 9452.32 samples/sec Loss 10.5427 LearningRate 0.0874 Epoch: 1 Global Step: 21690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:36,600-Speed 9254.45 samples/sec Loss 10.5575 LearningRate 0.0874 Epoch: 1 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:37,688-Speed 9413.65 samples/sec Loss 10.5588 LearningRate 0.0874 Epoch: 1 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:38,727-Speed 9858.18 samples/sec Loss 10.6535 LearningRate 0.0874 Epoch: 1 Global Step: 21720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:39,807-Speed 9487.58 samples/sec Loss 10.5903 
LearningRate 0.0874 Epoch: 1 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:40,915-Speed 9244.15 samples/sec Loss 10.4848 LearningRate 0.0874 Epoch: 1 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:41,996-Speed 9486.32 samples/sec Loss 10.5308 LearningRate 0.0874 Epoch: 1 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:43,140-Speed 8956.45 samples/sec Loss 10.5724 LearningRate 0.0874 Epoch: 1 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:44,278-Speed 9000.32 samples/sec Loss 10.5411 LearningRate 0.0874 Epoch: 1 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:45,361-Speed 9460.47 samples/sec Loss 10.6020 LearningRate 0.0874 Epoch: 1 Global Step: 21780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:24:46,427-Speed 9612.61 samples/sec Loss 10.5885 LearningRate 0.0874 Epoch: 1 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:47,522-Speed 9364.40 samples/sec Loss 10.6697 LearningRate 0.0874 Epoch: 1 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:48,636-Speed 9190.18 samples/sec Loss 10.5559 LearningRate 0.0874 Epoch: 1 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:49,759-Speed 9124.74 samples/sec Loss 10.5105 LearningRate 0.0874 Epoch: 1 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:50,853-Speed 9366.26 samples/sec Loss 10.4907 LearningRate 0.0873 Epoch: 1 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:51,912-Speed 9672.61 samples/sec Loss 10.5998 LearningRate 0.0873 Epoch: 1 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:52,977-Speed 9625.22 samples/sec Loss 10.5149 LearningRate 0.0873 Epoch: 1 Global 
Step: 21850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:54,064-Speed 9428.23 samples/sec Loss 10.5805 LearningRate 0.0873 Epoch: 1 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:55,166-Speed 9295.89 samples/sec Loss 10.6261 LearningRate 0.0873 Epoch: 1 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:56,259-Speed 9375.04 samples/sec Loss 10.4220 LearningRate 0.0873 Epoch: 1 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:57,367-Speed 9248.19 samples/sec Loss 10.6654 LearningRate 0.0873 Epoch: 1 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:24:58,431-Speed 9627.02 samples/sec Loss 10.7219 LearningRate 0.0873 Epoch: 1 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:24:59,524-Speed 9376.85 samples/sec Loss 10.5424 LearningRate 0.0873 Epoch: 1 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:25:00,633-Speed 9241.12 samples/sec Loss 10.6173 LearningRate 0.0873 Epoch: 1 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:01,730-Speed 9335.12 samples/sec Loss 10.5134 LearningRate 0.0873 Epoch: 1 Global Step: 21930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:02,783-Speed 9729.91 samples/sec Loss 10.5015 LearningRate 0.0873 Epoch: 1 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:03,873-Speed 9402.38 samples/sec Loss 10.5379 LearningRate 0.0873 Epoch: 1 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:04,946-Speed 9549.53 samples/sec Loss 10.5907 LearningRate 0.0873 Epoch: 1 Global Step: 21960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:06,017-Speed 9573.16 samples/sec Loss 10.4607 LearningRate 0.0873 Epoch: 1 Global Step: 21970 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-11 12:25:07,113-Speed 9346.40 samples/sec Loss 10.5958 LearningRate 0.0873 Epoch: 1 Global Step: 21980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:08,210-Speed 9342.57 samples/sec Loss 10.4098 LearningRate 0.0873 Epoch: 1 Global Step: 21990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:09,288-Speed 9498.10 samples/sec Loss 10.5701 LearningRate 0.0873 Epoch: 1 Global Step: 22000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:25:31,191-[lfw][22000]XNorm: 14.017649 Training: 2022-04-11 12:25:31,192-[lfw][22000]Accuracy-Flip: 0.99267+-0.00410 Training: 2022-04-11 12:25:31,193-[lfw][22000]Accuracy-Highest: 0.99383 Training: 2022-04-11 12:25:56,456-[cfp_fp][22000]XNorm: 11.786355 Training: 2022-04-11 12:25:56,456-[cfp_fp][22000]Accuracy-Flip: 0.92629+-0.01027 Training: 2022-04-11 12:25:56,457-[cfp_fp][22000]Accuracy-Highest: 0.92629 Training: 2022-04-11 12:26:18,244-[agedb_30][22000]XNorm: 13.529340 Training: 2022-04-11 12:26:18,244-[agedb_30][22000]Accuracy-Flip: 0.92400+-0.01711 Training: 2022-04-11 12:26:18,245-[agedb_30][22000]Accuracy-Highest: 0.93433 Training: 2022-04-11 12:26:19,338-Speed 146.18 samples/sec Loss 10.6185 LearningRate 0.0872 Epoch: 1 Global Step: 22010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:26:20,405-Speed 9601.90 samples/sec Loss 10.5722 LearningRate 0.0872 Epoch: 1 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:21,526-Speed 9142.29 samples/sec Loss 10.5398 LearningRate 0.0872 Epoch: 1 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:22,608-Speed 9470.54 samples/sec Loss 10.4360 LearningRate 0.0872 Epoch: 1 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:23,667-Speed 9671.48 samples/sec Loss 10.5502 LearningRate 0.0872 Epoch: 1 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-11 12:26:24,760-Speed 9374.10 samples/sec Loss 10.4915 LearningRate 0.0872 Epoch: 1 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:25,843-Speed 9460.89 samples/sec Loss 10.4926 LearningRate 0.0872 Epoch: 1 Global Step: 22070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:26,898-Speed 9713.26 samples/sec Loss 10.5980 LearningRate 0.0872 Epoch: 1 Global Step: 22080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:27,950-Speed 9737.61 samples/sec Loss 10.6281 LearningRate 0.0872 Epoch: 1 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:29,006-Speed 9703.04 samples/sec Loss 10.5684 LearningRate 0.0872 Epoch: 1 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:30,087-Speed 9480.14 samples/sec Loss 10.6833 LearningRate 0.0872 Epoch: 1 Global Step: 22110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:31,157-Speed 9575.90 samples/sec Loss 10.5762 LearningRate 0.0872 Epoch: 1 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:32,230-Speed 9551.10 samples/sec Loss 10.5693 LearningRate 0.0872 Epoch: 1 Global Step: 22130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:33,301-Speed 9569.62 samples/sec Loss 10.5089 LearningRate 0.0872 Epoch: 1 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:34,365-Speed 9626.73 samples/sec Loss 10.5593 LearningRate 0.0872 Epoch: 1 Global Step: 22150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:35,464-Speed 9319.52 samples/sec Loss 10.5475 LearningRate 0.0872 Epoch: 1 Global Step: 22160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:36,583-Speed 9159.43 samples/sec Loss 10.4974 LearningRate 0.0872 Epoch: 1 Global Step: 22170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:37,695-Speed 
9212.44 samples/sec Loss 10.4946 LearningRate 0.0872 Epoch: 1 Global Step: 22180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:38,739-Speed 9819.85 samples/sec Loss 10.5188 LearningRate 0.0871 Epoch: 1 Global Step: 22190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:40,057-Speed 7771.73 samples/sec Loss 10.4372 LearningRate 0.0871 Epoch: 1 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:41,209-Speed 8894.11 samples/sec Loss 10.4903 LearningRate 0.0871 Epoch: 1 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:42,314-Speed 9268.41 samples/sec Loss 10.5758 LearningRate 0.0871 Epoch: 1 Global Step: 22220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:26:43,419-Speed 9269.54 samples/sec Loss 10.3990 LearningRate 0.0871 Epoch: 1 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:44,507-Speed 9418.84 samples/sec Loss 10.6372 LearningRate 0.0871 Epoch: 1 Global Step: 22240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:45,573-Speed 9610.08 samples/sec Loss 10.4960 LearningRate 0.0871 Epoch: 1 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:46,643-Speed 9575.02 samples/sec Loss 10.4013 LearningRate 0.0871 Epoch: 1 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:47,716-Speed 9552.98 samples/sec Loss 10.4872 LearningRate 0.0871 Epoch: 1 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:48,858-Speed 8972.53 samples/sec Loss 10.3963 LearningRate 0.0871 Epoch: 1 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:49,919-Speed 9649.68 samples/sec Loss 10.4794 LearningRate 0.0871 Epoch: 1 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:50,998-Speed 9502.10 samples/sec Loss 
10.5723 LearningRate 0.0871 Epoch: 1 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:52,061-Speed 9632.34 samples/sec Loss 10.4710 LearningRate 0.0871 Epoch: 1 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:53,135-Speed 9548.93 samples/sec Loss 10.4484 LearningRate 0.0871 Epoch: 1 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:54,206-Speed 9561.71 samples/sec Loss 10.5098 LearningRate 0.0871 Epoch: 1 Global Step: 22330 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:26:55,269-Speed 9644.83 samples/sec Loss 10.4977 LearningRate 0.0871 Epoch: 1 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:56,350-Speed 9484.88 samples/sec Loss 10.4794 LearningRate 0.0871 Epoch: 1 Global Step: 22350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:57,424-Speed 9544.63 samples/sec Loss 10.3705 LearningRate 0.0871 Epoch: 1 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:58,514-Speed 9397.11 samples/sec Loss 10.4055 LearningRate 0.0870 Epoch: 1 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:26:59,550-Speed 9895.21 samples/sec Loss 10.4563 LearningRate 0.0870 Epoch: 1 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:00,669-Speed 9162.84 samples/sec Loss 10.5205 LearningRate 0.0870 Epoch: 1 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:01,729-Speed 9665.90 samples/sec Loss 10.4946 LearningRate 0.0870 Epoch: 1 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:02,797-Speed 9586.01 samples/sec Loss 10.3338 LearningRate 0.0870 Epoch: 1 Global Step: 22410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:03,886-Speed 9411.88 samples/sec Loss 10.4533 LearningRate 0.0870 
Epoch: 1 Global Step: 22420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:05,023-Speed 9015.70 samples/sec Loss 10.4959 LearningRate 0.0870 Epoch: 1 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:06,100-Speed 9507.49 samples/sec Loss 10.4430 LearningRate 0.0870 Epoch: 1 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:07,182-Speed 9471.26 samples/sec Loss 10.5298 LearningRate 0.0870 Epoch: 1 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:08,299-Speed 9171.74 samples/sec Loss 10.5088 LearningRate 0.0870 Epoch: 1 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:09,434-Speed 9024.83 samples/sec Loss 10.5200 LearningRate 0.0870 Epoch: 1 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:10,508-Speed 9538.51 samples/sec Loss 10.4413 LearningRate 0.0870 Epoch: 1 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:11,567-Speed 9680.25 samples/sec Loss 10.5200 LearningRate 0.0870 Epoch: 1 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:12,633-Speed 9653.08 samples/sec Loss 10.4436 LearningRate 0.0870 Epoch: 1 Global Step: 22500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:13,718-Speed 9443.15 samples/sec Loss 10.5221 LearningRate 0.0870 Epoch: 1 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:14,801-Speed 9471.04 samples/sec Loss 10.4392 LearningRate 0.0870 Epoch: 1 Global Step: 22520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:15,840-Speed 9857.72 samples/sec Loss 10.5514 LearningRate 0.0870 Epoch: 1 Global Step: 22530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 12:27:16,903-Speed 9638.39 samples/sec Loss 10.5099 LearningRate 0.0870 Epoch: 1 Global Step: 22540 Fp16 Grad 
Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:17,991-Speed 9420.41 samples/sec Loss 10.4617 LearningRate 0.0869 Epoch: 1 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:19,090-Speed 9322.59 samples/sec Loss 10.4549 LearningRate 0.0869 Epoch: 1 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:20,212-Speed 9127.45 samples/sec Loss 10.3980 LearningRate 0.0869 Epoch: 1 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:21,287-Speed 9539.08 samples/sec Loss 10.5740 LearningRate 0.0869 Epoch: 1 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:22,362-Speed 9528.50 samples/sec Loss 10.4127 LearningRate 0.0869 Epoch: 1 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:23,436-Speed 9539.29 samples/sec Loss 10.5041 LearningRate 0.0869 Epoch: 1 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:24,530-Speed 9366.94 samples/sec Loss 10.5665 LearningRate 0.0869 Epoch: 1 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:25,585-Speed 9715.13 samples/sec Loss 10.4543 LearningRate 0.0869 Epoch: 1 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:26,648-Speed 9636.80 samples/sec Loss 10.4644 LearningRate 0.0869 Epoch: 1 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:27,769-Speed 9138.08 samples/sec Loss 10.5529 LearningRate 0.0869 Epoch: 1 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:28,850-Speed 9479.90 samples/sec Loss 10.4384 LearningRate 0.0869 Epoch: 1 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:29,934-Speed 9455.11 samples/sec Loss 10.5410 LearningRate 0.0869 Epoch: 1 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-04-11 12:27:31,011-Speed 9508.98 samples/sec Loss 10.4172 LearningRate 0.0869 Epoch: 1 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:32,093-Speed 9469.39 samples/sec Loss 10.4144 LearningRate 0.0869 Epoch: 1 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:33,211-Speed 9165.84 samples/sec Loss 10.4763 LearningRate 0.0869 Epoch: 1 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:34,326-Speed 9196.81 samples/sec Loss 10.3885 LearningRate 0.0869 Epoch: 1 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:35,401-Speed 9524.91 samples/sec Loss 10.3788 LearningRate 0.0869 Epoch: 1 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:27:36,496-Speed 9360.45 samples/sec Loss 10.4153 LearningRate 0.0869 Epoch: 1 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:37,584-Speed 9417.91 samples/sec Loss 10.4106 LearningRate 0.0868 Epoch: 1 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:38,658-Speed 9542.49 samples/sec Loss 10.3428 LearningRate 0.0868 Epoch: 1 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:39,700-Speed 9830.88 samples/sec Loss 10.5456 LearningRate 0.0868 Epoch: 1 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:40,798-Speed 9326.37 samples/sec Loss 10.3381 LearningRate 0.0868 Epoch: 1 Global Step: 22760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:41,881-Speed 9466.40 samples/sec Loss 10.3417 LearningRate 0.0868 Epoch: 1 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:42,965-Speed 9452.55 samples/sec Loss 10.5312 LearningRate 0.0868 Epoch: 1 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:27:44,051-Speed 9431.80 samples/sec Loss 10.3223 LearningRate 0.0868 Epoch: 1 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:45,151-Speed 9313.58 samples/sec Loss 10.4199 LearningRate 0.0868 Epoch: 1 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:46,222-Speed 9568.79 samples/sec Loss 10.4502 LearningRate 0.0868 Epoch: 1 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:47,335-Speed 9203.95 samples/sec Loss 10.5242 LearningRate 0.0868 Epoch: 1 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:48,422-Speed 9428.87 samples/sec Loss 10.4508 LearningRate 0.0868 Epoch: 1 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:49,539-Speed 9175.33 samples/sec Loss 10.5605 LearningRate 0.0868 Epoch: 1 Global Step: 22840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:27:50,583-Speed 9811.68 samples/sec Loss 10.4451 LearningRate 0.0868 Epoch: 1 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:51,682-Speed 9326.83 samples/sec Loss 10.4346 LearningRate 0.0868 Epoch: 1 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:52,743-Speed 9657.48 samples/sec Loss 10.2977 LearningRate 0.0868 Epoch: 1 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:53,823-Speed 9487.75 samples/sec Loss 10.4713 LearningRate 0.0868 Epoch: 1 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:54,935-Speed 9211.66 samples/sec Loss 10.4657 LearningRate 0.0868 Epoch: 1 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:56,052-Speed 9174.59 samples/sec Loss 10.3979 LearningRate 0.0868 Epoch: 1 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:57,129-Speed 9509.50 
samples/sec Loss 10.5161 LearningRate 0.0867 Epoch: 1 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:58,211-Speed 9472.26 samples/sec Loss 10.5457 LearningRate 0.0867 Epoch: 1 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:27:59,286-Speed 9531.67 samples/sec Loss 10.3816 LearningRate 0.0867 Epoch: 1 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:00,384-Speed 9334.81 samples/sec Loss 10.4568 LearningRate 0.0867 Epoch: 1 Global Step: 22940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:01,469-Speed 9442.34 samples/sec Loss 10.3981 LearningRate 0.0867 Epoch: 1 Global Step: 22950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:28:02,549-Speed 9481.18 samples/sec Loss 10.5083 LearningRate 0.0867 Epoch: 1 Global Step: 22960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:03,633-Speed 9459.29 samples/sec Loss 10.3865 LearningRate 0.0867 Epoch: 1 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:04,687-Speed 9715.48 samples/sec Loss 10.3331 LearningRate 0.0867 Epoch: 1 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:05,722-Speed 9899.25 samples/sec Loss 10.3295 LearningRate 0.0867 Epoch: 1 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:06,807-Speed 9444.76 samples/sec Loss 10.4023 LearningRate 0.0867 Epoch: 1 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:07,913-Speed 9264.52 samples/sec Loss 10.4233 LearningRate 0.0867 Epoch: 1 Global Step: 23010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:08,991-Speed 9506.67 samples/sec Loss 10.5148 LearningRate 0.0867 Epoch: 1 Global Step: 23020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:10,107-Speed 9179.82 samples/sec Loss 10.4635 LearningRate 
0.0867 Epoch: 1 Global Step: 23030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:11,183-Speed 9526.33 samples/sec Loss 10.3380 LearningRate 0.0867 Epoch: 1 Global Step: 23040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:12,240-Speed 9696.96 samples/sec Loss 10.6054 LearningRate 0.0867 Epoch: 1 Global Step: 23050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:13,282-Speed 9829.93 samples/sec Loss 10.4399 LearningRate 0.0867 Epoch: 1 Global Step: 23060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:14,340-Speed 9685.42 samples/sec Loss 10.4291 LearningRate 0.0867 Epoch: 1 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:15,418-Speed 9506.52 samples/sec Loss 10.4541 LearningRate 0.0867 Epoch: 1 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:16,531-Speed 9205.34 samples/sec Loss 10.3394 LearningRate 0.0866 Epoch: 1 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:17,599-Speed 9590.38 samples/sec Loss 10.3851 LearningRate 0.0866 Epoch: 1 Global Step: 23100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:18,673-Speed 9543.22 samples/sec Loss 10.4196 LearningRate 0.0866 Epoch: 1 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:19,758-Speed 9440.74 samples/sec Loss 10.4841 LearningRate 0.0866 Epoch: 1 Global Step: 23120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:20,850-Speed 9381.42 samples/sec Loss 10.4346 LearningRate 0.0866 Epoch: 1 Global Step: 23130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:21,940-Speed 9400.18 samples/sec Loss 10.3589 LearningRate 0.0866 Epoch: 1 Global Step: 23140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:23,026-Speed 9448.18 samples/sec Loss 10.4876 LearningRate 0.0866 Epoch: 1 Global Step: 23150 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:24,136-Speed 9229.60 samples/sec Loss 10.4672 LearningRate 0.0866 Epoch: 1 Global Step: 23160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:25,189-Speed 9732.52 samples/sec Loss 10.3791 LearningRate 0.0866 Epoch: 1 Global Step: 23170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:28:26,282-Speed 9369.41 samples/sec Loss 10.2918 LearningRate 0.0866 Epoch: 1 Global Step: 23180 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:28:27,389-Speed 9259.65 samples/sec Loss 10.3239 LearningRate 0.0866 Epoch: 1 Global Step: 23190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:28:28,447-Speed 9680.82 samples/sec Loss 10.4871 LearningRate 0.0866 Epoch: 1 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:29,547-Speed 9320.42 samples/sec Loss 10.3638 LearningRate 0.0866 Epoch: 1 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:30,632-Speed 9445.21 samples/sec Loss 10.4758 LearningRate 0.0866 Epoch: 1 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:31,717-Speed 9439.90 samples/sec Loss 10.2838 LearningRate 0.0866 Epoch: 1 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:32,790-Speed 9549.67 samples/sec Loss 10.4091 LearningRate 0.0866 Epoch: 1 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:33,941-Speed 8902.43 samples/sec Loss 10.3885 LearningRate 0.0866 Epoch: 1 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:35,038-Speed 9344.99 samples/sec Loss 10.3453 LearningRate 0.0865 Epoch: 1 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:36,162-Speed 9114.90 samples/sec Loss 10.3331 LearningRate 0.0865 Epoch: 1 Global Step: 23270 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 12:28:37,281-Speed 9154.13 samples/sec Loss 10.4371 LearningRate 0.0865 Epoch: 1 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:38,329-Speed 9778.19 samples/sec Loss 10.4002 LearningRate 0.0865 Epoch: 1 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:39,407-Speed 9504.06 samples/sec Loss 10.3147 LearningRate 0.0865 Epoch: 1 Global Step: 23300 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:28:40,514-Speed 9249.04 samples/sec Loss 10.4119 LearningRate 0.0865 Epoch: 1 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:41,604-Speed 9408.06 samples/sec Loss 10.4838 LearningRate 0.0865 Epoch: 1 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:42,680-Speed 9523.25 samples/sec Loss 10.3396 LearningRate 0.0865 Epoch: 1 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:43,818-Speed 9002.30 samples/sec Loss 10.4057 LearningRate 0.0865 Epoch: 1 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:44,887-Speed 9584.57 samples/sec Loss 10.4303 LearningRate 0.0865 Epoch: 1 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:45,988-Speed 9300.13 samples/sec Loss 10.3897 LearningRate 0.0865 Epoch: 1 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:47,113-Speed 9106.95 samples/sec Loss 10.4756 LearningRate 0.0865 Epoch: 1 Global Step: 23370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:48,211-Speed 9339.63 samples/sec Loss 10.4864 LearningRate 0.0865 Epoch: 1 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:49,305-Speed 9369.88 samples/sec Loss 10.4501 LearningRate 0.0865 Epoch: 1 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 12:28:50,386-Speed 9473.18 samples/sec Loss 10.3423 LearningRate 0.0865 Epoch: 1 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:51,482-Speed 9351.04 samples/sec Loss 10.4167 LearningRate 0.0865 Epoch: 1 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:52,573-Speed 9392.24 samples/sec Loss 10.3492 LearningRate 0.0865 Epoch: 1 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:28:53,643-Speed 9578.36 samples/sec Loss 10.2862 LearningRate 0.0865 Epoch: 1 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:54,801-Speed 8844.33 samples/sec Loss 10.2811 LearningRate 0.0864 Epoch: 1 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:55,863-Speed 9648.99 samples/sec Loss 10.4162 LearningRate 0.0864 Epoch: 1 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:56,907-Speed 9818.63 samples/sec Loss 10.4633 LearningRate 0.0864 Epoch: 1 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:57,984-Speed 9512.20 samples/sec Loss 10.3827 LearningRate 0.0864 Epoch: 1 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:28:59,046-Speed 9647.93 samples/sec Loss 10.3232 LearningRate 0.0864 Epoch: 1 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:00,082-Speed 9888.83 samples/sec Loss 10.4324 LearningRate 0.0864 Epoch: 1 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:01,138-Speed 9706.42 samples/sec Loss 10.3440 LearningRate 0.0864 Epoch: 1 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:02,257-Speed 9152.18 samples/sec Loss 10.4804 LearningRate 0.0864 Epoch: 1 Global Step: 23510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:03,357-Speed 
9319.49 samples/sec Loss 10.3063 LearningRate 0.0864 Epoch: 1 Global Step: 23520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:04,436-Speed 9493.82 samples/sec Loss 10.4642 LearningRate 0.0864 Epoch: 1 Global Step: 23530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:29:05,497-Speed 9660.11 samples/sec Loss 10.2735 LearningRate 0.0864 Epoch: 1 Global Step: 23540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:06,592-Speed 9357.86 samples/sec Loss 10.3615 LearningRate 0.0864 Epoch: 1 Global Step: 23550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:07,668-Speed 9526.74 samples/sec Loss 10.3765 LearningRate 0.0864 Epoch: 1 Global Step: 23560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:08,723-Speed 9706.95 samples/sec Loss 10.3680 LearningRate 0.0864 Epoch: 1 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:09,778-Speed 9715.47 samples/sec Loss 10.4589 LearningRate 0.0864 Epoch: 1 Global Step: 23580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:10,878-Speed 9313.45 samples/sec Loss 10.2711 LearningRate 0.0864 Epoch: 1 Global Step: 23590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:11,932-Speed 9721.54 samples/sec Loss 10.3269 LearningRate 0.0864 Epoch: 1 Global Step: 23600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:13,008-Speed 9521.31 samples/sec Loss 10.4981 LearningRate 0.0864 Epoch: 1 Global Step: 23610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:14,075-Speed 9604.75 samples/sec Loss 10.2604 LearningRate 0.0863 Epoch: 1 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:15,209-Speed 9037.64 samples/sec Loss 10.3717 LearningRate 0.0863 Epoch: 1 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:16,306-Speed 9338.96 samples/sec Loss 
10.3038 LearningRate 0.0863 Epoch: 1 Global Step: 23640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:29:17,384-Speed 9505.35 samples/sec Loss 10.2949 LearningRate 0.0863 Epoch: 1 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:18,458-Speed 9531.48 samples/sec Loss 10.3498 LearningRate 0.0863 Epoch: 1 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:19,552-Speed 9367.49 samples/sec Loss 10.3255 LearningRate 0.0863 Epoch: 1 Global Step: 23670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:20,650-Speed 9329.45 samples/sec Loss 10.2648 LearningRate 0.0863 Epoch: 1 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:21,744-Speed 9366.11 samples/sec Loss 10.2792 LearningRate 0.0863 Epoch: 1 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:22,848-Speed 9289.67 samples/sec Loss 10.2730 LearningRate 0.0863 Epoch: 1 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:23,929-Speed 9475.70 samples/sec Loss 10.4155 LearningRate 0.0863 Epoch: 1 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:25,011-Speed 9475.00 samples/sec Loss 10.3391 LearningRate 0.0863 Epoch: 1 Global Step: 23720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:26,090-Speed 9498.74 samples/sec Loss 10.4074 LearningRate 0.0863 Epoch: 1 Global Step: 23730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:27,188-Speed 9328.50 samples/sec Loss 10.3183 LearningRate 0.0863 Epoch: 1 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:28,324-Speed 9022.00 samples/sec Loss 10.2913 LearningRate 0.0863 Epoch: 1 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:29,427-Speed 9290.11 samples/sec Loss 10.4577 LearningRate 0.0863 
Epoch: 1 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:30,496-Speed 9584.92 samples/sec Loss 10.3439 LearningRate 0.0863 Epoch: 1 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:31,575-Speed 9499.36 samples/sec Loss 10.3414 LearningRate 0.0863 Epoch: 1 Global Step: 23780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:32,702-Speed 9087.09 samples/sec Loss 10.2964 LearningRate 0.0863 Epoch: 1 Global Step: 23790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:33,787-Speed 9445.72 samples/sec Loss 10.3920 LearningRate 0.0862 Epoch: 1 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:34,850-Speed 9635.24 samples/sec Loss 10.2887 LearningRate 0.0862 Epoch: 1 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:35,929-Speed 9498.78 samples/sec Loss 10.3817 LearningRate 0.0862 Epoch: 1 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:37,020-Speed 9388.65 samples/sec Loss 10.4462 LearningRate 0.0862 Epoch: 1 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:38,056-Speed 9890.98 samples/sec Loss 10.3404 LearningRate 0.0862 Epoch: 1 Global Step: 23840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:39,127-Speed 9561.93 samples/sec Loss 10.3438 LearningRate 0.0862 Epoch: 1 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:40,239-Speed 9215.37 samples/sec Loss 10.2881 LearningRate 0.0862 Epoch: 1 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:41,309-Speed 9582.41 samples/sec Loss 10.3065 LearningRate 0.0862 Epoch: 1 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:42,385-Speed 9523.53 samples/sec Loss 10.3817 LearningRate 0.0862 Epoch: 1 Global Step: 23880 Fp16 
Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:43,460-Speed 9529.93 samples/sec Loss 10.4226 LearningRate 0.0862 Epoch: 1 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:44,525-Speed 9624.60 samples/sec Loss 10.3662 LearningRate 0.0862 Epoch: 1 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:45,583-Speed 9687.20 samples/sec Loss 10.2329 LearningRate 0.0862 Epoch: 1 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:46,647-Speed 9626.46 samples/sec Loss 10.3397 LearningRate 0.0862 Epoch: 1 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:47,710-Speed 9637.74 samples/sec Loss 10.3983 LearningRate 0.0862 Epoch: 1 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:29:48,821-Speed 9235.94 samples/sec Loss 10.3939 LearningRate 0.0862 Epoch: 1 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:49,956-Speed 9024.90 samples/sec Loss 10.1666 LearningRate 0.0862 Epoch: 1 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:51,042-Speed 9433.47 samples/sec Loss 10.2557 LearningRate 0.0862 Epoch: 1 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:52,150-Speed 9250.61 samples/sec Loss 10.5061 LearningRate 0.0862 Epoch: 1 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:53,254-Speed 9282.25 samples/sec Loss 10.2282 LearningRate 0.0861 Epoch: 1 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:54,272-Speed 10060.13 samples/sec Loss 10.3002 LearningRate 0.0861 Epoch: 1 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:29:55,363-Speed 9393.49 samples/sec Loss 10.2970 LearningRate 0.0861 Epoch: 1 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 12:30:17,123-[lfw][24000]XNorm: 13.734053 Training: 2022-04-11 12:30:17,124-[lfw][24000]Accuracy-Flip: 0.99317+-0.00369 Training: 2022-04-11 12:30:17,124-[lfw][24000]Accuracy-Highest: 0.99383 Training: 2022-04-11 12:30:42,364-[cfp_fp][24000]XNorm: 11.648384 Training: 2022-04-11 12:30:42,365-[cfp_fp][24000]Accuracy-Flip: 0.93186+-0.01275 Training: 2022-04-11 12:30:42,366-[cfp_fp][24000]Accuracy-Highest: 0.93186 Training: 2022-04-11 12:31:04,111-[agedb_30][24000]XNorm: 13.287366 Training: 2022-04-11 12:31:04,112-[agedb_30][24000]Accuracy-Flip: 0.94283+-0.01145 Training: 2022-04-11 12:31:04,112-[agedb_30][24000]Accuracy-Highest: 0.94283 Training: 2022-04-11 12:31:05,203-Speed 146.62 samples/sec Loss 10.2748 LearningRate 0.0861 Epoch: 1 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:06,264-Speed 9661.05 samples/sec Loss 10.3185 LearningRate 0.0861 Epoch: 1 Global Step: 24020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:07,345-Speed 9478.34 samples/sec Loss 10.3732 LearningRate 0.0861 Epoch: 1 Global Step: 24030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:08,402-Speed 9691.85 samples/sec Loss 10.3596 LearningRate 0.0861 Epoch: 1 Global Step: 24040 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:31:09,471-Speed 9579.90 samples/sec Loss 10.3140 LearningRate 0.0861 Epoch: 1 Global Step: 24050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:31:10,534-Speed 9638.30 samples/sec Loss 10.2468 LearningRate 0.0861 Epoch: 1 Global Step: 24060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:11,625-Speed 9400.64 samples/sec Loss 10.2657 LearningRate 0.0861 Epoch: 1 Global Step: 24070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:12,696-Speed 9567.33 samples/sec Loss 10.3354 LearningRate 0.0861 Epoch: 1 Global Step: 24080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-11 12:31:13,771-Speed 9528.62 samples/sec Loss 10.3918 LearningRate 0.0861 Epoch: 1 Global Step: 24090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:14,883-Speed 9209.88 samples/sec Loss 10.3198 LearningRate 0.0861 Epoch: 1 Global Step: 24100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:15,970-Speed 9426.02 samples/sec Loss 10.2842 LearningRate 0.0861 Epoch: 1 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:17,059-Speed 9416.36 samples/sec Loss 10.2389 LearningRate 0.0861 Epoch: 1 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:18,128-Speed 9585.28 samples/sec Loss 10.3663 LearningRate 0.0861 Epoch: 1 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:19,232-Speed 9276.70 samples/sec Loss 10.3412 LearningRate 0.0861 Epoch: 1 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:20,305-Speed 9548.54 samples/sec Loss 10.4007 LearningRate 0.0861 Epoch: 1 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 12:31:21,411-Speed 9265.08 samples/sec Loss 10.1090 LearningRate 0.0860 Epoch: 1 Global Step: 24160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 12:31:22,509-Speed 9331.14 samples/sec Loss 10.3444 LearningRate 0.0860 Epoch: 1 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:23,570-Speed 9659.91 samples/sec Loss 10.2378 LearningRate 0.0860 Epoch: 1 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:24,647-Speed 9509.16 samples/sec Loss 10.2758 LearningRate 0.0860 Epoch: 1 Global Step: 24190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:25,739-Speed 9382.32 samples/sec Loss 10.2772 LearningRate 0.0860 Epoch: 1 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:26,814-Speed 
9534.64 samples/sec Loss 10.2905 LearningRate 0.0860 Epoch: 1 Global Step: 24210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:27,905-Speed 9391.64 samples/sec Loss 10.2088 LearningRate 0.0860 Epoch: 1 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:29,007-Speed 9295.45 samples/sec Loss 10.1116 LearningRate 0.0860 Epoch: 1 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:30,066-Speed 9679.45 samples/sec Loss 10.1976 LearningRate 0.0860 Epoch: 1 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:31,121-Speed 9713.94 samples/sec Loss 10.1934 LearningRate 0.0860 Epoch: 1 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:32,158-Speed 9877.63 samples/sec Loss 10.4111 LearningRate 0.0860 Epoch: 1 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:33,223-Speed 9618.16 samples/sec Loss 10.3129 LearningRate 0.0860 Epoch: 1 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:34,331-Speed 9249.60 samples/sec Loss 10.2311 LearningRate 0.0860 Epoch: 1 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:35,420-Speed 9417.52 samples/sec Loss 10.2134 LearningRate 0.0860 Epoch: 1 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:36,518-Speed 9326.59 samples/sec Loss 10.3142 LearningRate 0.0860 Epoch: 1 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:37,572-Speed 9721.99 samples/sec Loss 10.2754 LearningRate 0.0860 Epoch: 1 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:38,683-Speed 9222.12 samples/sec Loss 10.3866 LearningRate 0.0860 Epoch: 1 Global Step: 24320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:39,801-Speed 9162.64 samples/sec Loss 
10.2194 LearningRate 0.0860 Epoch: 1 Global Step: 24330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:40,931-Speed 9069.83 samples/sec Loss 10.2737 LearningRate 0.0859 Epoch: 1 Global Step: 24340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:42,019-Speed 9414.71 samples/sec Loss 10.3541 LearningRate 0.0859 Epoch: 1 Global Step: 24350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:43,104-Speed 9446.51 samples/sec Loss 10.3393 LearningRate 0.0859 Epoch: 1 Global Step: 24360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:44,182-Speed 9504.11 samples/sec Loss 10.2647 LearningRate 0.0859 Epoch: 1 Global Step: 24370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:31:45,228-Speed 9796.67 samples/sec Loss 10.3709 LearningRate 0.0859 Epoch: 1 Global Step: 24380 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:31:46,320-Speed 9384.87 samples/sec Loss 10.3044 LearningRate 0.0859 Epoch: 1 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:47,441-Speed 9134.37 samples/sec Loss 10.2129 LearningRate 0.0859 Epoch: 1 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:48,472-Speed 9938.51 samples/sec Loss 10.3648 LearningRate 0.0859 Epoch: 1 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:49,563-Speed 9389.89 samples/sec Loss 10.3042 LearningRate 0.0859 Epoch: 1 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:50,655-Speed 9384.18 samples/sec Loss 10.2733 LearningRate 0.0859 Epoch: 1 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:51,739-Speed 9451.82 samples/sec Loss 10.1586 LearningRate 0.0859 Epoch: 1 Global Step: 24440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:52,816-Speed 9517.95 samples/sec Loss 10.2811 LearningRate 0.0859 
Epoch: 1 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:53,922-Speed 9267.61 samples/sec Loss 10.2797 LearningRate 0.0859 Epoch: 1 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:55,043-Speed 9140.98 samples/sec Loss 10.1499 LearningRate 0.0859 Epoch: 1 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:56,090-Speed 9784.33 samples/sec Loss 10.4613 LearningRate 0.0859 Epoch: 1 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:57,195-Speed 9274.38 samples/sec Loss 10.1778 LearningRate 0.0859 Epoch: 1 Global Step: 24490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:31:58,273-Speed 9509.88 samples/sec Loss 10.2431 LearningRate 0.0859 Epoch: 1 Global Step: 24500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:31:59,348-Speed 9526.12 samples/sec Loss 10.2044 LearningRate 0.0859 Epoch: 1 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:00,434-Speed 9433.89 samples/sec Loss 10.4041 LearningRate 0.0858 Epoch: 1 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:01,506-Speed 9557.58 samples/sec Loss 10.2936 LearningRate 0.0858 Epoch: 1 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:02,538-Speed 9930.26 samples/sec Loss 10.1982 LearningRate 0.0858 Epoch: 1 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:03,635-Speed 9338.34 samples/sec Loss 10.3205 LearningRate 0.0858 Epoch: 1 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:04,705-Speed 9589.59 samples/sec Loss 10.4277 LearningRate 0.0858 Epoch: 1 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:05,791-Speed 9430.03 samples/sec Loss 10.3227 LearningRate 0.0858 Epoch: 1 Global Step: 24570 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:06,915-Speed 9122.41 samples/sec Loss 10.2522 LearningRate 0.0858 Epoch: 1 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:08,043-Speed 9076.48 samples/sec Loss 10.2981 LearningRate 0.0858 Epoch: 1 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:09,129-Speed 9435.59 samples/sec Loss 10.2600 LearningRate 0.0858 Epoch: 1 Global Step: 24600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:32:10,229-Speed 9313.56 samples/sec Loss 10.2395 LearningRate 0.0858 Epoch: 1 Global Step: 24610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:32:11,303-Speed 9542.45 samples/sec Loss 10.3210 LearningRate 0.0858 Epoch: 1 Global Step: 24620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:32:12,398-Speed 9358.50 samples/sec Loss 10.1966 LearningRate 0.0858 Epoch: 1 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:13,467-Speed 9583.00 samples/sec Loss 10.1968 LearningRate 0.0858 Epoch: 1 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:14,536-Speed 9587.96 samples/sec Loss 10.1916 LearningRate 0.0858 Epoch: 1 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:15,611-Speed 9531.21 samples/sec Loss 10.3482 LearningRate 0.0858 Epoch: 1 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:16,721-Speed 9229.96 samples/sec Loss 10.2210 LearningRate 0.0858 Epoch: 1 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:17,815-Speed 9367.86 samples/sec Loss 10.2028 LearningRate 0.0858 Epoch: 1 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:18,911-Speed 9344.33 samples/sec Loss 10.3417 LearningRate 0.0858 Epoch: 1 Global Step: 24690 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 12:32:19,996-Speed 9441.02 samples/sec Loss 10.3139 LearningRate 0.0857 Epoch: 1 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:21,073-Speed 9515.51 samples/sec Loss 10.2916 LearningRate 0.0857 Epoch: 1 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:22,129-Speed 9700.44 samples/sec Loss 10.1658 LearningRate 0.0857 Epoch: 1 Global Step: 24720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:23,173-Speed 9818.08 samples/sec Loss 10.2489 LearningRate 0.0857 Epoch: 1 Global Step: 24730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:32:24,237-Speed 9631.22 samples/sec Loss 10.2731 LearningRate 0.0857 Epoch: 1 Global Step: 24740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:25,304-Speed 9598.64 samples/sec Loss 10.4324 LearningRate 0.0857 Epoch: 1 Global Step: 24750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:26,360-Speed 9703.40 samples/sec Loss 10.2017 LearningRate 0.0857 Epoch: 1 Global Step: 24760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:27,408-Speed 9773.97 samples/sec Loss 10.1716 LearningRate 0.0857 Epoch: 1 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:28,450-Speed 9843.69 samples/sec Loss 10.1465 LearningRate 0.0857 Epoch: 1 Global Step: 24780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:29,549-Speed 9324.58 samples/sec Loss 10.1483 LearningRate 0.0857 Epoch: 1 Global Step: 24790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:30,667-Speed 9163.62 samples/sec Loss 10.3208 LearningRate 0.0857 Epoch: 1 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:31,724-Speed 9697.37 samples/sec Loss 10.1908 LearningRate 0.0857 Epoch: 1 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 12:32:32,785-Speed 9657.80 samples/sec Loss 10.1523 LearningRate 0.0857 Epoch: 1 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:33,861-Speed 9515.07 samples/sec Loss 10.2906 LearningRate 0.0857 Epoch: 1 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:34,939-Speed 9510.60 samples/sec Loss 10.2392 LearningRate 0.0857 Epoch: 1 Global Step: 24840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:32:36,003-Speed 9631.75 samples/sec Loss 10.2440 LearningRate 0.0857 Epoch: 1 Global Step: 24850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:32:37,115-Speed 9215.29 samples/sec Loss 10.1303 LearningRate 0.0857 Epoch: 1 Global Step: 24860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:32:38,219-Speed 9273.94 samples/sec Loss 10.2271 LearningRate 0.0857 Epoch: 1 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:39,327-Speed 9263.46 samples/sec Loss 10.2352 LearningRate 0.0856 Epoch: 1 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:40,420-Speed 9375.66 samples/sec Loss 10.2538 LearningRate 0.0856 Epoch: 1 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:41,546-Speed 9098.27 samples/sec Loss 10.2217 LearningRate 0.0856 Epoch: 1 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:42,653-Speed 9257.10 samples/sec Loss 10.4120 LearningRate 0.0856 Epoch: 1 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:43,725-Speed 9557.87 samples/sec Loss 10.2801 LearningRate 0.0856 Epoch: 1 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:44,796-Speed 9565.59 samples/sec Loss 10.1868 LearningRate 0.0856 Epoch: 1 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:45,872-Speed 
9534.80 samples/sec Loss 10.2288 LearningRate 0.0856 Epoch: 1 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:46,956-Speed 9457.44 samples/sec Loss 10.1478 LearningRate 0.0856 Epoch: 1 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:48,028-Speed 9557.79 samples/sec Loss 10.2918 LearningRate 0.0856 Epoch: 1 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:49,070-Speed 9828.10 samples/sec Loss 10.2419 LearningRate 0.0856 Epoch: 1 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:50,190-Speed 9156.30 samples/sec Loss 10.2173 LearningRate 0.0856 Epoch: 1 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:51,262-Speed 9555.55 samples/sec Loss 10.3063 LearningRate 0.0856 Epoch: 1 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:52,368-Speed 9263.41 samples/sec Loss 10.1940 LearningRate 0.0856 Epoch: 1 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:53,454-Speed 9437.82 samples/sec Loss 10.2017 LearningRate 0.0856 Epoch: 1 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:54,530-Speed 9520.70 samples/sec Loss 10.2602 LearningRate 0.0856 Epoch: 1 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:32:55,631-Speed 9302.44 samples/sec Loss 10.1338 LearningRate 0.0856 Epoch: 1 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:56,728-Speed 9344.78 samples/sec Loss 10.1588 LearningRate 0.0856 Epoch: 1 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:57,790-Speed 9642.34 samples/sec Loss 10.2005 LearningRate 0.0856 Epoch: 1 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:58,848-Speed 9689.49 samples/sec Loss 10.1808 
LearningRate 0.0855 Epoch: 1 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:32:59,902-Speed 9720.17 samples/sec Loss 10.1074 LearningRate 0.0855 Epoch: 1 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:01,025-Speed 9123.91 samples/sec Loss 10.2182 LearningRate 0.0855 Epoch: 1 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:02,107-Speed 9467.28 samples/sec Loss 10.0747 LearningRate 0.0855 Epoch: 1 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:03,215-Speed 9245.30 samples/sec Loss 10.2481 LearningRate 0.0855 Epoch: 1 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:04,318-Speed 9293.52 samples/sec Loss 10.1581 LearningRate 0.0855 Epoch: 1 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:05,387-Speed 9593.00 samples/sec Loss 10.1250 LearningRate 0.0855 Epoch: 1 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:06,461-Speed 9536.43 samples/sec Loss 10.1712 LearningRate 0.0855 Epoch: 1 Global Step: 25130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:33:07,509-Speed 9775.48 samples/sec Loss 10.1233 LearningRate 0.0855 Epoch: 1 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:08,604-Speed 9357.11 samples/sec Loss 10.0820 LearningRate 0.0855 Epoch: 1 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:09,687-Speed 9462.15 samples/sec Loss 10.2164 LearningRate 0.0855 Epoch: 1 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:10,762-Speed 9538.73 samples/sec Loss 10.2448 LearningRate 0.0855 Epoch: 1 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:11,842-Speed 9483.02 samples/sec Loss 10.1754 LearningRate 0.0855 Epoch: 1 
Global Step: 25180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:12,938-Speed 9347.28 samples/sec Loss 10.1369 LearningRate 0.0855 Epoch: 1 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:14,033-Speed 9355.42 samples/sec Loss 10.1008 LearningRate 0.0855 Epoch: 1 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:15,104-Speed 9570.39 samples/sec Loss 10.1534 LearningRate 0.0855 Epoch: 1 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:16,199-Speed 9355.45 samples/sec Loss 10.1750 LearningRate 0.0855 Epoch: 1 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:17,307-Speed 9248.07 samples/sec Loss 10.2709 LearningRate 0.0855 Epoch: 1 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:18,384-Speed 9519.26 samples/sec Loss 10.1398 LearningRate 0.0854 Epoch: 1 Global Step: 25240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:33:19,468-Speed 9450.60 samples/sec Loss 10.1249 LearningRate 0.0854 Epoch: 1 Global Step: 25250 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:33:20,582-Speed 9191.99 samples/sec Loss 10.1582 LearningRate 0.0854 Epoch: 1 Global Step: 25260 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:33:21,701-Speed 9157.09 samples/sec Loss 10.1569 LearningRate 0.0854 Epoch: 1 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:22,809-Speed 9248.74 samples/sec Loss 10.0831 LearningRate 0.0854 Epoch: 1 Global Step: 25280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:23,911-Speed 9297.04 samples/sec Loss 10.2078 LearningRate 0.0854 Epoch: 1 Global Step: 25290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:24,986-Speed 9529.16 samples/sec Loss 10.2295 LearningRate 0.0854 Epoch: 1 Global Step: 25300 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:26,102-Speed 9188.86 samples/sec Loss 9.9842 LearningRate 0.0854 Epoch: 1 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:27,220-Speed 9161.03 samples/sec Loss 10.2942 LearningRate 0.0854 Epoch: 1 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:28,377-Speed 8861.37 samples/sec Loss 10.2184 LearningRate 0.0854 Epoch: 1 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:29,459-Speed 9468.25 samples/sec Loss 10.1595 LearningRate 0.0854 Epoch: 1 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:30,510-Speed 9742.63 samples/sec Loss 10.2893 LearningRate 0.0854 Epoch: 1 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:31,553-Speed 9829.50 samples/sec Loss 10.1897 LearningRate 0.0854 Epoch: 1 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:32,597-Speed 9806.76 samples/sec Loss 10.1476 LearningRate 0.0854 Epoch: 1 Global Step: 25370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:33:33,651-Speed 9725.13 samples/sec Loss 10.2328 LearningRate 0.0854 Epoch: 1 Global Step: 25380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:34,770-Speed 9160.49 samples/sec Loss 10.1433 LearningRate 0.0854 Epoch: 1 Global Step: 25390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:35,821-Speed 9746.02 samples/sec Loss 10.2132 LearningRate 0.0854 Epoch: 1 Global Step: 25400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:36,888-Speed 9601.79 samples/sec Loss 10.1114 LearningRate 0.0854 Epoch: 1 Global Step: 25410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:37,963-Speed 9531.59 samples/sec Loss 10.2312 LearningRate 0.0854 Epoch: 1 Global Step: 25420 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 12:33:39,099-Speed 9023.40 samples/sec Loss 10.2477 LearningRate 0.0853 Epoch: 1 Global Step: 25430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:40,183-Speed 9447.93 samples/sec Loss 10.1793 LearningRate 0.0853 Epoch: 1 Global Step: 25440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:41,251-Speed 9596.94 samples/sec Loss 10.2038 LearningRate 0.0853 Epoch: 1 Global Step: 25450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:42,369-Speed 9156.51 samples/sec Loss 10.1932 LearningRate 0.0853 Epoch: 1 Global Step: 25460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:43,474-Speed 9274.87 samples/sec Loss 10.2018 LearningRate 0.0853 Epoch: 1 Global Step: 25470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:44,599-Speed 9108.56 samples/sec Loss 10.2026 LearningRate 0.0853 Epoch: 1 Global Step: 25480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:33:45,699-Speed 9318.93 samples/sec Loss 10.2223 LearningRate 0.0853 Epoch: 1 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:46,787-Speed 9413.32 samples/sec Loss 10.1370 LearningRate 0.0853 Epoch: 1 Global Step: 25500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:47,854-Speed 9602.96 samples/sec Loss 10.1266 LearningRate 0.0853 Epoch: 1 Global Step: 25510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:48,928-Speed 9546.01 samples/sec Loss 10.0250 LearningRate 0.0853 Epoch: 1 Global Step: 25520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:49,988-Speed 9663.55 samples/sec Loss 10.0897 LearningRate 0.0853 Epoch: 1 Global Step: 25530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:51,116-Speed 9085.78 samples/sec Loss 10.2813 LearningRate 0.0853 Epoch: 1 Global Step: 25540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:33:52,195-Speed 9493.41 samples/sec Loss 10.2366 LearningRate 0.0853 Epoch: 1 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:53,251-Speed 9704.99 samples/sec Loss 10.2582 LearningRate 0.0853 Epoch: 1 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:54,336-Speed 9443.43 samples/sec Loss 10.0590 LearningRate 0.0853 Epoch: 1 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:55,432-Speed 9345.53 samples/sec Loss 10.0920 LearningRate 0.0853 Epoch: 1 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:56,557-Speed 9101.36 samples/sec Loss 10.1094 LearningRate 0.0853 Epoch: 1 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:57,606-Speed 9768.63 samples/sec Loss 10.2430 LearningRate 0.0853 Epoch: 1 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:58,702-Speed 9358.48 samples/sec Loss 10.0635 LearningRate 0.0852 Epoch: 1 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:33:59,795-Speed 9370.19 samples/sec Loss 10.0968 LearningRate 0.0852 Epoch: 1 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:00,919-Speed 9115.61 samples/sec Loss 10.1183 LearningRate 0.0852 Epoch: 1 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:02,005-Speed 9439.42 samples/sec Loss 10.3032 LearningRate 0.0852 Epoch: 1 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:03,073-Speed 9599.50 samples/sec Loss 10.1273 LearningRate 0.0852 Epoch: 1 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:04,149-Speed 9522.04 samples/sec Loss 10.1185 LearningRate 0.0852 Epoch: 1 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:05,200-Speed 9752.59 
samples/sec Loss 10.0458 LearningRate 0.0852 Epoch: 1 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:06,287-Speed 9423.71 samples/sec Loss 10.2324 LearningRate 0.0852 Epoch: 1 Global Step: 25680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:07,397-Speed 9228.92 samples/sec Loss 10.0508 LearningRate 0.0852 Epoch: 1 Global Step: 25690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:34:08,483-Speed 9434.52 samples/sec Loss 10.2287 LearningRate 0.0852 Epoch: 1 Global Step: 25700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:34:09,639-Speed 8868.78 samples/sec Loss 10.0152 LearningRate 0.0852 Epoch: 1 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:10,735-Speed 9342.62 samples/sec Loss 10.2103 LearningRate 0.0852 Epoch: 1 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:11,853-Speed 9170.18 samples/sec Loss 10.2013 LearningRate 0.0852 Epoch: 1 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:12,936-Speed 9460.33 samples/sec Loss 10.1466 LearningRate 0.0852 Epoch: 1 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:14,047-Speed 9217.61 samples/sec Loss 10.0597 LearningRate 0.0852 Epoch: 1 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:15,146-Speed 9328.60 samples/sec Loss 10.1323 LearningRate 0.0852 Epoch: 1 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:16,202-Speed 9699.69 samples/sec Loss 9.9995 LearningRate 0.0852 Epoch: 1 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:17,280-Speed 9507.65 samples/sec Loss 10.1769 LearningRate 0.0852 Epoch: 1 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:18,368-Speed 9417.04 samples/sec Loss 10.1072 
LearningRate 0.0851 Epoch: 1 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:19,448-Speed 9485.13 samples/sec Loss 10.1634 LearningRate 0.0851 Epoch: 1 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:20,506-Speed 9683.94 samples/sec Loss 10.1837 LearningRate 0.0851 Epoch: 1 Global Step: 25810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:34:21,612-Speed 9266.48 samples/sec Loss 10.0252 LearningRate 0.0851 Epoch: 1 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:22,672-Speed 9663.20 samples/sec Loss 10.0243 LearningRate 0.0851 Epoch: 1 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:23,713-Speed 9844.59 samples/sec Loss 10.1306 LearningRate 0.0851 Epoch: 1 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:24,785-Speed 9561.78 samples/sec Loss 10.0585 LearningRate 0.0851 Epoch: 1 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:25,843-Speed 9686.50 samples/sec Loss 10.1257 LearningRate 0.0851 Epoch: 1 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:26,908-Speed 9614.58 samples/sec Loss 10.0814 LearningRate 0.0851 Epoch: 1 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:28,042-Speed 9039.55 samples/sec Loss 9.9582 LearningRate 0.0851 Epoch: 1 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:29,119-Speed 9511.67 samples/sec Loss 10.1009 LearningRate 0.0851 Epoch: 1 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:30,211-Speed 9388.49 samples/sec Loss 10.1269 LearningRate 0.0851 Epoch: 1 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:31,313-Speed 9296.88 samples/sec Loss 10.0865 LearningRate 0.0851 Epoch: 1 
Global Step: 25910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:32,388-Speed 9523.88 samples/sec Loss 10.1059 LearningRate 0.0851 Epoch: 1 Global Step: 25920 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:34:33,464-Speed 9529.88 samples/sec Loss 10.1539 LearningRate 0.0851 Epoch: 1 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:34,556-Speed 9380.69 samples/sec Loss 10.0344 LearningRate 0.0851 Epoch: 1 Global Step: 25940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:35,652-Speed 9349.85 samples/sec Loss 10.0931 LearningRate 0.0851 Epoch: 1 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:36,754-Speed 9296.91 samples/sec Loss 9.9897 LearningRate 0.0851 Epoch: 1 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:37,819-Speed 9621.92 samples/sec Loss 9.9910 LearningRate 0.0850 Epoch: 1 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:38,925-Speed 9260.29 samples/sec Loss 10.1812 LearningRate 0.0850 Epoch: 1 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:40,029-Speed 9283.06 samples/sec Loss 9.9923 LearningRate 0.0850 Epoch: 1 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:34:41,120-Speed 9396.35 samples/sec Loss 10.1747 LearningRate 0.0850 Epoch: 1 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:35:03,182-[lfw][26000]XNorm: 14.049655 Training: 2022-04-11 12:35:03,182-[lfw][26000]Accuracy-Flip: 0.99317+-0.00302 Training: 2022-04-11 12:35:03,183-[lfw][26000]Accuracy-Highest: 0.99383 Training: 2022-04-11 12:35:28,655-[cfp_fp][26000]XNorm: 11.619706 Training: 2022-04-11 12:35:28,655-[cfp_fp][26000]Accuracy-Flip: 0.93614+-0.01509 Training: 2022-04-11 12:35:28,656-[cfp_fp][26000]Accuracy-Highest: 0.93614 Training: 2022-04-11 
12:35:50,672-[agedb_30][26000]XNorm: 13.422707 Training: 2022-04-11 12:35:50,673-[agedb_30][26000]Accuracy-Flip: 0.94400+-0.00892 Training: 2022-04-11 12:35:50,673-[agedb_30][26000]Accuracy-Highest: 0.94400 Training: 2022-04-11 12:35:51,795-Speed 144.89 samples/sec Loss 10.0688 LearningRate 0.0850 Epoch: 1 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:35:52,848-Speed 9728.87 samples/sec Loss 9.8276 LearningRate 0.0850 Epoch: 1 Global Step: 26020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:35:53,938-Speed 9404.96 samples/sec Loss 10.0729 LearningRate 0.0850 Epoch: 1 Global Step: 26030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:35:55,040-Speed 9292.66 samples/sec Loss 10.0992 LearningRate 0.0850 Epoch: 1 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:35:56,104-Speed 9637.04 samples/sec Loss 10.1112 LearningRate 0.0850 Epoch: 1 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:35:57,188-Speed 9447.12 samples/sec Loss 10.1035 LearningRate 0.0850 Epoch: 1 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:35:58,288-Speed 9318.70 samples/sec Loss 10.1875 LearningRate 0.0850 Epoch: 1 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:35:59,366-Speed 9505.46 samples/sec Loss 10.0487 LearningRate 0.0850 Epoch: 1 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:00,413-Speed 9788.30 samples/sec Loss 10.1959 LearningRate 0.0850 Epoch: 1 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:01,492-Speed 9488.36 samples/sec Loss 9.9845 LearningRate 0.0850 Epoch: 1 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:02,561-Speed 9587.11 samples/sec Loss 10.0784 LearningRate 0.0850 Epoch: 1 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 
12 hours Training: 2022-04-11 12:36:03,624-Speed 9640.27 samples/sec Loss 10.0609 LearningRate 0.0850 Epoch: 1 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:04,743-Speed 9150.61 samples/sec Loss 10.0633 LearningRate 0.0850 Epoch: 1 Global Step: 26130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:05,786-Speed 9827.08 samples/sec Loss 10.1050 LearningRate 0.0850 Epoch: 1 Global Step: 26140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:06,841-Speed 9712.82 samples/sec Loss 10.1604 LearningRate 0.0849 Epoch: 1 Global Step: 26150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:07,928-Speed 9433.79 samples/sec Loss 10.2028 LearningRate 0.0849 Epoch: 1 Global Step: 26160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:09,014-Speed 9430.27 samples/sec Loss 10.0521 LearningRate 0.0849 Epoch: 1 Global Step: 26170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:10,145-Speed 9063.55 samples/sec Loss 10.1578 LearningRate 0.0849 Epoch: 1 Global Step: 26180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:11,229-Speed 9444.18 samples/sec Loss 10.0532 LearningRate 0.0849 Epoch: 1 Global Step: 26190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:12,288-Speed 9677.03 samples/sec Loss 10.0856 LearningRate 0.0849 Epoch: 1 Global Step: 26200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:13,371-Speed 9462.70 samples/sec Loss 10.0665 LearningRate 0.0849 Epoch: 1 Global Step: 26210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:14,487-Speed 9183.38 samples/sec Loss 10.0165 LearningRate 0.0849 Epoch: 1 Global Step: 26220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:15,580-Speed 9377.20 samples/sec Loss 10.0652 LearningRate 0.0849 Epoch: 1 Global Step: 26230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:36:16,675-Speed 9354.59 samples/sec Loss 10.0263 LearningRate 0.0849 Epoch: 1 Global Step: 26240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:17,728-Speed 9730.19 samples/sec Loss 9.9913 LearningRate 0.0849 Epoch: 1 Global Step: 26250 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:18,856-Speed 9080.91 samples/sec Loss 10.0916 LearningRate 0.0849 Epoch: 1 Global Step: 26260 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:19,913-Speed 9699.49 samples/sec Loss 10.0293 LearningRate 0.0849 Epoch: 1 Global Step: 26270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:21,015-Speed 9291.29 samples/sec Loss 10.0257 LearningRate 0.0849 Epoch: 1 Global Step: 26280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:22,111-Speed 9348.43 samples/sec Loss 10.1668 LearningRate 0.0849 Epoch: 1 Global Step: 26290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:23,208-Speed 9340.42 samples/sec Loss 10.0246 LearningRate 0.0849 Epoch: 1 Global Step: 26300 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:24,297-Speed 9406.35 samples/sec Loss 9.9714 LearningRate 0.0849 Epoch: 1 Global Step: 26310 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:25,405-Speed 9249.97 samples/sec Loss 9.9603 LearningRate 0.0849 Epoch: 1 Global Step: 26320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:26,504-Speed 9333.14 samples/sec Loss 10.2102 LearningRate 0.0848 Epoch: 1 Global Step: 26330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:27,592-Speed 9421.16 samples/sec Loss 9.9052 LearningRate 0.0848 Epoch: 1 Global Step: 26340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:28,714-Speed 9124.46 samples/sec Loss 9.8990 LearningRate 0.0848 Epoch: 1 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:29,819-Speed 9278.65 
samples/sec Loss 9.9747 LearningRate 0.0848 Epoch: 1 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:30,936-Speed 9166.60 samples/sec Loss 10.0562 LearningRate 0.0848 Epoch: 1 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:32,006-Speed 9574.37 samples/sec Loss 9.9547 LearningRate 0.0848 Epoch: 1 Global Step: 26380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:33,104-Speed 9335.94 samples/sec Loss 10.0615 LearningRate 0.0848 Epoch: 1 Global Step: 26390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:34,200-Speed 9349.07 samples/sec Loss 9.9837 LearningRate 0.0848 Epoch: 1 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:35,286-Speed 9430.30 samples/sec Loss 10.0306 LearningRate 0.0848 Epoch: 1 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:36,386-Speed 9311.17 samples/sec Loss 10.0765 LearningRate 0.0848 Epoch: 1 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:37,486-Speed 9323.02 samples/sec Loss 9.9343 LearningRate 0.0848 Epoch: 1 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:38,597-Speed 9215.09 samples/sec Loss 10.0799 LearningRate 0.0848 Epoch: 1 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:36:39,686-Speed 9412.71 samples/sec Loss 9.9477 LearningRate 0.0848 Epoch: 1 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:40,763-Speed 9513.22 samples/sec Loss 10.1436 LearningRate 0.0848 Epoch: 1 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:41,821-Speed 9687.07 samples/sec Loss 10.1571 LearningRate 0.0848 Epoch: 1 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:42,896-Speed 9530.91 samples/sec Loss 9.9304 LearningRate 0.0848 
Epoch: 1 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:43,968-Speed 9556.19 samples/sec Loss 10.0393 LearningRate 0.0848 Epoch: 1 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:45,053-Speed 9444.46 samples/sec Loss 9.9978 LearningRate 0.0848 Epoch: 1 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:46,125-Speed 9562.83 samples/sec Loss 10.0830 LearningRate 0.0847 Epoch: 1 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:47,209-Speed 9446.71 samples/sec Loss 10.0801 LearningRate 0.0847 Epoch: 1 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:48,286-Speed 9520.18 samples/sec Loss 10.0059 LearningRate 0.0847 Epoch: 1 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:49,363-Speed 9516.18 samples/sec Loss 9.9751 LearningRate 0.0847 Epoch: 1 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:50,425-Speed 9648.06 samples/sec Loss 9.9776 LearningRate 0.0847 Epoch: 1 Global Step: 26550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:51,529-Speed 9277.27 samples/sec Loss 9.9661 LearningRate 0.0847 Epoch: 1 Global Step: 26560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:52,608-Speed 9497.45 samples/sec Loss 10.0462 LearningRate 0.0847 Epoch: 1 Global Step: 26570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:53,703-Speed 9353.62 samples/sec Loss 10.0591 LearningRate 0.0847 Epoch: 1 Global Step: 26580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:36:54,766-Speed 9647.66 samples/sec Loss 10.0799 LearningRate 0.0847 Epoch: 1 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:55,833-Speed 9607.98 samples/sec Loss 10.0339 LearningRate 0.0847 Epoch: 1 Global Step: 26600 Fp16 
Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:56,891-Speed 9691.92 samples/sec Loss 10.0778 LearningRate 0.0847 Epoch: 1 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:57,935-Speed 9810.53 samples/sec Loss 10.1204 LearningRate 0.0847 Epoch: 1 Global Step: 26620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:36:59,000-Speed 9618.09 samples/sec Loss 9.9669 LearningRate 0.0847 Epoch: 1 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:00,068-Speed 9594.32 samples/sec Loss 9.9708 LearningRate 0.0847 Epoch: 1 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:01,162-Speed 9371.28 samples/sec Loss 9.9764 LearningRate 0.0847 Epoch: 1 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:02,221-Speed 9673.86 samples/sec Loss 10.0013 LearningRate 0.0847 Epoch: 1 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:03,269-Speed 9775.02 samples/sec Loss 10.1088 LearningRate 0.0847 Epoch: 1 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:04,345-Speed 9521.98 samples/sec Loss 10.0185 LearningRate 0.0847 Epoch: 1 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:05,396-Speed 9751.21 samples/sec Loss 9.9978 LearningRate 0.0846 Epoch: 1 Global Step: 26690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:37:06,475-Speed 9494.17 samples/sec Loss 10.0263 LearningRate 0.0846 Epoch: 1 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:07,540-Speed 9628.41 samples/sec Loss 9.9378 LearningRate 0.0846 Epoch: 1 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:08,621-Speed 9476.57 samples/sec Loss 9.9922 LearningRate 0.0846 Epoch: 1 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 12:37:09,675-Speed 9718.71 samples/sec Loss 9.9514 LearningRate 0.0846 Epoch: 1 Global Step: 26730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:10,711-Speed 9894.69 samples/sec Loss 10.0078 LearningRate 0.0846 Epoch: 1 Global Step: 26740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:11,801-Speed 9399.73 samples/sec Loss 10.0679 LearningRate 0.0846 Epoch: 1 Global Step: 26750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:12,927-Speed 9096.01 samples/sec Loss 10.0237 LearningRate 0.0846 Epoch: 1 Global Step: 26760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:13,988-Speed 9659.78 samples/sec Loss 10.0274 LearningRate 0.0846 Epoch: 1 Global Step: 26770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:15,043-Speed 9712.66 samples/sec Loss 10.0140 LearningRate 0.0846 Epoch: 1 Global Step: 26780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:16,124-Speed 9480.36 samples/sec Loss 10.1123 LearningRate 0.0846 Epoch: 1 Global Step: 26790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:17,196-Speed 9550.92 samples/sec Loss 10.1631 LearningRate 0.0846 Epoch: 1 Global Step: 26800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:18,286-Speed 9400.25 samples/sec Loss 10.1681 LearningRate 0.0846 Epoch: 1 Global Step: 26810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:19,377-Speed 9394.12 samples/sec Loss 10.0142 LearningRate 0.0846 Epoch: 1 Global Step: 26820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:20,494-Speed 9170.18 samples/sec Loss 10.0365 LearningRate 0.0846 Epoch: 1 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:21,594-Speed 9320.15 samples/sec Loss 9.8891 LearningRate 0.0846 Epoch: 1 Global Step: 26840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:37:22,711-Speed 9169.72 samples/sec Loss 9.9426 LearningRate 0.0846 Epoch: 1 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:23,796-Speed 9446.45 samples/sec Loss 10.0418 LearningRate 0.0846 Epoch: 1 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:24,944-Speed 8929.45 samples/sec Loss 10.0490 LearningRate 0.0845 Epoch: 1 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:26,019-Speed 9533.13 samples/sec Loss 10.0492 LearningRate 0.0845 Epoch: 1 Global Step: 26880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:27,128-Speed 9233.42 samples/sec Loss 9.9413 LearningRate 0.0845 Epoch: 1 Global Step: 26890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:28,154-Speed 9988.86 samples/sec Loss 10.0283 LearningRate 0.0845 Epoch: 1 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:29,233-Speed 9496.77 samples/sec Loss 9.8660 LearningRate 0.0845 Epoch: 1 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:30,312-Speed 9492.14 samples/sec Loss 9.8875 LearningRate 0.0845 Epoch: 1 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:31,384-Speed 9562.57 samples/sec Loss 9.9279 LearningRate 0.0845 Epoch: 1 Global Step: 26930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:32,483-Speed 9322.93 samples/sec Loss 10.1063 LearningRate 0.0845 Epoch: 1 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:33,608-Speed 9100.21 samples/sec Loss 9.9585 LearningRate 0.0845 Epoch: 1 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:34,700-Speed 9386.75 samples/sec Loss 9.9617 LearningRate 0.0845 Epoch: 1 Global Step: 26960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:35,807-Speed 9259.32 samples/sec 
Loss 10.0042 LearningRate 0.0845 Epoch: 1 Global Step: 26970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:36,898-Speed 9391.06 samples/sec Loss 10.0912 LearningRate 0.0845 Epoch: 1 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:37,981-Speed 9462.79 samples/sec Loss 9.9638 LearningRate 0.0845 Epoch: 1 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:39,051-Speed 9573.91 samples/sec Loss 10.0546 LearningRate 0.0845 Epoch: 1 Global Step: 27000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:37:40,107-Speed 9705.46 samples/sec Loss 9.9517 LearningRate 0.0845 Epoch: 1 Global Step: 27010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:37:41,218-Speed 9218.72 samples/sec Loss 9.9245 LearningRate 0.0845 Epoch: 1 Global Step: 27020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:37:42,299-Speed 9486.92 samples/sec Loss 10.0468 LearningRate 0.0845 Epoch: 1 Global Step: 27030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:37:43,390-Speed 9388.15 samples/sec Loss 9.9209 LearningRate 0.0845 Epoch: 1 Global Step: 27040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:37:44,467-Speed 9524.99 samples/sec Loss 9.9395 LearningRate 0.0845 Epoch: 1 Global Step: 27050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:37:45,561-Speed 9363.85 samples/sec Loss 10.0654 LearningRate 0.0844 Epoch: 1 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:46,664-Speed 9286.13 samples/sec Loss 10.0459 LearningRate 0.0844 Epoch: 1 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:47,737-Speed 9548.75 samples/sec Loss 9.8473 LearningRate 0.0844 Epoch: 1 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:48,856-Speed 9155.39 samples/sec Loss 9.9951 LearningRate 0.0844 
Epoch: 1 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:49,927-Speed 9570.20 samples/sec Loss 9.9768 LearningRate 0.0844 Epoch: 1 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:51,028-Speed 9306.24 samples/sec Loss 9.9888 LearningRate 0.0844 Epoch: 1 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:52,096-Speed 9587.97 samples/sec Loss 9.9965 LearningRate 0.0844 Epoch: 1 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:53,227-Speed 9060.97 samples/sec Loss 10.0133 LearningRate 0.0844 Epoch: 1 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:54,299-Speed 9554.45 samples/sec Loss 10.0079 LearningRate 0.0844 Epoch: 1 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:55,369-Speed 9576.82 samples/sec Loss 9.8037 LearningRate 0.0844 Epoch: 1 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:56,449-Speed 9481.87 samples/sec Loss 9.8827 LearningRate 0.0844 Epoch: 1 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:57,536-Speed 9438.82 samples/sec Loss 10.1171 LearningRate 0.0844 Epoch: 1 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:37:58,582-Speed 9787.68 samples/sec Loss 9.9871 LearningRate 0.0844 Epoch: 1 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:37:59,687-Speed 9273.10 samples/sec Loss 9.8889 LearningRate 0.0844 Epoch: 1 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:00,762-Speed 9538.15 samples/sec Loss 9.9999 LearningRate 0.0844 Epoch: 1 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:01,842-Speed 9486.95 samples/sec Loss 9.8751 LearningRate 0.0844 Epoch: 1 Global Step: 27210 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:02,946-Speed 9279.68 samples/sec Loss 9.8742 LearningRate 0.0844 Epoch: 1 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:04,028-Speed 9469.90 samples/sec Loss 10.0553 LearningRate 0.0844 Epoch: 1 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:05,190-Speed 8819.30 samples/sec Loss 10.0157 LearningRate 0.0843 Epoch: 1 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:06,304-Speed 9194.36 samples/sec Loss 9.8818 LearningRate 0.0843 Epoch: 1 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:07,379-Speed 9533.85 samples/sec Loss 10.0201 LearningRate 0.0843 Epoch: 1 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:08,455-Speed 9524.70 samples/sec Loss 9.9848 LearningRate 0.0843 Epoch: 1 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:09,524-Speed 9583.97 samples/sec Loss 10.0171 LearningRate 0.0843 Epoch: 1 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:10,586-Speed 9644.09 samples/sec Loss 9.9775 LearningRate 0.0843 Epoch: 1 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:11,674-Speed 9417.06 samples/sec Loss 9.8774 LearningRate 0.0843 Epoch: 1 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:12,787-Speed 9209.06 samples/sec Loss 10.0837 LearningRate 0.0843 Epoch: 1 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:13,853-Speed 9612.89 samples/sec Loss 9.9041 LearningRate 0.0843 Epoch: 1 Global Step: 27320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:14,963-Speed 9233.79 samples/sec Loss 9.9356 LearningRate 0.0843 Epoch: 1 Global Step: 27330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 12:38:16,036-Speed 9550.46 samples/sec Loss 9.9079 LearningRate 0.0843 Epoch: 1 Global Step: 27340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:17,096-Speed 9659.15 samples/sec Loss 9.9574 LearningRate 0.0843 Epoch: 1 Global Step: 27350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:18,192-Speed 9350.65 samples/sec Loss 10.0757 LearningRate 0.0843 Epoch: 1 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:19,283-Speed 9392.79 samples/sec Loss 10.0034 LearningRate 0.0843 Epoch: 1 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:20,393-Speed 9231.39 samples/sec Loss 9.9204 LearningRate 0.0843 Epoch: 1 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:21,493-Speed 9314.21 samples/sec Loss 9.8857 LearningRate 0.0843 Epoch: 1 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:22,593-Speed 9323.59 samples/sec Loss 9.8660 LearningRate 0.0843 Epoch: 1 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:23,708-Speed 9187.46 samples/sec Loss 9.8000 LearningRate 0.0843 Epoch: 1 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:38:24,794-Speed 9427.55 samples/sec Loss 9.9048 LearningRate 0.0842 Epoch: 1 Global Step: 27420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:25,858-Speed 9630.86 samples/sec Loss 9.9282 LearningRate 0.0842 Epoch: 1 Global Step: 27430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:26,933-Speed 9535.89 samples/sec Loss 10.1023 LearningRate 0.0842 Epoch: 1 Global Step: 27440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:28,017-Speed 9458.10 samples/sec Loss 9.9186 LearningRate 0.0842 Epoch: 1 Global Step: 27450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:29,099-Speed 9464.68 
samples/sec Loss 9.9907 LearningRate 0.0842 Epoch: 1 Global Step: 27460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:30,219-Speed 9146.36 samples/sec Loss 9.8790 LearningRate 0.0842 Epoch: 1 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:31,312-Speed 9374.19 samples/sec Loss 9.9777 LearningRate 0.0842 Epoch: 1 Global Step: 27480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:32,406-Speed 9363.91 samples/sec Loss 10.0124 LearningRate 0.0842 Epoch: 1 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:33,503-Speed 9345.98 samples/sec Loss 9.9034 LearningRate 0.0842 Epoch: 1 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:34,570-Speed 9600.79 samples/sec Loss 9.9298 LearningRate 0.0842 Epoch: 1 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:35,683-Speed 9205.04 samples/sec Loss 9.9015 LearningRate 0.0842 Epoch: 1 Global Step: 27520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:38:36,753-Speed 9577.25 samples/sec Loss 9.9984 LearningRate 0.0842 Epoch: 1 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:37,852-Speed 9329.06 samples/sec Loss 10.0656 LearningRate 0.0842 Epoch: 1 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:38,970-Speed 9165.15 samples/sec Loss 10.0126 LearningRate 0.0842 Epoch: 1 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:40,043-Speed 9551.61 samples/sec Loss 9.9536 LearningRate 0.0842 Epoch: 1 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:41,147-Speed 9279.99 samples/sec Loss 9.9427 LearningRate 0.0842 Epoch: 1 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:42,237-Speed 9398.87 samples/sec Loss 9.9984 LearningRate 
0.0842 Epoch: 1 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:43,319-Speed 9469.75 samples/sec Loss 9.9106 LearningRate 0.0842 Epoch: 1 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:44,413-Speed 9367.11 samples/sec Loss 9.9183 LearningRate 0.0841 Epoch: 1 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:45,493-Speed 9488.60 samples/sec Loss 9.9036 LearningRate 0.0841 Epoch: 1 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:46,587-Speed 9362.33 samples/sec Loss 9.9777 LearningRate 0.0841 Epoch: 1 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:47,670-Speed 9465.22 samples/sec Loss 9.7919 LearningRate 0.0841 Epoch: 1 Global Step: 27630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:38:48,722-Speed 9739.20 samples/sec Loss 9.8366 LearningRate 0.0841 Epoch: 1 Global Step: 27640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:38:49,808-Speed 9432.95 samples/sec Loss 9.9307 LearningRate 0.0841 Epoch: 1 Global Step: 27650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:38:50,898-Speed 9396.05 samples/sec Loss 9.8410 LearningRate 0.0841 Epoch: 1 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:51,973-Speed 9533.13 samples/sec Loss 9.9027 LearningRate 0.0841 Epoch: 1 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:53,044-Speed 9570.39 samples/sec Loss 9.9146 LearningRate 0.0841 Epoch: 1 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:54,180-Speed 9017.03 samples/sec Loss 9.7807 LearningRate 0.0841 Epoch: 1 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:55,256-Speed 9520.20 samples/sec Loss 9.9441 LearningRate 0.0841 Epoch: 1 Global Step: 27700 Fp16 
Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:56,356-Speed 9316.70 samples/sec Loss 9.8531 LearningRate 0.0841 Epoch: 1 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:57,460-Speed 9282.03 samples/sec Loss 9.8548 LearningRate 0.0841 Epoch: 1 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:58,567-Speed 9262.29 samples/sec Loss 9.7866 LearningRate 0.0841 Epoch: 1 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:38:59,681-Speed 9193.54 samples/sec Loss 9.8516 LearningRate 0.0841 Epoch: 1 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:00,775-Speed 9383.40 samples/sec Loss 9.8570 LearningRate 0.0841 Epoch: 1 Global Step: 27750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:01,872-Speed 9336.10 samples/sec Loss 9.9685 LearningRate 0.0841 Epoch: 1 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:02,974-Speed 9309.36 samples/sec Loss 9.9362 LearningRate 0.0841 Epoch: 1 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:04,055-Speed 9484.24 samples/sec Loss 9.8887 LearningRate 0.0840 Epoch: 1 Global Step: 27780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:05,114-Speed 9667.84 samples/sec Loss 9.9470 LearningRate 0.0840 Epoch: 1 Global Step: 27790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:06,230-Speed 9184.27 samples/sec Loss 9.8526 LearningRate 0.0840 Epoch: 1 Global Step: 27800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:07,339-Speed 9238.02 samples/sec Loss 9.8043 LearningRate 0.0840 Epoch: 1 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:08,432-Speed 9384.79 samples/sec Loss 9.8880 LearningRate 0.0840 Epoch: 1 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 12:39:09,534-Speed 9295.91 samples/sec Loss 9.7993 LearningRate 0.0840 Epoch: 1 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:10,657-Speed 9123.11 samples/sec Loss 9.7936 LearningRate 0.0840 Epoch: 1 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:11,739-Speed 9465.35 samples/sec Loss 9.9932 LearningRate 0.0840 Epoch: 1 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:12,847-Speed 9250.35 samples/sec Loss 9.9120 LearningRate 0.0840 Epoch: 1 Global Step: 27860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:39:13,936-Speed 9410.09 samples/sec Loss 9.9262 LearningRate 0.0840 Epoch: 1 Global Step: 27870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:15,057-Speed 9137.88 samples/sec Loss 9.8769 LearningRate 0.0840 Epoch: 1 Global Step: 27880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:16,155-Speed 9338.09 samples/sec Loss 9.9240 LearningRate 0.0840 Epoch: 1 Global Step: 27890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:17,224-Speed 9584.21 samples/sec Loss 9.8329 LearningRate 0.0840 Epoch: 1 Global Step: 27900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:18,312-Speed 9414.42 samples/sec Loss 9.8530 LearningRate 0.0840 Epoch: 1 Global Step: 27910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:19,388-Speed 9523.11 samples/sec Loss 9.8343 LearningRate 0.0840 Epoch: 1 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:20,510-Speed 9137.17 samples/sec Loss 9.8701 LearningRate 0.0840 Epoch: 1 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:21,626-Speed 9180.15 samples/sec Loss 9.8124 LearningRate 0.0840 Epoch: 1 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:39:22,770-Speed 
8956.35 samples/sec Loss 9.9604 LearningRate 0.0840 Epoch: 1 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 12:39:23,863-Speed 9367.16 samples/sec Loss 9.8978 LearningRate 0.0839 Epoch: 1 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 12:39:24,953-Speed 9405.20 samples/sec Loss 9.9326 LearningRate 0.0839 Epoch: 1 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 12:39:26,024-Speed 9561.62 samples/sec Loss 9.8772 LearningRate 0.0839 Epoch: 1 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 12:39:27,118-Speed 9370.26 samples/sec Loss 9.9454 LearningRate 0.0839 Epoch: 1 Global Step: 27990 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 12:39:28,229-Speed 9221.30 samples/sec Loss 9.7739 LearningRate 0.0839 Epoch: 1 Global Step: 28000 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 12:39:50,209-[lfw][28000]XNorm: 13.534033
Training: 2022-04-11 12:39:50,210-[lfw][28000]Accuracy-Flip: 0.99450+-0.00299
Training: 2022-04-11 12:39:50,211-[lfw][28000]Accuracy-Highest: 0.99450
Training: 2022-04-11 12:40:15,659-[cfp_fp][28000]XNorm: 11.307848
Training: 2022-04-11 12:40:15,660-[cfp_fp][28000]Accuracy-Flip: 0.93257+-0.01217
Training: 2022-04-11 12:40:15,660-[cfp_fp][28000]Accuracy-Highest: 0.93614
Training: 2022-04-11 12:40:37,543-[agedb_30][28000]XNorm: 13.094939
Training: 2022-04-11 12:40:37,544-[agedb_30][28000]Accuracy-Flip: 0.94133+-0.01273
Training: 2022-04-11 12:40:37,544-[agedb_30][28000]Accuracy-Highest: 0.94400
Training: 2022-04-11 12:40:38,618-Speed 145.48 samples/sec Loss 9.9535 LearningRate 0.0839 Epoch: 1 Global Step: 28010 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 12:40:39,687-Speed 9584.55 samples/sec Loss 9.8609 LearningRate 0.0839 Epoch: 1 Global Step: 28020 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 12:40:40,734-Speed 9785.39 samples/sec Loss 9.8625
LearningRate 0.0839 Epoch: 1 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:40:41,818-Speed 9450.62 samples/sec Loss 9.8539 LearningRate 0.0839 Epoch: 1 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:40:42,869-Speed 9745.79 samples/sec Loss 9.7582 LearningRate 0.0839 Epoch: 1 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:40:43,915-Speed 9804.05 samples/sec Loss 9.8632 LearningRate 0.0839 Epoch: 1 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:40:44,964-Speed 9766.87 samples/sec Loss 9.7644 LearningRate 0.0839 Epoch: 1 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:40:46,045-Speed 9480.48 samples/sec Loss 9.7484 LearningRate 0.0839 Epoch: 1 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:40:47,172-Speed 9086.41 samples/sec Loss 9.8030 LearningRate 0.0839 Epoch: 1 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:48,270-Speed 9332.01 samples/sec Loss 9.9332 LearningRate 0.0839 Epoch: 1 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:49,342-Speed 9560.92 samples/sec Loss 9.8449 LearningRate 0.0839 Epoch: 1 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:50,444-Speed 9297.45 samples/sec Loss 9.8789 LearningRate 0.0839 Epoch: 1 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:51,507-Speed 9635.25 samples/sec Loss 9.8481 LearningRate 0.0839 Epoch: 1 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:52,632-Speed 9109.93 samples/sec Loss 9.9771 LearningRate 0.0839 Epoch: 1 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:53,768-Speed 9017.49 samples/sec Loss 9.7107 LearningRate 0.0838 Epoch: 1 Global Step: 28150 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:54,850-Speed 9469.02 samples/sec Loss 9.7940 LearningRate 0.0838 Epoch: 1 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:55,906-Speed 9702.64 samples/sec Loss 9.8586 LearningRate 0.0838 Epoch: 1 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:56,977-Speed 9571.03 samples/sec Loss 9.8399 LearningRate 0.0838 Epoch: 1 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:58,070-Speed 9370.52 samples/sec Loss 9.8361 LearningRate 0.0838 Epoch: 1 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:40:59,179-Speed 9238.44 samples/sec Loss 9.8645 LearningRate 0.0838 Epoch: 1 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:00,316-Speed 9015.30 samples/sec Loss 9.7977 LearningRate 0.0838 Epoch: 1 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:01,423-Speed 9250.11 samples/sec Loss 9.8031 LearningRate 0.0838 Epoch: 1 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:02,501-Speed 9509.27 samples/sec Loss 9.7855 LearningRate 0.0838 Epoch: 1 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:03,680-Speed 8687.49 samples/sec Loss 9.9462 LearningRate 0.0838 Epoch: 1 Global Step: 28240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:04,776-Speed 9349.88 samples/sec Loss 9.8401 LearningRate 0.0838 Epoch: 1 Global Step: 28250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:05,854-Speed 9507.07 samples/sec Loss 9.7712 LearningRate 0.0838 Epoch: 1 Global Step: 28260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:06,958-Speed 9286.11 samples/sec Loss 9.8307 LearningRate 0.0838 Epoch: 1 Global Step: 28270 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 12:41:08,055-Speed 9337.68 samples/sec Loss 9.8938 LearningRate 0.0838 Epoch: 1 Global Step: 28280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:09,142-Speed 9425.36 samples/sec Loss 9.8761 LearningRate 0.0838 Epoch: 1 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:10,246-Speed 9287.04 samples/sec Loss 9.9945 LearningRate 0.0838 Epoch: 1 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:11,307-Speed 9656.50 samples/sec Loss 10.0122 LearningRate 0.0838 Epoch: 1 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:12,411-Speed 9272.05 samples/sec Loss 10.0450 LearningRate 0.0838 Epoch: 1 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:13,476-Speed 9624.71 samples/sec Loss 9.8663 LearningRate 0.0837 Epoch: 1 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:14,587-Speed 9227.51 samples/sec Loss 9.7223 LearningRate 0.0837 Epoch: 1 Global Step: 28340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:16,582-Speed 5133.30 samples/sec Loss 9.7920 LearningRate 0.0837 Epoch: 1 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:17,685-Speed 9286.86 samples/sec Loss 9.8426 LearningRate 0.0837 Epoch: 1 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:18,769-Speed 9455.93 samples/sec Loss 9.8703 LearningRate 0.0837 Epoch: 1 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:19,874-Speed 9268.96 samples/sec Loss 9.7349 LearningRate 0.0837 Epoch: 1 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:20,929-Speed 9717.51 samples/sec Loss 9.8571 LearningRate 0.0837 Epoch: 1 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:21,990-Speed 9649.13 
samples/sec Loss 9.8200 LearningRate 0.0837 Epoch: 1 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:23,027-Speed 9881.83 samples/sec Loss 9.8949 LearningRate 0.0837 Epoch: 1 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:24,125-Speed 9335.62 samples/sec Loss 9.7652 LearningRate 0.0837 Epoch: 1 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:25,162-Speed 9873.62 samples/sec Loss 9.9524 LearningRate 0.0837 Epoch: 1 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:26,223-Speed 9660.33 samples/sec Loss 9.7685 LearningRate 0.0837 Epoch: 1 Global Step: 28440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:41:27,312-Speed 9411.49 samples/sec Loss 9.8903 LearningRate 0.0837 Epoch: 1 Global Step: 28450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:41:28,388-Speed 9519.46 samples/sec Loss 9.7405 LearningRate 0.0837 Epoch: 1 Global Step: 28460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:41:29,461-Speed 9548.61 samples/sec Loss 9.8726 LearningRate 0.0837 Epoch: 1 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:30,546-Speed 9440.93 samples/sec Loss 9.7551 LearningRate 0.0837 Epoch: 1 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:31,600-Speed 9722.58 samples/sec Loss 9.7869 LearningRate 0.0837 Epoch: 1 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:32,681-Speed 9482.56 samples/sec Loss 9.6997 LearningRate 0.0837 Epoch: 1 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:33,808-Speed 9086.36 samples/sec Loss 9.8193 LearningRate 0.0836 Epoch: 1 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:34,883-Speed 9530.67 samples/sec Loss 9.7777 LearningRate 0.0836 
Epoch: 1 Global Step: 28520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:36,029-Speed 8943.06 samples/sec Loss 9.8635 LearningRate 0.0836 Epoch: 1 Global Step: 28530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:37,156-Speed 9092.14 samples/sec Loss 9.7635 LearningRate 0.0836 Epoch: 1 Global Step: 28540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:38,254-Speed 9338.06 samples/sec Loss 9.9154 LearningRate 0.0836 Epoch: 1 Global Step: 28550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:39,382-Speed 9082.09 samples/sec Loss 9.8291 LearningRate 0.0836 Epoch: 1 Global Step: 28560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:40,495-Speed 9203.71 samples/sec Loss 9.8292 LearningRate 0.0836 Epoch: 1 Global Step: 28570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:41:41,567-Speed 9562.25 samples/sec Loss 9.7328 LearningRate 0.0836 Epoch: 1 Global Step: 28580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:42,659-Speed 9379.69 samples/sec Loss 9.6902 LearningRate 0.0836 Epoch: 1 Global Step: 28590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:43,732-Speed 9545.05 samples/sec Loss 9.8115 LearningRate 0.0836 Epoch: 1 Global Step: 28600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:44,828-Speed 9361.55 samples/sec Loss 9.8403 LearningRate 0.0836 Epoch: 1 Global Step: 28610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:45,878-Speed 9756.13 samples/sec Loss 9.8243 LearningRate 0.0836 Epoch: 1 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:46,993-Speed 9190.98 samples/sec Loss 9.8158 LearningRate 0.0836 Epoch: 1 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:48,111-Speed 9168.18 samples/sec Loss 9.9877 LearningRate 0.0836 Epoch: 1 Global Step: 28640 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:49,226-Speed 9187.91 samples/sec Loss 9.8399 LearningRate 0.0836 Epoch: 1 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:50,285-Speed 9673.07 samples/sec Loss 9.8902 LearningRate 0.0836 Epoch: 1 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:51,387-Speed 9292.82 samples/sec Loss 9.7609 LearningRate 0.0836 Epoch: 1 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:52,458-Speed 9574.09 samples/sec Loss 9.7296 LearningRate 0.0836 Epoch: 1 Global Step: 28680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:53,551-Speed 9376.23 samples/sec Loss 9.7665 LearningRate 0.0835 Epoch: 1 Global Step: 28690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:54,637-Speed 9435.98 samples/sec Loss 9.7391 LearningRate 0.0835 Epoch: 1 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:55,755-Speed 9165.28 samples/sec Loss 9.8400 LearningRate 0.0835 Epoch: 1 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:56,849-Speed 9370.35 samples/sec Loss 9.7430 LearningRate 0.0835 Epoch: 1 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:41:57,895-Speed 9794.10 samples/sec Loss 9.8481 LearningRate 0.0835 Epoch: 1 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:41:58,968-Speed 9547.70 samples/sec Loss 9.8500 LearningRate 0.0835 Epoch: 1 Global Step: 28740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:00,062-Speed 9369.27 samples/sec Loss 9.7217 LearningRate 0.0835 Epoch: 1 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:01,131-Speed 9583.60 samples/sec Loss 9.8039 LearningRate 0.0835 Epoch: 1 Global Step: 28760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 12:42:02,218-Speed 9426.02 samples/sec Loss 9.7475 LearningRate 0.0835 Epoch: 1 Global Step: 28770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:03,300-Speed 9461.34 samples/sec Loss 9.7692 LearningRate 0.0835 Epoch: 1 Global Step: 28780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:04,394-Speed 9381.20 samples/sec Loss 9.8589 LearningRate 0.0835 Epoch: 1 Global Step: 28790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:05,497-Speed 9286.81 samples/sec Loss 9.7850 LearningRate 0.0835 Epoch: 1 Global Step: 28800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:06,588-Speed 9391.21 samples/sec Loss 9.7573 LearningRate 0.0835 Epoch: 1 Global Step: 28810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:07,680-Speed 9381.45 samples/sec Loss 9.7311 LearningRate 0.0835 Epoch: 1 Global Step: 28820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:08,794-Speed 9199.78 samples/sec Loss 9.7634 LearningRate 0.0835 Epoch: 1 Global Step: 28830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:09,866-Speed 9559.34 samples/sec Loss 9.7618 LearningRate 0.0835 Epoch: 1 Global Step: 28840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:10,951-Speed 9445.04 samples/sec Loss 9.7612 LearningRate 0.0835 Epoch: 1 Global Step: 28850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:12,044-Speed 9374.60 samples/sec Loss 9.7169 LearningRate 0.0835 Epoch: 1 Global Step: 28860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:13,120-Speed 9520.99 samples/sec Loss 9.8200 LearningRate 0.0835 Epoch: 1 Global Step: 28870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:14,211-Speed 9393.16 samples/sec Loss 9.8483 LearningRate 0.0834 Epoch: 1 Global Step: 28880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:15,311-Speed 9322.03 samples/sec 
Loss 9.9056 LearningRate 0.0834 Epoch: 1 Global Step: 28890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:16,394-Speed 9468.80 samples/sec Loss 9.7909 LearningRate 0.0834 Epoch: 1 Global Step: 28900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:17,463-Speed 9584.30 samples/sec Loss 9.7840 LearningRate 0.0834 Epoch: 1 Global Step: 28910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:18,572-Speed 9244.16 samples/sec Loss 9.7927 LearningRate 0.0834 Epoch: 1 Global Step: 28920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:19,649-Speed 9510.48 samples/sec Loss 9.7789 LearningRate 0.0834 Epoch: 1 Global Step: 28930 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:42:20,749-Speed 9320.82 samples/sec Loss 9.7503 LearningRate 0.0834 Epoch: 1 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:21,854-Speed 9265.81 samples/sec Loss 9.8231 LearningRate 0.0834 Epoch: 1 Global Step: 28950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:22,911-Speed 9700.17 samples/sec Loss 9.8450 LearningRate 0.0834 Epoch: 1 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:24,027-Speed 9179.94 samples/sec Loss 9.5556 LearningRate 0.0834 Epoch: 1 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:25,099-Speed 9559.14 samples/sec Loss 9.8143 LearningRate 0.0834 Epoch: 1 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:26,149-Speed 9752.13 samples/sec Loss 9.7728 LearningRate 0.0834 Epoch: 1 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:27,219-Speed 9575.67 samples/sec Loss 9.6987 LearningRate 0.0834 Epoch: 1 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:28,293-Speed 9546.75 samples/sec Loss 9.8010 LearningRate 0.0834 Epoch: 1 
Global Step: 29010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:29,359-Speed 9614.40 samples/sec Loss 9.7725 LearningRate 0.0834 Epoch: 1 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:30,428-Speed 9578.72 samples/sec Loss 9.7744 LearningRate 0.0834 Epoch: 1 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:31,533-Speed 9271.99 samples/sec Loss 9.9010 LearningRate 0.0834 Epoch: 1 Global Step: 29040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:42:32,629-Speed 9351.61 samples/sec Loss 9.8040 LearningRate 0.0834 Epoch: 1 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:33,749-Speed 9154.07 samples/sec Loss 9.7437 LearningRate 0.0833 Epoch: 1 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:34,834-Speed 9438.05 samples/sec Loss 9.7267 LearningRate 0.0833 Epoch: 1 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:35,900-Speed 9617.74 samples/sec Loss 9.9015 LearningRate 0.0833 Epoch: 1 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:36,945-Speed 9800.36 samples/sec Loss 9.6954 LearningRate 0.0833 Epoch: 1 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:38,038-Speed 9382.23 samples/sec Loss 9.7252 LearningRate 0.0833 Epoch: 1 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:39,150-Speed 9214.37 samples/sec Loss 9.7184 LearningRate 0.0833 Epoch: 1 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:40,199-Speed 9762.48 samples/sec Loss 9.7969 LearningRate 0.0833 Epoch: 1 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:41,280-Speed 9477.66 samples/sec Loss 9.6626 LearningRate 0.0833 Epoch: 1 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 
12 hours Training: 2022-04-11 12:42:42,346-Speed 9613.43 samples/sec Loss 9.7126 LearningRate 0.0833 Epoch: 1 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:43,448-Speed 9302.15 samples/sec Loss 9.7062 LearningRate 0.0833 Epoch: 1 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:44,594-Speed 8947.12 samples/sec Loss 9.7326 LearningRate 0.0833 Epoch: 1 Global Step: 29160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:45,639-Speed 9808.83 samples/sec Loss 9.7612 LearningRate 0.0833 Epoch: 1 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:46,705-Speed 9607.75 samples/sec Loss 9.7020 LearningRate 0.0833 Epoch: 1 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:47,801-Speed 9343.46 samples/sec Loss 9.7697 LearningRate 0.0833 Epoch: 1 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:48,973-Speed 8748.74 samples/sec Loss 9.8007 LearningRate 0.0833 Epoch: 1 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:50,111-Speed 9006.36 samples/sec Loss 9.8041 LearningRate 0.0833 Epoch: 1 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:51,217-Speed 9265.25 samples/sec Loss 9.7313 LearningRate 0.0833 Epoch: 1 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:52,320-Speed 9285.89 samples/sec Loss 9.7267 LearningRate 0.0833 Epoch: 1 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:42:53,376-Speed 9711.17 samples/sec Loss 9.8131 LearningRate 0.0832 Epoch: 1 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:54,476-Speed 9311.84 samples/sec Loss 9.7691 LearningRate 0.0832 Epoch: 1 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
12:42:55,588-Speed 9213.48 samples/sec Loss 9.7768 LearningRate 0.0832 Epoch: 1 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:56,647-Speed 9672.84 samples/sec Loss 9.8007 LearningRate 0.0832 Epoch: 1 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:57,742-Speed 9355.82 samples/sec Loss 9.7375 LearningRate 0.0832 Epoch: 1 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:58,903-Speed 8821.91 samples/sec Loss 9.7665 LearningRate 0.0832 Epoch: 1 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:42:59,999-Speed 9355.95 samples/sec Loss 9.8489 LearningRate 0.0832 Epoch: 1 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:43:01,045-Speed 9799.90 samples/sec Loss 9.8709 LearningRate 0.0832 Epoch: 1 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:43:02,136-Speed 9386.71 samples/sec Loss 9.8000 LearningRate 0.0832 Epoch: 1 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:43:03,211-Speed 9536.28 samples/sec Loss 9.7903 LearningRate 0.0832 Epoch: 1 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:43:04,276-Speed 9622.47 samples/sec Loss 9.6465 LearningRate 0.0832 Epoch: 1 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:05,311-Speed 9898.42 samples/sec Loss 9.8435 LearningRate 0.0832 Epoch: 1 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:06,401-Speed 9397.87 samples/sec Loss 9.7817 LearningRate 0.0832 Epoch: 1 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:07,485-Speed 9451.13 samples/sec Loss 9.8567 LearningRate 0.0832 Epoch: 1 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:08,602-Speed 9175.47 samples/sec Loss 9.6748 
LearningRate 0.0832 Epoch: 1 Global Step: 29380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:09,681-Speed 9493.72 samples/sec Loss 9.7517 LearningRate 0.0832 Epoch: 1 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:10,757-Speed 9518.69 samples/sec Loss 9.7257 LearningRate 0.0832 Epoch: 1 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:11,860-Speed 9292.92 samples/sec Loss 9.7827 LearningRate 0.0832 Epoch: 1 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:12,922-Speed 9646.49 samples/sec Loss 9.5943 LearningRate 0.0832 Epoch: 1 Global Step: 29420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:14,006-Speed 9453.94 samples/sec Loss 9.6260 LearningRate 0.0831 Epoch: 1 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:15,097-Speed 9394.56 samples/sec Loss 9.7436 LearningRate 0.0831 Epoch: 1 Global Step: 29440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:16,198-Speed 9306.73 samples/sec Loss 9.6777 LearningRate 0.0831 Epoch: 1 Global Step: 29450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:17,323-Speed 9102.02 samples/sec Loss 9.7604 LearningRate 0.0831 Epoch: 1 Global Step: 29460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:18,399-Speed 9525.79 samples/sec Loss 9.8156 LearningRate 0.0831 Epoch: 1 Global Step: 29470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:19,467-Speed 9597.29 samples/sec Loss 9.7528 LearningRate 0.0831 Epoch: 1 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:20,511-Speed 9813.26 samples/sec Loss 9.8298 LearningRate 0.0831 Epoch: 1 Global Step: 29490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:21,566-Speed 9710.73 samples/sec Loss 9.7847 LearningRate 0.0831 Epoch: 1 Global Step: 
29500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:22,643-Speed 9520.70 samples/sec Loss 9.6704 LearningRate 0.0831 Epoch: 1 Global Step: 29510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:23,726-Speed 9457.43 samples/sec Loss 9.7301 LearningRate 0.0831 Epoch: 1 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:24,822-Speed 9349.32 samples/sec Loss 9.7409 LearningRate 0.0831 Epoch: 1 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:25,919-Speed 9333.52 samples/sec Loss 9.6738 LearningRate 0.0831 Epoch: 1 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:27,057-Speed 9008.84 samples/sec Loss 9.7943 LearningRate 0.0831 Epoch: 1 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:28,172-Speed 9187.95 samples/sec Loss 9.7315 LearningRate 0.0831 Epoch: 1 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:29,273-Speed 9305.72 samples/sec Loss 9.8428 LearningRate 0.0831 Epoch: 1 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:30,354-Speed 9477.35 samples/sec Loss 9.6967 LearningRate 0.0831 Epoch: 1 Global Step: 29580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:31,441-Speed 9424.27 samples/sec Loss 9.6767 LearningRate 0.0831 Epoch: 1 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:32,539-Speed 9333.13 samples/sec Loss 9.6270 LearningRate 0.0831 Epoch: 1 Global Step: 29600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:33,617-Speed 9505.99 samples/sec Loss 9.6554 LearningRate 0.0830 Epoch: 1 Global Step: 29610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:34,679-Speed 9650.09 samples/sec Loss 9.8603 LearningRate 0.0830 Epoch: 1 Global Step: 29620 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 12:43:35,748-Speed 9586.89 samples/sec Loss 9.8332 LearningRate 0.0830 Epoch: 1 Global Step: 29630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:36,879-Speed 9057.50 samples/sec Loss 9.8348 LearningRate 0.0830 Epoch: 1 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:37,989-Speed 9234.32 samples/sec Loss 9.6102 LearningRate 0.0830 Epoch: 1 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:39,070-Speed 9474.33 samples/sec Loss 9.7486 LearningRate 0.0830 Epoch: 1 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:40,143-Speed 9544.93 samples/sec Loss 9.6035 LearningRate 0.0830 Epoch: 1 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:41,226-Speed 9464.67 samples/sec Loss 9.6308 LearningRate 0.0830 Epoch: 1 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:42,344-Speed 9161.80 samples/sec Loss 9.6572 LearningRate 0.0830 Epoch: 1 Global Step: 29690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:43,458-Speed 9197.82 samples/sec Loss 9.7068 LearningRate 0.0830 Epoch: 1 Global Step: 29700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:44,545-Speed 9431.60 samples/sec Loss 9.6981 LearningRate 0.0830 Epoch: 1 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:45,635-Speed 9394.96 samples/sec Loss 9.7448 LearningRate 0.0830 Epoch: 1 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:46,728-Speed 9377.79 samples/sec Loss 9.7888 LearningRate 0.0830 Epoch: 1 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:47,822-Speed 9369.62 samples/sec Loss 9.6536 LearningRate 0.0830 Epoch: 1 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:43:48,929-Speed 9254.34 samples/sec Loss 9.6750 LearningRate 0.0830 Epoch: 1 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:50,027-Speed 9355.37 samples/sec Loss 9.6684 LearningRate 0.0830 Epoch: 1 Global Step: 29760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:51,094-Speed 9602.05 samples/sec Loss 9.7117 LearningRate 0.0830 Epoch: 1 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:52,178-Speed 9450.19 samples/sec Loss 9.6775 LearningRate 0.0830 Epoch: 1 Global Step: 29780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:53,316-Speed 8998.59 samples/sec Loss 9.8390 LearningRate 0.0829 Epoch: 1 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:54,407-Speed 9396.88 samples/sec Loss 9.6539 LearningRate 0.0829 Epoch: 1 Global Step: 29800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:43:55,458-Speed 9745.63 samples/sec Loss 9.7128 LearningRate 0.0829 Epoch: 1 Global Step: 29810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:56,550-Speed 9387.08 samples/sec Loss 9.6710 LearningRate 0.0829 Epoch: 1 Global Step: 29820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:57,655-Speed 9268.70 samples/sec Loss 9.6924 LearningRate 0.0829 Epoch: 1 Global Step: 29830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:58,786-Speed 9057.81 samples/sec Loss 9.6304 LearningRate 0.0829 Epoch: 1 Global Step: 29840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:43:59,916-Speed 9069.66 samples/sec Loss 9.7562 LearningRate 0.0829 Epoch: 1 Global Step: 29850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:44:00,985-Speed 9582.57 samples/sec Loss 9.6394 LearningRate 0.0829 Epoch: 1 Global Step: 29860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:44:02,066-Speed 9481.67 samples/sec Loss 
9.8580 LearningRate 0.0829 Epoch: 1 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:44:03,143-Speed 9511.79 samples/sec Loss 9.7132 LearningRate 0.0829 Epoch: 1 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:44:04,212-Speed 9588.08 samples/sec Loss 9.7023 LearningRate 0.0829 Epoch: 1 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:44:05,265-Speed 9729.61 samples/sec Loss 9.5744 LearningRate 0.0829 Epoch: 1 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:06,350-Speed 9446.05 samples/sec Loss 9.7133 LearningRate 0.0829 Epoch: 1 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:07,406-Speed 9699.34 samples/sec Loss 9.7025 LearningRate 0.0829 Epoch: 1 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:08,487-Speed 9477.73 samples/sec Loss 9.6392 LearningRate 0.0829 Epoch: 1 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:09,558-Speed 9568.32 samples/sec Loss 9.7124 LearningRate 0.0829 Epoch: 1 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:10,681-Speed 9122.83 samples/sec Loss 9.6752 LearningRate 0.0829 Epoch: 1 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:11,792-Speed 9218.21 samples/sec Loss 9.6548 LearningRate 0.0829 Epoch: 1 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:12,886-Speed 9363.41 samples/sec Loss 9.8001 LearningRate 0.0829 Epoch: 1 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:13,939-Speed 9738.81 samples/sec Loss 9.6802 LearningRate 0.0828 Epoch: 1 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:15,019-Speed 9490.29 samples/sec Loss 9.7430 LearningRate 0.0828 Epoch: 1 Global Step: 
29990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:44:16,177-Speed 8844.85 samples/sec Loss 9.9114 LearningRate 0.0828 Epoch: 1 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:44:38,088-[lfw][30000]XNorm: 13.702796 Training: 2022-04-11 12:44:38,089-[lfw][30000]Accuracy-Flip: 0.99517+-0.00320 Training: 2022-04-11 12:44:38,089-[lfw][30000]Accuracy-Highest: 0.99517 Training: 2022-04-11 12:45:03,386-[cfp_fp][30000]XNorm: 11.492991 Training: 2022-04-11 12:45:03,387-[cfp_fp][30000]Accuracy-Flip: 0.93014+-0.01568 Training: 2022-04-11 12:45:03,388-[cfp_fp][30000]Accuracy-Highest: 0.93614 Training: 2022-04-11 12:45:25,187-[agedb_30][30000]XNorm: 13.185564 Training: 2022-04-11 12:45:25,188-[agedb_30][30000]Accuracy-Flip: 0.94233+-0.01506 Training: 2022-04-11 12:45:25,189-[agedb_30][30000]Accuracy-Highest: 0.94400 Training: 2022-04-11 12:45:26,283-Speed 146.07 samples/sec Loss 9.7293 LearningRate 0.0828 Epoch: 1 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:27,387-Speed 9280.98 samples/sec Loss 9.7441 LearningRate 0.0828 Epoch: 1 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:28,450-Speed 9641.57 samples/sec Loss 9.7787 LearningRate 0.0828 Epoch: 1 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:29,526-Speed 9522.94 samples/sec Loss 9.5943 LearningRate 0.0828 Epoch: 1 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:30,627-Speed 9301.24 samples/sec Loss 9.5887 LearningRate 0.0828 Epoch: 1 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:31,706-Speed 9496.82 samples/sec Loss 9.7364 LearningRate 0.0828 Epoch: 1 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:32,794-Speed 9421.16 samples/sec Loss 9.6745 LearningRate 0.0828 Epoch: 1 Global Step: 30070 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 12:45:33,856-Speed 9653.50 samples/sec Loss 9.6352 LearningRate 0.0828 Epoch: 1 Global Step: 30080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:34,925-Speed 9577.13 samples/sec Loss 9.6950 LearningRate 0.0828 Epoch: 1 Global Step: 30090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:36,029-Speed 9286.00 samples/sec Loss 9.7086 LearningRate 0.0828 Epoch: 1 Global Step: 30100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:37,100-Speed 9561.32 samples/sec Loss 9.7223 LearningRate 0.0828 Epoch: 1 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:38,185-Speed 9445.45 samples/sec Loss 9.5864 LearningRate 0.0828 Epoch: 1 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:39,247-Speed 9648.60 samples/sec Loss 9.7443 LearningRate 0.0828 Epoch: 1 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:40,341-Speed 9360.33 samples/sec Loss 9.7940 LearningRate 0.0828 Epoch: 1 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:41,415-Speed 9545.97 samples/sec Loss 9.6196 LearningRate 0.0828 Epoch: 1 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:42,542-Speed 9090.14 samples/sec Loss 9.6089 LearningRate 0.0827 Epoch: 1 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:43,630-Speed 9417.30 samples/sec Loss 9.6102 LearningRate 0.0827 Epoch: 1 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:44,748-Speed 9167.78 samples/sec Loss 9.7452 LearningRate 0.0827 Epoch: 1 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:45,888-Speed 8986.03 samples/sec Loss 9.5893 LearningRate 0.0827 Epoch: 1 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:45:46,996-Speed 9248.41 samples/sec Loss 9.5569 LearningRate 0.0827 Epoch: 1 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:45:48,132-Speed 9019.61 samples/sec Loss 9.6371 LearningRate 0.0827 Epoch: 1 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:49,251-Speed 9159.88 samples/sec Loss 9.7069 LearningRate 0.0827 Epoch: 1 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:50,358-Speed 9257.36 samples/sec Loss 9.6330 LearningRate 0.0827 Epoch: 1 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:51,457-Speed 9321.48 samples/sec Loss 9.6528 LearningRate 0.0827 Epoch: 1 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:52,562-Speed 9277.04 samples/sec Loss 9.7428 LearningRate 0.0827 Epoch: 1 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:53,635-Speed 9546.04 samples/sec Loss 9.7776 LearningRate 0.0827 Epoch: 1 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:54,701-Speed 9614.77 samples/sec Loss 9.6906 LearningRate 0.0827 Epoch: 1 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:55,758-Speed 9689.61 samples/sec Loss 9.6332 LearningRate 0.0827 Epoch: 1 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:56,804-Speed 9790.39 samples/sec Loss 9.7548 LearningRate 0.0827 Epoch: 1 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:57,893-Speed 9412.86 samples/sec Loss 9.6557 LearningRate 0.0827 Epoch: 1 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:45:58,989-Speed 9347.57 samples/sec Loss 9.6360 LearningRate 0.0827 Epoch: 1 Global Step: 30310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:00,088-Speed 9327.20 samples/sec Loss 9.6244 
LearningRate 0.0827 Epoch: 1 Global Step: 30320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:01,161-Speed 9548.96 samples/sec Loss 9.5412 LearningRate 0.0827 Epoch: 1 Global Step: 30330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:02,253-Speed 9382.48 samples/sec Loss 9.7573 LearningRate 0.0826 Epoch: 1 Global Step: 30340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:03,319-Speed 9612.10 samples/sec Loss 9.6654 LearningRate 0.0826 Epoch: 1 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:04,397-Speed 9502.09 samples/sec Loss 9.6754 LearningRate 0.0826 Epoch: 1 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:05,489-Speed 9382.05 samples/sec Loss 9.6137 LearningRate 0.0826 Epoch: 1 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:06,570-Speed 9486.94 samples/sec Loss 9.5174 LearningRate 0.0826 Epoch: 1 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:07,686-Speed 9176.34 samples/sec Loss 9.6223 LearningRate 0.0826 Epoch: 1 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:08,779-Speed 9374.00 samples/sec Loss 9.6982 LearningRate 0.0826 Epoch: 1 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:09,919-Speed 8987.48 samples/sec Loss 9.6874 LearningRate 0.0826 Epoch: 1 Global Step: 30410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:10,989-Speed 9582.98 samples/sec Loss 9.6137 LearningRate 0.0826 Epoch: 1 Global Step: 30420 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:12,065-Speed 9517.47 samples/sec Loss 9.5499 LearningRate 0.0826 Epoch: 1 Global Step: 30430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:13,151-Speed 9431.50 samples/sec Loss 9.6270 LearningRate 0.0826 Epoch: 1 Global Step: 
30440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:14,228-Speed 9517.56 samples/sec Loss 9.7888 LearningRate 0.0826 Epoch: 1 Global Step: 30450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:15,323-Speed 9356.29 samples/sec Loss 9.6322 LearningRate 0.0826 Epoch: 1 Global Step: 30460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:16,396-Speed 9550.38 samples/sec Loss 9.7969 LearningRate 0.0826 Epoch: 1 Global Step: 30470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:17,493-Speed 9341.59 samples/sec Loss 9.7136 LearningRate 0.0826 Epoch: 1 Global Step: 30480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:18,586-Speed 9373.57 samples/sec Loss 9.7052 LearningRate 0.0826 Epoch: 1 Global Step: 30490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:19,686-Speed 9315.28 samples/sec Loss 9.5591 LearningRate 0.0826 Epoch: 1 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:20,777-Speed 9393.64 samples/sec Loss 9.5757 LearningRate 0.0826 Epoch: 1 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:21,846-Speed 9585.37 samples/sec Loss 9.5667 LearningRate 0.0826 Epoch: 1 Global Step: 30520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:22,954-Speed 9242.98 samples/sec Loss 9.6367 LearningRate 0.0825 Epoch: 1 Global Step: 30530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:24,051-Speed 9339.96 samples/sec Loss 9.6576 LearningRate 0.0825 Epoch: 1 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:25,144-Speed 9378.25 samples/sec Loss 9.5692 LearningRate 0.0825 Epoch: 1 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:26,218-Speed 9537.66 samples/sec Loss 9.5821 LearningRate 0.0825 Epoch: 1 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 12:46:27,316-Speed 9336.83 samples/sec Loss 9.5537 LearningRate 0.0825 Epoch: 1 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:28,426-Speed 9227.53 samples/sec Loss 9.7803 LearningRate 0.0825 Epoch: 1 Global Step: 30580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:29,523-Speed 9341.33 samples/sec Loss 9.5627 LearningRate 0.0825 Epoch: 1 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:30,614-Speed 9391.22 samples/sec Loss 9.6516 LearningRate 0.0825 Epoch: 1 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:33,157-Speed 4027.46 samples/sec Loss 9.7050 LearningRate 0.0825 Epoch: 1 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:34,297-Speed 8990.91 samples/sec Loss 9.5817 LearningRate 0.0825 Epoch: 1 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:35,366-Speed 9580.00 samples/sec Loss 9.5828 LearningRate 0.0825 Epoch: 1 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:36,471-Speed 9282.23 samples/sec Loss 9.5409 LearningRate 0.0825 Epoch: 1 Global Step: 30640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:37,560-Speed 9406.87 samples/sec Loss 9.6068 LearningRate 0.0825 Epoch: 1 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:38,651-Speed 9386.71 samples/sec Loss 9.5727 LearningRate 0.0825 Epoch: 1 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:39,759-Speed 9248.34 samples/sec Loss 9.6786 LearningRate 0.0825 Epoch: 1 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:40,862-Speed 9293.29 samples/sec Loss 9.6402 LearningRate 0.0825 Epoch: 1 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:46:41,969-Speed 9251.57 samples/sec Loss 9.5708 LearningRate 0.0825 Epoch: 1 Global Step: 30690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:43,053-Speed 9454.05 samples/sec Loss 9.6594 LearningRate 0.0825 Epoch: 1 Global Step: 30700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:44,165-Speed 9218.79 samples/sec Loss 9.6001 LearningRate 0.0824 Epoch: 1 Global Step: 30710 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:45,228-Speed 9638.00 samples/sec Loss 9.6239 LearningRate 0.0824 Epoch: 1 Global Step: 30720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:46,312-Speed 9451.07 samples/sec Loss 9.6021 LearningRate 0.0824 Epoch: 1 Global Step: 30730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:47,395-Speed 9461.87 samples/sec Loss 9.7018 LearningRate 0.0824 Epoch: 1 Global Step: 30740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:48,489-Speed 9362.83 samples/sec Loss 9.7267 LearningRate 0.0824 Epoch: 1 Global Step: 30750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:49,539-Speed 9762.17 samples/sec Loss 9.5699 LearningRate 0.0824 Epoch: 1 Global Step: 30760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:50,596-Speed 9689.40 samples/sec Loss 9.7487 LearningRate 0.0824 Epoch: 1 Global Step: 30770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:51,686-Speed 9402.79 samples/sec Loss 9.7694 LearningRate 0.0824 Epoch: 1 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:52,746-Speed 9664.67 samples/sec Loss 9.6065 LearningRate 0.0824 Epoch: 1 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:53,817-Speed 9570.05 samples/sec Loss 9.6883 LearningRate 0.0824 Epoch: 1 Global Step: 30800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:54,896-Speed 9488.76 samples/sec Loss 
9.6673 LearningRate 0.0824 Epoch: 1 Global Step: 30810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:55,980-Speed 9455.75 samples/sec Loss 9.4960 LearningRate 0.0824 Epoch: 1 Global Step: 30820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:57,054-Speed 9539.50 samples/sec Loss 9.5787 LearningRate 0.0824 Epoch: 1 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:46:58,090-Speed 9888.82 samples/sec Loss 9.7050 LearningRate 0.0824 Epoch: 1 Global Step: 30840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:46:59,156-Speed 9617.49 samples/sec Loss 9.5735 LearningRate 0.0824 Epoch: 1 Global Step: 30850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:00,257-Speed 9309.92 samples/sec Loss 9.6880 LearningRate 0.0824 Epoch: 1 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:01,344-Speed 9429.33 samples/sec Loss 9.6340 LearningRate 0.0824 Epoch: 1 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:02,417-Speed 9543.20 samples/sec Loss 9.5691 LearningRate 0.0824 Epoch: 1 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:03,506-Speed 9406.74 samples/sec Loss 9.6377 LearningRate 0.0823 Epoch: 1 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:04,652-Speed 8944.13 samples/sec Loss 9.6245 LearningRate 0.0823 Epoch: 1 Global Step: 30900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:05,749-Speed 9336.77 samples/sec Loss 9.5597 LearningRate 0.0823 Epoch: 1 Global Step: 30910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:06,808-Speed 9681.14 samples/sec Loss 9.6541 LearningRate 0.0823 Epoch: 1 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:07,905-Speed 9344.85 samples/sec Loss 9.5438 LearningRate 0.0823 Epoch: 1 Global 
Step: 30930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:09,009-Speed 9280.60 samples/sec Loss 9.6314 LearningRate 0.0823 Epoch: 1 Global Step: 30940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:10,110-Speed 9303.28 samples/sec Loss 9.6714 LearningRate 0.0823 Epoch: 1 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:11,240-Speed 9063.33 samples/sec Loss 9.5705 LearningRate 0.0823 Epoch: 1 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:12,316-Speed 9526.41 samples/sec Loss 9.6041 LearningRate 0.0823 Epoch: 1 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:13,413-Speed 9334.83 samples/sec Loss 9.5565 LearningRate 0.0823 Epoch: 1 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:14,533-Speed 9153.23 samples/sec Loss 9.5528 LearningRate 0.0823 Epoch: 1 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:47:15,624-Speed 9392.80 samples/sec Loss 9.7179 LearningRate 0.0823 Epoch: 1 Global Step: 31000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:16,721-Speed 9334.70 samples/sec Loss 9.5396 LearningRate 0.0823 Epoch: 1 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:17,851-Speed 9065.91 samples/sec Loss 9.6765 LearningRate 0.0823 Epoch: 1 Global Step: 31020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:18,958-Speed 9262.21 samples/sec Loss 9.5608 LearningRate 0.0823 Epoch: 1 Global Step: 31030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:20,028-Speed 9575.03 samples/sec Loss 9.6627 LearningRate 0.0823 Epoch: 1 Global Step: 31040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:21,115-Speed 9427.40 samples/sec Loss 9.5930 LearningRate 0.0823 Epoch: 1 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 12:47:22,223-Speed 9250.03 samples/sec Loss 9.5270 LearningRate 0.0823 Epoch: 1 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:23,315-Speed 9377.04 samples/sec Loss 9.4082 LearningRate 0.0823 Epoch: 1 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:24,408-Speed 9375.16 samples/sec Loss 9.6637 LearningRate 0.0822 Epoch: 1 Global Step: 31080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:25,462-Speed 9722.62 samples/sec Loss 9.5893 LearningRate 0.0822 Epoch: 1 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:26,591-Speed 9080.21 samples/sec Loss 9.5665 LearningRate 0.0822 Epoch: 1 Global Step: 31100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:27,675-Speed 9450.95 samples/sec Loss 9.5596 LearningRate 0.0822 Epoch: 1 Global Step: 31110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:28,753-Speed 9504.85 samples/sec Loss 9.5241 LearningRate 0.0822 Epoch: 1 Global Step: 31120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:29,792-Speed 9861.13 samples/sec Loss 9.5224 LearningRate 0.0822 Epoch: 1 Global Step: 31130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:30,929-Speed 9013.82 samples/sec Loss 9.5318 LearningRate 0.0822 Epoch: 1 Global Step: 31140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:31,998-Speed 9585.46 samples/sec Loss 9.4987 LearningRate 0.0822 Epoch: 1 Global Step: 31150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:33,126-Speed 9086.92 samples/sec Loss 9.6014 LearningRate 0.0822 Epoch: 1 Global Step: 31160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:34,207-Speed 9472.16 samples/sec Loss 9.4925 LearningRate 0.0822 Epoch: 1 Global Step: 31170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 
12:47:35,307-Speed 9314.63 samples/sec Loss 9.6451 LearningRate 0.0822 Epoch: 1 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:36,376-Speed 9584.71 samples/sec Loss 9.6047 LearningRate 0.0822 Epoch: 1 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:37,431-Speed 9711.47 samples/sec Loss 9.6703 LearningRate 0.0822 Epoch: 1 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:38,516-Speed 9451.50 samples/sec Loss 9.5781 LearningRate 0.0822 Epoch: 1 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:39,580-Speed 9630.77 samples/sec Loss 9.5924 LearningRate 0.0822 Epoch: 1 Global Step: 31220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:40,658-Speed 9501.56 samples/sec Loss 9.6165 LearningRate 0.0822 Epoch: 1 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:41,722-Speed 9626.79 samples/sec Loss 9.6122 LearningRate 0.0822 Epoch: 1 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:42,796-Speed 9542.11 samples/sec Loss 9.7307 LearningRate 0.0822 Epoch: 1 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:43,869-Speed 9549.29 samples/sec Loss 9.7020 LearningRate 0.0821 Epoch: 1 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:44,917-Speed 9774.16 samples/sec Loss 9.6057 LearningRate 0.0821 Epoch: 1 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:45,948-Speed 9935.83 samples/sec Loss 9.5719 LearningRate 0.0821 Epoch: 1 Global Step: 31280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:46,993-Speed 9811.26 samples/sec Loss 9.6461 LearningRate 0.0821 Epoch: 1 Global Step: 31290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:47:48,091-Speed 9330.36 samples/sec Loss 
9.6435 LearningRate 0.0821 Epoch: 1 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:49,154-Speed 9635.65 samples/sec Loss 9.7245 LearningRate 0.0821 Epoch: 1 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:50,254-Speed 9317.94 samples/sec Loss 9.5194 LearningRate 0.0821 Epoch: 1 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:51,331-Speed 9514.29 samples/sec Loss 9.5242 LearningRate 0.0821 Epoch: 1 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:52,414-Speed 9457.31 samples/sec Loss 9.5316 LearningRate 0.0821 Epoch: 1 Global Step: 31340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:53,483-Speed 9585.33 samples/sec Loss 9.6115 LearningRate 0.0821 Epoch: 1 Global Step: 31350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:54,545-Speed 9651.59 samples/sec Loss 9.5054 LearningRate 0.0821 Epoch: 1 Global Step: 31360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:47:55,572-Speed 9977.95 samples/sec Loss 9.5350 LearningRate 0.0821 Epoch: 1 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:00,002-Speed 2311.64 samples/sec Loss 9.6068 LearningRate 0.0821 Epoch: 1 Global Step: 31380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:02,321-Speed 4418.75 samples/sec Loss 9.7042 LearningRate 0.0821 Epoch: 1 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:03,396-Speed 9531.78 samples/sec Loss 9.5769 LearningRate 0.0821 Epoch: 1 Global Step: 31400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:04,456-Speed 9658.54 samples/sec Loss 9.6052 LearningRate 0.0821 Epoch: 1 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:05,548-Speed 9382.18 samples/sec Loss 9.6086 LearningRate 0.0821 Epoch: 1 Global 
Step: 31420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:06,645-Speed 9343.85 samples/sec Loss 9.6397 LearningRate 0.0821 Epoch: 1 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:07,708-Speed 9638.47 samples/sec Loss 9.4660 LearningRate 0.0821 Epoch: 1 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:08,814-Speed 9266.94 samples/sec Loss 9.5138 LearningRate 0.0820 Epoch: 1 Global Step: 31450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:09,901-Speed 9426.66 samples/sec Loss 9.6906 LearningRate 0.0820 Epoch: 1 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:10,973-Speed 9554.21 samples/sec Loss 9.4658 LearningRate 0.0820 Epoch: 1 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:12,025-Speed 9737.93 samples/sec Loss 9.5392 LearningRate 0.0820 Epoch: 1 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:13,101-Speed 9529.30 samples/sec Loss 9.5066 LearningRate 0.0820 Epoch: 1 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:14,195-Speed 9365.77 samples/sec Loss 9.6105 LearningRate 0.0820 Epoch: 1 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:15,268-Speed 9551.23 samples/sec Loss 9.5383 LearningRate 0.0820 Epoch: 1 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:16,340-Speed 9558.66 samples/sec Loss 9.4757 LearningRate 0.0820 Epoch: 1 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:17,441-Speed 9300.65 samples/sec Loss 9.5498 LearningRate 0.0820 Epoch: 1 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:18,528-Speed 9424.27 samples/sec Loss 9.6103 LearningRate 0.0820 Epoch: 1 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 12:48:19,589-Speed 9662.66 samples/sec Loss 9.5274 LearningRate 0.0820 Epoch: 1 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:20,719-Speed 9071.87 samples/sec Loss 9.5509 LearningRate 0.0820 Epoch: 1 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:21,800-Speed 9481.29 samples/sec Loss 9.4348 LearningRate 0.0820 Epoch: 1 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:22,827-Speed 9971.47 samples/sec Loss 9.5458 LearningRate 0.0820 Epoch: 1 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:23,880-Speed 9731.30 samples/sec Loss 9.5989 LearningRate 0.0820 Epoch: 1 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:24,922-Speed 9830.79 samples/sec Loss 9.6205 LearningRate 0.0820 Epoch: 1 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:25,975-Speed 9742.82 samples/sec Loss 9.5259 LearningRate 0.0820 Epoch: 1 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:27,039-Speed 9630.87 samples/sec Loss 9.5542 LearningRate 0.0820 Epoch: 1 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:28,122-Speed 9455.19 samples/sec Loss 9.5618 LearningRate 0.0819 Epoch: 1 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:29,249-Speed 9096.75 samples/sec Loss 9.4037 LearningRate 0.0819 Epoch: 1 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:30,328-Speed 9498.51 samples/sec Loss 9.7313 LearningRate 0.0819 Epoch: 1 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:31,440-Speed 9212.84 samples/sec Loss 9.5357 LearningRate 0.0819 Epoch: 1 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:32,511-Speed 
9570.25 samples/sec Loss 9.4368 LearningRate 0.0819 Epoch: 1 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:33,587-Speed 9520.94 samples/sec Loss 9.4772 LearningRate 0.0819 Epoch: 1 Global Step: 31680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:34,682-Speed 9359.13 samples/sec Loss 9.6262 LearningRate 0.0819 Epoch: 1 Global Step: 31690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:35,765-Speed 9457.88 samples/sec Loss 9.5317 LearningRate 0.0819 Epoch: 1 Global Step: 31700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:36,848-Speed 9463.28 samples/sec Loss 9.5382 LearningRate 0.0819 Epoch: 1 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:37,939-Speed 9397.03 samples/sec Loss 9.3932 LearningRate 0.0819 Epoch: 1 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:39,017-Speed 9496.44 samples/sec Loss 9.6414 LearningRate 0.0819 Epoch: 1 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:40,082-Speed 9625.51 samples/sec Loss 9.5224 LearningRate 0.0819 Epoch: 1 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:41,152-Speed 9570.38 samples/sec Loss 9.6763 LearningRate 0.0819 Epoch: 1 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:42,234-Speed 9473.64 samples/sec Loss 9.6227 LearningRate 0.0819 Epoch: 1 Global Step: 31760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:43,328-Speed 9365.61 samples/sec Loss 9.5643 LearningRate 0.0819 Epoch: 1 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:44,396-Speed 9593.71 samples/sec Loss 9.4931 LearningRate 0.0819 Epoch: 1 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:45,460-Speed 9623.44 samples/sec Loss 9.4522 LearningRate 0.0819 
Epoch: 1 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:46,551-Speed 9392.84 samples/sec Loss 9.4563 LearningRate 0.0819 Epoch: 1 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:47,674-Speed 9130.06 samples/sec Loss 9.4965 LearningRate 0.0818 Epoch: 1 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:48,779-Speed 9267.86 samples/sec Loss 9.5018 LearningRate 0.0818 Epoch: 1 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:49,852-Speed 9557.24 samples/sec Loss 9.5151 LearningRate 0.0818 Epoch: 1 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:50,915-Speed 9641.47 samples/sec Loss 9.5954 LearningRate 0.0818 Epoch: 1 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:48:52,034-Speed 9156.46 samples/sec Loss 9.5894 LearningRate 0.0818 Epoch: 1 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:53,139-Speed 9273.82 samples/sec Loss 9.5308 LearningRate 0.0818 Epoch: 1 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:54,219-Speed 9486.99 samples/sec Loss 9.5573 LearningRate 0.0818 Epoch: 1 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:55,328-Speed 9243.50 samples/sec Loss 9.6679 LearningRate 0.0818 Epoch: 1 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:56,398-Speed 9572.58 samples/sec Loss 9.5792 LearningRate 0.0818 Epoch: 1 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:57,488-Speed 9402.09 samples/sec Loss 9.5271 LearningRate 0.0818 Epoch: 1 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:48:58,566-Speed 9503.87 samples/sec Loss 9.4243 LearningRate 0.0818 Epoch: 1 Global Step: 31910 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-04-11 12:48:59,639-Speed 9552.78 samples/sec Loss 9.6382 LearningRate 0.0818 Epoch: 1 Global Step: 31920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:49:00,716-Speed 9513.77 samples/sec Loss 9.5931 LearningRate 0.0818 Epoch: 1 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:49:01,818-Speed 9298.28 samples/sec Loss 9.7007 LearningRate 0.0818 Epoch: 1 Global Step: 31940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:49:02,965-Speed 8927.65 samples/sec Loss 9.5599 LearningRate 0.0818 Epoch: 1 Global Step: 31950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:49:04,006-Speed 9844.36 samples/sec Loss 9.5562 LearningRate 0.0818 Epoch: 1 Global Step: 31960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:49:05,117-Speed 9224.99 samples/sec Loss 9.5280 LearningRate 0.0818 Epoch: 1 Global Step: 31970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:49:06,168-Speed 9747.90 samples/sec Loss 9.5730 LearningRate 0.0818 Epoch: 1 Global Step: 31980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:49:07,297-Speed 9075.79 samples/sec Loss 9.4346 LearningRate 0.0818 Epoch: 1 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:49:08,379-Speed 9466.95 samples/sec Loss 9.4637 LearningRate 0.0817 Epoch: 1 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:49:30,509-[lfw][32000]XNorm: 13.668435 Training: 2022-04-11 12:49:30,510-[lfw][32000]Accuracy-Flip: 0.99533+-0.00314 Training: 2022-04-11 12:49:30,510-[lfw][32000]Accuracy-Highest: 0.99533 Training: 2022-04-11 12:49:56,141-[cfp_fp][32000]XNorm: 11.320121 Training: 2022-04-11 12:49:56,141-[cfp_fp][32000]Accuracy-Flip: 0.93143+-0.01148 Training: 2022-04-11 12:49:56,142-[cfp_fp][32000]Accuracy-Highest: 0.93614 Training: 2022-04-11 12:50:18,199-[agedb_30][32000]XNorm: 13.186324 
Training: 2022-04-11 12:50:18,200-[agedb_30][32000]Accuracy-Flip: 0.94550+-0.01138 Training: 2022-04-11 12:50:18,201-[agedb_30][32000]Accuracy-Highest: 0.94550 Training: 2022-04-11 12:50:19,308-Speed 144.37 samples/sec Loss 9.5827 LearningRate 0.0817 Epoch: 1 Global Step: 32010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:20,367-Speed 9674.81 samples/sec Loss 9.5916 LearningRate 0.0817 Epoch: 1 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:21,419-Speed 9739.36 samples/sec Loss 9.4947 LearningRate 0.0817 Epoch: 1 Global Step: 32030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:22,524-Speed 9271.67 samples/sec Loss 9.4691 LearningRate 0.0817 Epoch: 1 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:23,595-Speed 9565.43 samples/sec Loss 9.6127 LearningRate 0.0817 Epoch: 1 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:24,659-Speed 9623.18 samples/sec Loss 9.5454 LearningRate 0.0817 Epoch: 1 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:25,765-Speed 9278.36 samples/sec Loss 9.4259 LearningRate 0.0817 Epoch: 1 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:26,841-Speed 9521.27 samples/sec Loss 9.5783 LearningRate 0.0817 Epoch: 1 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:27,901-Speed 9666.68 samples/sec Loss 9.4954 LearningRate 0.0817 Epoch: 1 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:28,956-Speed 9714.05 samples/sec Loss 9.6590 LearningRate 0.0817 Epoch: 1 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:30,055-Speed 9322.29 samples/sec Loss 9.5236 LearningRate 0.0817 Epoch: 1 Global Step: 32110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:31,169-Speed 9198.52 
samples/sec Loss 9.4549 LearningRate 0.0817 Epoch: 1 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:32,237-Speed 9593.95 samples/sec Loss 9.6262 LearningRate 0.0817 Epoch: 1 Global Step: 32130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:33,354-Speed 9172.34 samples/sec Loss 9.5882 LearningRate 0.0817 Epoch: 1 Global Step: 32140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:34,433-Speed 9493.85 samples/sec Loss 9.4976 LearningRate 0.0817 Epoch: 1 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:50:35,512-Speed 9498.07 samples/sec Loss 9.5197 LearningRate 0.0817 Epoch: 1 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:36,585-Speed 9552.73 samples/sec Loss 9.4525 LearningRate 0.0817 Epoch: 1 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:37,663-Speed 9502.49 samples/sec Loss 9.4229 LearningRate 0.0816 Epoch: 1 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:38,742-Speed 9498.16 samples/sec Loss 9.4338 LearningRate 0.0816 Epoch: 1 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:39,805-Speed 9638.99 samples/sec Loss 9.4483 LearningRate 0.0816 Epoch: 1 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:40,863-Speed 9687.24 samples/sec Loss 9.4168 LearningRate 0.0816 Epoch: 1 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:41,951-Speed 9416.30 samples/sec Loss 9.5056 LearningRate 0.0816 Epoch: 1 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:43,047-Speed 9344.09 samples/sec Loss 9.5282 LearningRate 0.0816 Epoch: 1 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:44,074-Speed 9976.55 samples/sec Loss 9.6586 LearningRate 0.0816 
Epoch: 1 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:45,159-Speed 9446.53 samples/sec Loss 9.4717 LearningRate 0.0816 Epoch: 1 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:46,248-Speed 9412.33 samples/sec Loss 9.4497 LearningRate 0.0816 Epoch: 1 Global Step: 32260 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:50:47,297-Speed 9764.75 samples/sec Loss 9.5075 LearningRate 0.0816 Epoch: 1 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:48,346-Speed 9761.13 samples/sec Loss 9.4525 LearningRate 0.0816 Epoch: 1 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:49,408-Speed 9653.67 samples/sec Loss 9.4302 LearningRate 0.0816 Epoch: 1 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:50,477-Speed 9582.44 samples/sec Loss 9.4918 LearningRate 0.0816 Epoch: 1 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:51,528-Speed 9751.57 samples/sec Loss 9.5155 LearningRate 0.0816 Epoch: 1 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:52,623-Speed 9355.77 samples/sec Loss 9.4418 LearningRate 0.0816 Epoch: 1 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:53,696-Speed 9548.92 samples/sec Loss 9.3628 LearningRate 0.0816 Epoch: 1 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:54,858-Speed 8819.10 samples/sec Loss 9.4948 LearningRate 0.0816 Epoch: 1 Global Step: 32340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:55,939-Speed 9476.01 samples/sec Loss 9.4711 LearningRate 0.0816 Epoch: 1 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:57,062-Speed 9122.57 samples/sec Loss 9.5958 LearningRate 0.0816 Epoch: 1 Global Step: 32360 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 12:50:58,150-Speed 9425.54 samples/sec Loss 9.5784 LearningRate 0.0815 Epoch: 1 Global Step: 32370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:50:59,235-Speed 9442.82 samples/sec Loss 9.4804 LearningRate 0.0815 Epoch: 1 Global Step: 32380 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:00,332-Speed 9342.47 samples/sec Loss 9.6315 LearningRate 0.0815 Epoch: 1 Global Step: 32390 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:01,388-Speed 9712.30 samples/sec Loss 9.4392 LearningRate 0.0815 Epoch: 1 Global Step: 32400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:02,508-Speed 9149.60 samples/sec Loss 9.3912 LearningRate 0.0815 Epoch: 1 Global Step: 32410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:03,556-Speed 9778.29 samples/sec Loss 9.4370 LearningRate 0.0815 Epoch: 1 Global Step: 32420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:04,631-Speed 9531.44 samples/sec Loss 9.5638 LearningRate 0.0815 Epoch: 1 Global Step: 32430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:05,713-Speed 9473.72 samples/sec Loss 9.4648 LearningRate 0.0815 Epoch: 1 Global Step: 32440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:06,780-Speed 9608.71 samples/sec Loss 9.6420 LearningRate 0.0815 Epoch: 1 Global Step: 32450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:07,881-Speed 9302.40 samples/sec Loss 9.5252 LearningRate 0.0815 Epoch: 1 Global Step: 32460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:09,028-Speed 8932.56 samples/sec Loss 9.4540 LearningRate 0.0815 Epoch: 1 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:10,128-Speed 9315.96 samples/sec Loss 9.6114 LearningRate 0.0815 Epoch: 1 Global Step: 32480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 12:51:11,204-Speed 9527.22 samples/sec Loss 9.4863 LearningRate 0.0815 Epoch: 1 Global Step: 32490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:12,273-Speed 9584.32 samples/sec Loss 9.5817 LearningRate 0.0815 Epoch: 1 Global Step: 32500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:13,383-Speed 9226.23 samples/sec Loss 9.4970 LearningRate 0.0815 Epoch: 1 Global Step: 32510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:14,478-Speed 9352.97 samples/sec Loss 9.4856 LearningRate 0.0815 Epoch: 1 Global Step: 32520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:15,532-Speed 9729.92 samples/sec Loss 9.4889 LearningRate 0.0815 Epoch: 1 Global Step: 32530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:16,588-Speed 9696.13 samples/sec Loss 9.5101 LearningRate 0.0815 Epoch: 1 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:17,685-Speed 9345.04 samples/sec Loss 9.5140 LearningRate 0.0814 Epoch: 1 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:18,817-Speed 9050.05 samples/sec Loss 9.2994 LearningRate 0.0814 Epoch: 1 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:19,927-Speed 9231.70 samples/sec Loss 9.4627 LearningRate 0.0814 Epoch: 1 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:21,079-Speed 8899.44 samples/sec Loss 9.3938 LearningRate 0.0814 Epoch: 1 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:22,122-Speed 9823.20 samples/sec Loss 9.5413 LearningRate 0.0814 Epoch: 1 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:23,198-Speed 9519.21 samples/sec Loss 9.3691 LearningRate 0.0814 Epoch: 1 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:24,288-Speed 9397.49 
samples/sec Loss 9.3696 LearningRate 0.0814 Epoch: 1 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:25,399-Speed 9225.76 samples/sec Loss 9.4535 LearningRate 0.0814 Epoch: 1 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:26,491-Speed 9384.93 samples/sec Loss 9.4275 LearningRate 0.0814 Epoch: 1 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:27,591-Speed 9314.47 samples/sec Loss 9.4110 LearningRate 0.0814 Epoch: 1 Global Step: 32640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:28,676-Speed 9438.28 samples/sec Loss 9.4410 LearningRate 0.0814 Epoch: 1 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:29,799-Speed 9122.20 samples/sec Loss 9.6048 LearningRate 0.0814 Epoch: 1 Global Step: 32660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:30,903-Speed 9282.39 samples/sec Loss 9.3545 LearningRate 0.0814 Epoch: 1 Global Step: 32670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:31,971-Speed 9598.79 samples/sec Loss 9.4526 LearningRate 0.0814 Epoch: 1 Global Step: 32680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:33,054-Speed 9456.14 samples/sec Loss 9.6651 LearningRate 0.0814 Epoch: 1 Global Step: 32690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:34,135-Speed 9482.62 samples/sec Loss 9.5099 LearningRate 0.0814 Epoch: 1 Global Step: 32700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:35,209-Speed 9536.70 samples/sec Loss 9.4914 LearningRate 0.0814 Epoch: 1 Global Step: 32710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:36,296-Speed 9427.47 samples/sec Loss 9.5017 LearningRate 0.0814 Epoch: 1 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:37,384-Speed 9418.00 samples/sec Loss 9.4755 LearningRate 0.0814 
Epoch: 1 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:38,433-Speed 9775.21 samples/sec Loss 9.4673 LearningRate 0.0813 Epoch: 1 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:39,484-Speed 9742.46 samples/sec Loss 9.4458 LearningRate 0.0813 Epoch: 1 Global Step: 32750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:40,585-Speed 9307.19 samples/sec Loss 9.3607 LearningRate 0.0813 Epoch: 1 Global Step: 32760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:41,683-Speed 9336.84 samples/sec Loss 9.4395 LearningRate 0.0813 Epoch: 1 Global Step: 32770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:42,788-Speed 9268.04 samples/sec Loss 9.4680 LearningRate 0.0813 Epoch: 1 Global Step: 32780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:43,854-Speed 9617.95 samples/sec Loss 9.3317 LearningRate 0.0813 Epoch: 1 Global Step: 32790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:44,917-Speed 9636.72 samples/sec Loss 9.4554 LearningRate 0.0813 Epoch: 1 Global Step: 32800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:45,973-Speed 9702.99 samples/sec Loss 9.3862 LearningRate 0.0813 Epoch: 1 Global Step: 32810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:47,028-Speed 9704.84 samples/sec Loss 9.4378 LearningRate 0.0813 Epoch: 1 Global Step: 32820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:48,137-Speed 9239.67 samples/sec Loss 9.3672 LearningRate 0.0813 Epoch: 1 Global Step: 32830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:49,227-Speed 9401.02 samples/sec Loss 9.4289 LearningRate 0.0813 Epoch: 1 Global Step: 32840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:50,324-Speed 9337.84 samples/sec Loss 9.3893 LearningRate 0.0813 Epoch: 1 Global Step: 32850 Fp16 Grad 
Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:51,434-Speed 9235.96 samples/sec Loss 9.2880 LearningRate 0.0813 Epoch: 1 Global Step: 32860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:51:52,529-Speed 9349.93 samples/sec Loss 9.3504 LearningRate 0.0813 Epoch: 1 Global Step: 32870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:53,655-Speed 9104.62 samples/sec Loss 9.5713 LearningRate 0.0813 Epoch: 1 Global Step: 32880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:54,742-Speed 9422.45 samples/sec Loss 9.4527 LearningRate 0.0813 Epoch: 1 Global Step: 32890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:55,791-Speed 9776.49 samples/sec Loss 9.4545 LearningRate 0.0813 Epoch: 1 Global Step: 32900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:56,954-Speed 8811.45 samples/sec Loss 9.2636 LearningRate 0.0813 Epoch: 1 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:58,038-Speed 9449.67 samples/sec Loss 9.4616 LearningRate 0.0812 Epoch: 1 Global Step: 32920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:51:59,141-Speed 9289.32 samples/sec Loss 9.5086 LearningRate 0.0812 Epoch: 1 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:00,210-Speed 9581.19 samples/sec Loss 9.4181 LearningRate 0.0812 Epoch: 1 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:01,317-Speed 9256.97 samples/sec Loss 9.3925 LearningRate 0.0812 Epoch: 1 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:02,402-Speed 9447.46 samples/sec Loss 9.3635 LearningRate 0.0812 Epoch: 1 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:03,508-Speed 9259.74 samples/sec Loss 9.3522 LearningRate 0.0812 Epoch: 1 Global Step: 32970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 
2022-04-11 12:52:04,594-Speed 9433.29 samples/sec Loss 9.4077 LearningRate 0.0812 Epoch: 1 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:05,680-Speed 9446.83 samples/sec Loss 9.4973 LearningRate 0.0812 Epoch: 1 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:06,802-Speed 9130.64 samples/sec Loss 9.4121 LearningRate 0.0812 Epoch: 1 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:07,903-Speed 9308.57 samples/sec Loss 9.3986 LearningRate 0.0812 Epoch: 1 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:08,978-Speed 9533.00 samples/sec Loss 9.3563 LearningRate 0.0812 Epoch: 1 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:10,080-Speed 9293.00 samples/sec Loss 9.3322 LearningRate 0.0812 Epoch: 1 Global Step: 33030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:11,152-Speed 9553.65 samples/sec Loss 9.4277 LearningRate 0.0812 Epoch: 1 Global Step: 33040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:12,226-Speed 9551.04 samples/sec Loss 9.4511 LearningRate 0.0812 Epoch: 1 Global Step: 33050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:13,289-Speed 9635.91 samples/sec Loss 9.2886 LearningRate 0.0812 Epoch: 1 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:14,364-Speed 9537.40 samples/sec Loss 9.3263 LearningRate 0.0812 Epoch: 1 Global Step: 33070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:15,444-Speed 9487.96 samples/sec Loss 9.4438 LearningRate 0.0812 Epoch: 1 Global Step: 33080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:52:16,508-Speed 9630.91 samples/sec Loss 9.3068 LearningRate 0.0812 Epoch: 1 Global Step: 33090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:52:17,623-Speed 9182.27 
samples/sec Loss 9.3506 LearningRate 0.0812 Epoch: 1 Global Step: 33100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:52:18,730-Speed 9256.16 samples/sec Loss 9.4511 LearningRate 0.0811 Epoch: 1 Global Step: 33110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:19,766-Speed 9890.34 samples/sec Loss 9.5137 LearningRate 0.0811 Epoch: 1 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:20,844-Speed 9509.63 samples/sec Loss 9.3503 LearningRate 0.0811 Epoch: 1 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:21,959-Speed 9188.08 samples/sec Loss 9.3447 LearningRate 0.0811 Epoch: 1 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:23,060-Speed 9304.63 samples/sec Loss 9.3398 LearningRate 0.0811 Epoch: 1 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:24,096-Speed 9891.99 samples/sec Loss 9.4489 LearningRate 0.0811 Epoch: 1 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:25,194-Speed 9331.08 samples/sec Loss 9.4166 LearningRate 0.0811 Epoch: 1 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:26,253-Speed 9680.41 samples/sec Loss 9.4872 LearningRate 0.0811 Epoch: 1 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:27,336-Speed 9457.46 samples/sec Loss 9.4438 LearningRate 0.0811 Epoch: 1 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:28,405-Speed 9581.09 samples/sec Loss 9.5628 LearningRate 0.0811 Epoch: 1 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:29,462-Speed 9699.17 samples/sec Loss 9.4219 LearningRate 0.0811 Epoch: 1 Global Step: 33210 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:52:30,535-Speed 9550.10 samples/sec Loss 9.4436 LearningRate 0.0811 
Epoch: 1 Global Step: 33220 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:52:31,627-Speed 9380.06 samples/sec Loss 9.4443 LearningRate 0.0811 Epoch: 1 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:32,711-Speed 9447.60 samples/sec Loss 9.4743 LearningRate 0.0811 Epoch: 1 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:33,772-Speed 9662.12 samples/sec Loss 9.3936 LearningRate 0.0811 Epoch: 1 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:34,865-Speed 9376.86 samples/sec Loss 9.4899 LearningRate 0.0811 Epoch: 1 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:35,967-Speed 9297.02 samples/sec Loss 9.3525 LearningRate 0.0811 Epoch: 1 Global Step: 33270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:37,057-Speed 9407.22 samples/sec Loss 9.3950 LearningRate 0.0811 Epoch: 1 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:38,140-Speed 9455.87 samples/sec Loss 9.5069 LearningRate 0.0810 Epoch: 1 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:39,251-Speed 9224.72 samples/sec Loss 9.5053 LearningRate 0.0810 Epoch: 1 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:40,331-Speed 9483.85 samples/sec Loss 9.5029 LearningRate 0.0810 Epoch: 1 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:41,417-Speed 9431.54 samples/sec Loss 9.3924 LearningRate 0.0810 Epoch: 1 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:42,481-Speed 9642.04 samples/sec Loss 9.5133 LearningRate 0.0810 Epoch: 1 Global Step: 33330 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:52:43,547-Speed 9604.01 samples/sec Loss 9.3729 LearningRate 0.0810 Epoch: 1 Global Step: 33340 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:44,622-Speed 9537.47 samples/sec Loss 9.3561 LearningRate 0.0810 Epoch: 1 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:45,698-Speed 9520.10 samples/sec Loss 9.3005 LearningRate 0.0810 Epoch: 1 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:46,804-Speed 9260.28 samples/sec Loss 9.3837 LearningRate 0.0810 Epoch: 1 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:52:48,101-Speed 7902.22 samples/sec Loss 9.3482 LearningRate 0.0810 Epoch: 1 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:16,385-Speed 362.06 samples/sec Loss 8.8359 LearningRate 0.0810 Epoch: 2 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:17,491-Speed 9270.30 samples/sec Loss 8.6036 LearningRate 0.0810 Epoch: 2 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:18,593-Speed 9294.49 samples/sec Loss 8.5436 LearningRate 0.0810 Epoch: 2 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:19,672-Speed 9497.50 samples/sec Loss 8.6140 LearningRate 0.0810 Epoch: 2 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:21,086-Speed 7246.96 samples/sec Loss 8.5390 LearningRate 0.0810 Epoch: 2 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:22,273-Speed 8630.00 samples/sec Loss 8.5016 LearningRate 0.0810 Epoch: 2 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:23,375-Speed 9301.40 samples/sec Loss 8.4967 LearningRate 0.0810 Epoch: 2 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:24,491-Speed 9182.05 samples/sec Loss 8.6369 LearningRate 0.0810 Epoch: 2 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 12:53:25,551-Speed 9662.30 samples/sec Loss 8.5898 LearningRate 0.0810 Epoch: 2 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:26,644-Speed 9375.10 samples/sec Loss 8.6141 LearningRate 0.0809 Epoch: 2 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:27,761-Speed 9171.15 samples/sec Loss 8.7718 LearningRate 0.0809 Epoch: 2 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:28,879-Speed 9170.59 samples/sec Loss 8.6562 LearningRate 0.0809 Epoch: 2 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:29,953-Speed 9534.27 samples/sec Loss 8.6487 LearningRate 0.0809 Epoch: 2 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:31,078-Speed 9110.84 samples/sec Loss 8.6917 LearningRate 0.0809 Epoch: 2 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:32,121-Speed 9822.60 samples/sec Loss 8.6543 LearningRate 0.0809 Epoch: 2 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:33,244-Speed 9123.34 samples/sec Loss 8.6407 LearningRate 0.0809 Epoch: 2 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:34,355-Speed 9225.00 samples/sec Loss 8.6084 LearningRate 0.0809 Epoch: 2 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:35,388-Speed 9920.91 samples/sec Loss 8.7200 LearningRate 0.0809 Epoch: 2 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:36,456-Speed 9595.47 samples/sec Loss 8.6380 LearningRate 0.0809 Epoch: 2 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:37,537-Speed 9477.69 samples/sec Loss 8.7323 LearningRate 0.0809 Epoch: 2 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:53:38,625-Speed 9419.52 samples/sec 
Loss 8.5866 LearningRate 0.0809 Epoch: 2 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:39,702-Speed 9515.13 samples/sec Loss 8.6267 LearningRate 0.0809 Epoch: 2 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:40,743-Speed 9845.15 samples/sec Loss 8.6528 LearningRate 0.0809 Epoch: 2 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:41,998-Speed 8166.59 samples/sec Loss 8.6600 LearningRate 0.0809 Epoch: 2 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:43,221-Speed 8376.63 samples/sec Loss 8.7400 LearningRate 0.0809 Epoch: 2 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:44,829-Speed 6370.04 samples/sec Loss 8.7941 LearningRate 0.0809 Epoch: 2 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:45,898-Speed 9583.35 samples/sec Loss 8.7318 LearningRate 0.0809 Epoch: 2 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:46,944-Speed 9795.96 samples/sec Loss 8.6820 LearningRate 0.0809 Epoch: 2 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:48,003-Speed 9671.48 samples/sec Loss 8.6380 LearningRate 0.0808 Epoch: 2 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:49,099-Speed 9353.68 samples/sec Loss 8.5868 LearningRate 0.0808 Epoch: 2 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:50,171-Speed 9557.74 samples/sec Loss 8.5969 LearningRate 0.0808 Epoch: 2 Global Step: 33690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:53:51,255-Speed 9446.07 samples/sec Loss 8.6889 LearningRate 0.0808 Epoch: 2 Global Step: 33700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:52,304-Speed 9769.36 samples/sec Loss 8.6853 LearningRate 0.0808 Epoch: 2 
Global Step: 33710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:53,390-Speed 9434.62 samples/sec Loss 8.5546 LearningRate 0.0808 Epoch: 2 Global Step: 33720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:54,455-Speed 9620.72 samples/sec Loss 8.7785 LearningRate 0.0808 Epoch: 2 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:55,543-Speed 9423.88 samples/sec Loss 8.6119 LearningRate 0.0808 Epoch: 2 Global Step: 33740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:56,577-Speed 9902.41 samples/sec Loss 8.7408 LearningRate 0.0808 Epoch: 2 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:57,658-Speed 9482.63 samples/sec Loss 8.6124 LearningRate 0.0808 Epoch: 2 Global Step: 33760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:58,708-Speed 9758.18 samples/sec Loss 8.7068 LearningRate 0.0808 Epoch: 2 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:53:59,755-Speed 9784.63 samples/sec Loss 8.6629 LearningRate 0.0808 Epoch: 2 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:00,810-Speed 9709.61 samples/sec Loss 8.7244 LearningRate 0.0808 Epoch: 2 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:01,888-Speed 9507.08 samples/sec Loss 8.6575 LearningRate 0.0808 Epoch: 2 Global Step: 33800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:54:02,942-Speed 9719.66 samples/sec Loss 8.7483 LearningRate 0.0808 Epoch: 2 Global Step: 33810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:04,014-Speed 9559.43 samples/sec Loss 8.7844 LearningRate 0.0808 Epoch: 2 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:05,075-Speed 9660.04 samples/sec Loss 8.7815 LearningRate 0.0808 Epoch: 2 Global Step: 33830 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 12:54:06,154-Speed 9495.33 samples/sec Loss 8.7386 LearningRate 0.0808 Epoch: 2 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:07,235-Speed 9483.24 samples/sec Loss 8.6899 LearningRate 0.0807 Epoch: 2 Global Step: 33850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:08,319-Speed 9448.55 samples/sec Loss 8.7067 LearningRate 0.0807 Epoch: 2 Global Step: 33860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:09,381-Speed 9650.53 samples/sec Loss 8.7873 LearningRate 0.0807 Epoch: 2 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:10,475-Speed 9365.45 samples/sec Loss 8.7104 LearningRate 0.0807 Epoch: 2 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:11,546-Speed 9566.24 samples/sec Loss 8.8003 LearningRate 0.0807 Epoch: 2 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:12,653-Speed 9258.40 samples/sec Loss 8.6235 LearningRate 0.0807 Epoch: 2 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:13,762-Speed 9239.43 samples/sec Loss 8.7390 LearningRate 0.0807 Epoch: 2 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:14,860-Speed 9329.02 samples/sec Loss 8.7169 LearningRate 0.0807 Epoch: 2 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:15,985-Speed 9103.39 samples/sec Loss 8.7377 LearningRate 0.0807 Epoch: 2 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:17,113-Speed 9088.56 samples/sec Loss 8.8265 LearningRate 0.0807 Epoch: 2 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:18,216-Speed 9288.38 samples/sec Loss 8.5822 LearningRate 0.0807 Epoch: 2 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:54:19,304-Speed 9422.03 samples/sec Loss 8.6262 LearningRate 0.0807 Epoch: 2 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:20,346-Speed 9828.84 samples/sec Loss 8.7826 LearningRate 0.0807 Epoch: 2 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:21,412-Speed 9615.88 samples/sec Loss 8.6856 LearningRate 0.0807 Epoch: 2 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:22,483-Speed 9563.97 samples/sec Loss 8.6193 LearningRate 0.0807 Epoch: 2 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:23,529-Speed 9797.45 samples/sec Loss 8.8220 LearningRate 0.0807 Epoch: 2 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:54:45,433-[lfw][34000]XNorm: 13.287884 Training: 2022-04-11 12:54:45,434-[lfw][34000]Accuracy-Flip: 0.99467+-0.00306 Training: 2022-04-11 12:54:45,434-[lfw][34000]Accuracy-Highest: 0.99533 Training: 2022-04-11 12:55:10,772-[cfp_fp][34000]XNorm: 11.163665 Training: 2022-04-11 12:55:10,772-[cfp_fp][34000]Accuracy-Flip: 0.93943+-0.01171 Training: 2022-04-11 12:55:10,773-[cfp_fp][34000]Accuracy-Highest: 0.93943 Training: 2022-04-11 12:55:32,618-[agedb_30][34000]XNorm: 12.864343 Training: 2022-04-11 12:55:32,618-[agedb_30][34000]Accuracy-Flip: 0.94900+-0.01379 Training: 2022-04-11 12:55:32,619-[agedb_30][34000]Accuracy-Highest: 0.94900 Training: 2022-04-11 12:55:33,703-Speed 145.92 samples/sec Loss 8.8143 LearningRate 0.0807 Epoch: 2 Global Step: 34010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:55:34,760-Speed 9694.02 samples/sec Loss 8.7910 LearningRate 0.0807 Epoch: 2 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:35,825-Speed 9622.24 samples/sec Loss 8.7506 LearningRate 0.0807 Epoch: 2 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:36,884-Speed 9673.38 
samples/sec Loss 8.7495 LearningRate 0.0806 Epoch: 2 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:37,918-Speed 9908.64 samples/sec Loss 8.7269 LearningRate 0.0806 Epoch: 2 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:38,946-Speed 9965.11 samples/sec Loss 8.8047 LearningRate 0.0806 Epoch: 2 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:40,008-Speed 9659.38 samples/sec Loss 8.7177 LearningRate 0.0806 Epoch: 2 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:41,059-Speed 9745.48 samples/sec Loss 8.6673 LearningRate 0.0806 Epoch: 2 Global Step: 34080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:42,138-Speed 9495.77 samples/sec Loss 8.8057 LearningRate 0.0806 Epoch: 2 Global Step: 34090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:43,178-Speed 9855.58 samples/sec Loss 8.7438 LearningRate 0.0806 Epoch: 2 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:44,249-Speed 9557.97 samples/sec Loss 8.7973 LearningRate 0.0806 Epoch: 2 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:45,290-Speed 9846.87 samples/sec Loss 8.7788 LearningRate 0.0806 Epoch: 2 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:46,400-Speed 9227.15 samples/sec Loss 8.6546 LearningRate 0.0806 Epoch: 2 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:47,456-Speed 9708.85 samples/sec Loss 8.7115 LearningRate 0.0806 Epoch: 2 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:55:48,555-Speed 9316.45 samples/sec Loss 8.6035 LearningRate 0.0806 Epoch: 2 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:49,652-Speed 9349.03 samples/sec Loss 8.7494 LearningRate 0.0806 Epoch: 2 
Global Step: 34160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:50,713-Speed 9654.58 samples/sec Loss 8.8211 LearningRate 0.0806 Epoch: 2 Global Step: 34170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:51,769-Speed 9701.38 samples/sec Loss 8.8355 LearningRate 0.0806 Epoch: 2 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:52,824-Speed 9711.92 samples/sec Loss 8.7341 LearningRate 0.0806 Epoch: 2 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:53,921-Speed 9334.81 samples/sec Loss 8.8113 LearningRate 0.0806 Epoch: 2 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:55,006-Speed 9448.01 samples/sec Loss 8.6526 LearningRate 0.0806 Epoch: 2 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:56,108-Speed 9298.49 samples/sec Loss 8.8752 LearningRate 0.0805 Epoch: 2 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:57,228-Speed 9144.56 samples/sec Loss 8.7981 LearningRate 0.0805 Epoch: 2 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:58,318-Speed 9405.45 samples/sec Loss 8.7572 LearningRate 0.0805 Epoch: 2 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:55:59,428-Speed 9232.51 samples/sec Loss 8.8518 LearningRate 0.0805 Epoch: 2 Global Step: 34250 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:00,486-Speed 9683.92 samples/sec Loss 8.8035 LearningRate 0.0805 Epoch: 2 Global Step: 34260 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:01,510-Speed 10004.95 samples/sec Loss 8.7139 LearningRate 0.0805 Epoch: 2 Global Step: 34270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:02,545-Speed 9897.97 samples/sec Loss 8.9207 LearningRate 0.0805 Epoch: 2 Global Step: 34280 Fp16 Grad Scale: 262144 
Required: 12 hours Training: 2022-04-11 12:56:03,597-Speed 9744.58 samples/sec Loss 8.7473 LearningRate 0.0805 Epoch: 2 Global Step: 34290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:04,663-Speed 9608.62 samples/sec Loss 8.7117 LearningRate 0.0805 Epoch: 2 Global Step: 34300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:05,734-Speed 9565.87 samples/sec Loss 8.8715 LearningRate 0.0805 Epoch: 2 Global Step: 34310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:06,808-Speed 9538.69 samples/sec Loss 8.7832 LearningRate 0.0805 Epoch: 2 Global Step: 34320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:07,863-Speed 9709.96 samples/sec Loss 8.5845 LearningRate 0.0805 Epoch: 2 Global Step: 34330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:08,958-Speed 9360.30 samples/sec Loss 8.8952 LearningRate 0.0805 Epoch: 2 Global Step: 34340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:10,035-Speed 9512.86 samples/sec Loss 8.8102 LearningRate 0.0805 Epoch: 2 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:11,126-Speed 9390.12 samples/sec Loss 8.7245 LearningRate 0.0805 Epoch: 2 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:12,192-Speed 9612.67 samples/sec Loss 8.6812 LearningRate 0.0805 Epoch: 2 Global Step: 34370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:13,237-Speed 9804.25 samples/sec Loss 8.7611 LearningRate 0.0805 Epoch: 2 Global Step: 34380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:14,303-Speed 9617.37 samples/sec Loss 8.8646 LearningRate 0.0805 Epoch: 2 Global Step: 34390 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:15,400-Speed 9333.52 samples/sec Loss 8.8751 LearningRate 0.0805 Epoch: 2 Global Step: 34400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 
12:56:16,488-Speed 9421.50 samples/sec Loss 8.8845 LearningRate 0.0804 Epoch: 2 Global Step: 34410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:17,618-Speed 9065.08 samples/sec Loss 8.9081 LearningRate 0.0804 Epoch: 2 Global Step: 34420 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:18,717-Speed 9321.59 samples/sec Loss 8.7776 LearningRate 0.0804 Epoch: 2 Global Step: 34430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:19,818-Speed 9312.45 samples/sec Loss 8.8775 LearningRate 0.0804 Epoch: 2 Global Step: 34440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:20,859-Speed 9845.47 samples/sec Loss 8.9185 LearningRate 0.0804 Epoch: 2 Global Step: 34450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:21,914-Speed 9712.38 samples/sec Loss 8.8180 LearningRate 0.0804 Epoch: 2 Global Step: 34460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:23,002-Speed 9412.47 samples/sec Loss 8.8103 LearningRate 0.0804 Epoch: 2 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:24,045-Speed 9830.14 samples/sec Loss 8.9768 LearningRate 0.0804 Epoch: 2 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:25,076-Speed 9931.56 samples/sec Loss 8.9270 LearningRate 0.0804 Epoch: 2 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:26,125-Speed 9769.52 samples/sec Loss 8.8499 LearningRate 0.0804 Epoch: 2 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:27,186-Speed 9658.10 samples/sec Loss 8.8239 LearningRate 0.0804 Epoch: 2 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:28,246-Speed 9665.27 samples/sec Loss 8.9536 LearningRate 0.0804 Epoch: 2 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:29,310-Speed 9631.96 samples/sec Loss 
8.8421 LearningRate 0.0804 Epoch: 2 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:30,370-Speed 9664.66 samples/sec Loss 8.8632 LearningRate 0.0804 Epoch: 2 Global Step: 34540 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:31,427-Speed 9690.49 samples/sec Loss 8.8778 LearningRate 0.0804 Epoch: 2 Global Step: 34550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:32,477-Speed 9761.50 samples/sec Loss 8.8343 LearningRate 0.0804 Epoch: 2 Global Step: 34560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:33,543-Speed 9611.57 samples/sec Loss 9.0016 LearningRate 0.0804 Epoch: 2 Global Step: 34570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:34,630-Speed 9427.92 samples/sec Loss 8.7997 LearningRate 0.0804 Epoch: 2 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:35,735-Speed 9266.25 samples/sec Loss 8.9755 LearningRate 0.0803 Epoch: 2 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:36,822-Speed 9431.18 samples/sec Loss 8.8979 LearningRate 0.0803 Epoch: 2 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:37,887-Speed 9621.19 samples/sec Loss 8.8267 LearningRate 0.0803 Epoch: 2 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:38,951-Speed 9631.64 samples/sec Loss 8.8826 LearningRate 0.0803 Epoch: 2 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:39,997-Speed 9795.05 samples/sec Loss 8.8029 LearningRate 0.0803 Epoch: 2 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:41,044-Speed 9787.45 samples/sec Loss 8.8471 LearningRate 0.0803 Epoch: 2 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:42,081-Speed 9883.08 samples/sec Loss 8.8569 LearningRate 0.0803 Epoch: 2 Global 
Step: 34650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:43,198-Speed 9172.20 samples/sec Loss 8.7394 LearningRate 0.0803 Epoch: 2 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:44,262-Speed 9624.03 samples/sec Loss 8.9160 LearningRate 0.0803 Epoch: 2 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:45,327-Speed 9628.08 samples/sec Loss 8.8742 LearningRate 0.0803 Epoch: 2 Global Step: 34680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:46,423-Speed 9345.33 samples/sec Loss 8.8375 LearningRate 0.0803 Epoch: 2 Global Step: 34690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:56:47,466-Speed 9820.91 samples/sec Loss 8.7929 LearningRate 0.0803 Epoch: 2 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:48,519-Speed 9732.14 samples/sec Loss 8.9416 LearningRate 0.0803 Epoch: 2 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:49,584-Speed 9615.73 samples/sec Loss 8.8926 LearningRate 0.0803 Epoch: 2 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:50,685-Speed 9315.68 samples/sec Loss 8.9595 LearningRate 0.0803 Epoch: 2 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:51,771-Speed 9436.01 samples/sec Loss 8.8756 LearningRate 0.0803 Epoch: 2 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:52,859-Speed 9409.65 samples/sec Loss 8.9481 LearningRate 0.0803 Epoch: 2 Global Step: 34750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:56:53,909-Speed 9762.34 samples/sec Loss 8.9176 LearningRate 0.0803 Epoch: 2 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:56:54,979-Speed 9570.47 samples/sec Loss 8.9118 LearningRate 0.0803 Epoch: 2 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 
12 hours Training: 2022-04-11 12:56:56,043-Speed 9637.21 samples/sec Loss 8.8941 LearningRate 0.0802 Epoch: 2 Global Step: 34780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:56:57,136-Speed 9371.91 samples/sec Loss 8.9109 LearningRate 0.0802 Epoch: 2 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:56:58,218-Speed 9478.83 samples/sec Loss 8.9331 LearningRate 0.0802 Epoch: 2 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:56:59,255-Speed 9878.01 samples/sec Loss 8.9183 LearningRate 0.0802 Epoch: 2 Global Step: 34810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:00,330-Speed 9525.42 samples/sec Loss 8.8684 LearningRate 0.0802 Epoch: 2 Global Step: 34820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:01,443-Speed 9211.19 samples/sec Loss 8.7919 LearningRate 0.0802 Epoch: 2 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:02,509-Speed 9606.79 samples/sec Loss 8.9931 LearningRate 0.0802 Epoch: 2 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:03,586-Speed 9514.18 samples/sec Loss 8.9182 LearningRate 0.0802 Epoch: 2 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:04,645-Speed 9673.76 samples/sec Loss 9.0019 LearningRate 0.0802 Epoch: 2 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:05,693-Speed 9774.59 samples/sec Loss 8.9758 LearningRate 0.0802 Epoch: 2 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:06,724-Speed 9940.70 samples/sec Loss 8.9437 LearningRate 0.0802 Epoch: 2 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:07,779-Speed 9709.71 samples/sec Loss 8.8150 LearningRate 0.0802 Epoch: 2 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:08,827-Speed 
9782.74 samples/sec Loss 8.7662 LearningRate 0.0802 Epoch: 2 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:09,910-Speed 9457.40 samples/sec Loss 8.8368 LearningRate 0.0802 Epoch: 2 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:11,007-Speed 9341.66 samples/sec Loss 8.8876 LearningRate 0.0802 Epoch: 2 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:12,088-Speed 9477.76 samples/sec Loss 8.9375 LearningRate 0.0802 Epoch: 2 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:13,154-Speed 9613.19 samples/sec Loss 8.8742 LearningRate 0.0802 Epoch: 2 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:14,212-Speed 9676.92 samples/sec Loss 8.7999 LearningRate 0.0802 Epoch: 2 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:15,301-Speed 9415.08 samples/sec Loss 8.8954 LearningRate 0.0802 Epoch: 2 Global Step: 34960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:57:16,373-Speed 9553.83 samples/sec Loss 9.0614 LearningRate 0.0801 Epoch: 2 Global Step: 34970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:57:17,440-Speed 9607.57 samples/sec Loss 8.9927 LearningRate 0.0801 Epoch: 2 Global Step: 34980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:57:18,562-Speed 9135.16 samples/sec Loss 8.8527 LearningRate 0.0801 Epoch: 2 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:19,628-Speed 9606.86 samples/sec Loss 8.9622 LearningRate 0.0801 Epoch: 2 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:20,716-Speed 9421.57 samples/sec Loss 8.8350 LearningRate 0.0801 Epoch: 2 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:21,777-Speed 9662.39 samples/sec Loss 8.9390 
LearningRate 0.0801 Epoch: 2 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:22,851-Speed 9534.93 samples/sec Loss 8.8514 LearningRate 0.0801 Epoch: 2 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:23,978-Speed 9096.57 samples/sec Loss 8.8750 LearningRate 0.0801 Epoch: 2 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:25,041-Speed 9641.14 samples/sec Loss 8.9435 LearningRate 0.0801 Epoch: 2 Global Step: 35050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:26,133-Speed 9381.50 samples/sec Loss 8.8721 LearningRate 0.0801 Epoch: 2 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:27,220-Speed 9425.57 samples/sec Loss 8.9568 LearningRate 0.0801 Epoch: 2 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:28,288-Speed 9591.81 samples/sec Loss 8.9450 LearningRate 0.0801 Epoch: 2 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:29,350-Speed 9645.29 samples/sec Loss 8.8819 LearningRate 0.0801 Epoch: 2 Global Step: 35090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:57:30,420-Speed 9575.14 samples/sec Loss 8.9730 LearningRate 0.0801 Epoch: 2 Global Step: 35100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:57:31,509-Speed 9412.62 samples/sec Loss 8.9795 LearningRate 0.0801 Epoch: 2 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:32,611-Speed 9292.43 samples/sec Loss 8.9402 LearningRate 0.0801 Epoch: 2 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:33,672-Speed 9662.15 samples/sec Loss 8.9560 LearningRate 0.0801 Epoch: 2 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:34,747-Speed 9534.96 samples/sec Loss 8.9472 LearningRate 0.0801 Epoch: 2 Global Step: 
35140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:35,788-Speed 9837.75 samples/sec Loss 8.9536 LearningRate 0.0800 Epoch: 2 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:36,871-Speed 9458.69 samples/sec Loss 8.9141 LearningRate 0.0800 Epoch: 2 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:37,938-Speed 9605.95 samples/sec Loss 9.0176 LearningRate 0.0800 Epoch: 2 Global Step: 35170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:39,015-Speed 9516.45 samples/sec Loss 9.0068 LearningRate 0.0800 Epoch: 2 Global Step: 35180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:40,061-Speed 9793.54 samples/sec Loss 8.8076 LearningRate 0.0800 Epoch: 2 Global Step: 35190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:41,141-Speed 9490.93 samples/sec Loss 8.8944 LearningRate 0.0800 Epoch: 2 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:42,181-Speed 9846.19 samples/sec Loss 8.7943 LearningRate 0.0800 Epoch: 2 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:43,229-Speed 9775.83 samples/sec Loss 8.8213 LearningRate 0.0800 Epoch: 2 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:44,283-Speed 9724.42 samples/sec Loss 8.7849 LearningRate 0.0800 Epoch: 2 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:45,339-Speed 9705.27 samples/sec Loss 8.8837 LearningRate 0.0800 Epoch: 2 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:46,379-Speed 9849.21 samples/sec Loss 9.0079 LearningRate 0.0800 Epoch: 2 Global Step: 35250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:47,412-Speed 9919.36 samples/sec Loss 8.9085 LearningRate 0.0800 Epoch: 2 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 12:57:48,533-Speed 9141.80 samples/sec Loss 8.9345 LearningRate 0.0800 Epoch: 2 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:49,586-Speed 9723.28 samples/sec Loss 8.9135 LearningRate 0.0800 Epoch: 2 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:50,691-Speed 9278.84 samples/sec Loss 8.9163 LearningRate 0.0800 Epoch: 2 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 12:57:51,745-Speed 9718.78 samples/sec Loss 8.8876 LearningRate 0.0800 Epoch: 2 Global Step: 35300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:52,822-Speed 9513.98 samples/sec Loss 8.9251 LearningRate 0.0800 Epoch: 2 Global Step: 35310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:53,861-Speed 9860.95 samples/sec Loss 8.9330 LearningRate 0.0800 Epoch: 2 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:54,942-Speed 9480.87 samples/sec Loss 8.9122 LearningRate 0.0800 Epoch: 2 Global Step: 35330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:55,996-Speed 9724.07 samples/sec Loss 9.0874 LearningRate 0.0799 Epoch: 2 Global Step: 35340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:57,104-Speed 9244.44 samples/sec Loss 8.8894 LearningRate 0.0799 Epoch: 2 Global Step: 35350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:58,199-Speed 9362.26 samples/sec Loss 9.0026 LearningRate 0.0799 Epoch: 2 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:57:59,303-Speed 9273.99 samples/sec Loss 8.8781 LearningRate 0.0799 Epoch: 2 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:00,405-Speed 9301.07 samples/sec Loss 9.0783 LearningRate 0.0799 Epoch: 2 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:01,463-Speed 
9685.76 samples/sec Loss 8.9445 LearningRate 0.0799 Epoch: 2 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:02,516-Speed 9729.78 samples/sec Loss 8.9639 LearningRate 0.0799 Epoch: 2 Global Step: 35400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:03,573-Speed 9690.12 samples/sec Loss 8.9445 LearningRate 0.0799 Epoch: 2 Global Step: 35410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:04,670-Speed 9339.30 samples/sec Loss 8.9948 LearningRate 0.0799 Epoch: 2 Global Step: 35420 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:05,734-Speed 9634.05 samples/sec Loss 8.9486 LearningRate 0.0799 Epoch: 2 Global Step: 35430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:06,819-Speed 9439.85 samples/sec Loss 9.0189 LearningRate 0.0799 Epoch: 2 Global Step: 35440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:07,857-Speed 9872.07 samples/sec Loss 8.9522 LearningRate 0.0799 Epoch: 2 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:08,912-Speed 9709.47 samples/sec Loss 9.0246 LearningRate 0.0799 Epoch: 2 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:10,014-Speed 9295.95 samples/sec Loss 8.9203 LearningRate 0.0799 Epoch: 2 Global Step: 35470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:11,070-Speed 9710.27 samples/sec Loss 9.0261 LearningRate 0.0799 Epoch: 2 Global Step: 35480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:12,159-Speed 9405.14 samples/sec Loss 9.1103 LearningRate 0.0799 Epoch: 2 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:13,255-Speed 9353.38 samples/sec Loss 9.0426 LearningRate 0.0799 Epoch: 2 Global Step: 35500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:14,324-Speed 9585.45 samples/sec Loss 9.1036 
LearningRate 0.0799 Epoch: 2 Global Step: 35510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:15,366-Speed 9832.43 samples/sec Loss 9.0563 LearningRate 0.0799 Epoch: 2 Global Step: 35520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:16,429-Speed 9639.98 samples/sec Loss 8.9742 LearningRate 0.0798 Epoch: 2 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:17,525-Speed 9346.21 samples/sec Loss 9.0255 LearningRate 0.0798 Epoch: 2 Global Step: 35540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:18,547-Speed 10021.12 samples/sec Loss 9.0409 LearningRate 0.0798 Epoch: 2 Global Step: 35550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:19,609-Speed 9652.72 samples/sec Loss 9.1184 LearningRate 0.0798 Epoch: 2 Global Step: 35560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:20,686-Speed 9511.83 samples/sec Loss 8.8920 LearningRate 0.0798 Epoch: 2 Global Step: 35570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:21,750-Speed 9630.43 samples/sec Loss 9.1130 LearningRate 0.0798 Epoch: 2 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:22,835-Speed 9443.09 samples/sec Loss 8.9935 LearningRate 0.0798 Epoch: 2 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:23,918-Speed 9461.90 samples/sec Loss 8.9953 LearningRate 0.0798 Epoch: 2 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:25,012-Speed 9367.85 samples/sec Loss 8.9744 LearningRate 0.0798 Epoch: 2 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:26,057-Speed 9801.88 samples/sec Loss 8.9579 LearningRate 0.0798 Epoch: 2 Global Step: 35620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:27,108-Speed 9751.61 samples/sec Loss 8.8666 LearningRate 0.0798 Epoch: 2 Global Step: 
35630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:28,173-Speed 9625.69 samples/sec Loss 8.8815 LearningRate 0.0798 Epoch: 2 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:29,279-Speed 9259.95 samples/sec Loss 9.0195 LearningRate 0.0798 Epoch: 2 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:30,347-Speed 9597.70 samples/sec Loss 9.1255 LearningRate 0.0798 Epoch: 2 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:31,469-Speed 9129.05 samples/sec Loss 8.9756 LearningRate 0.0798 Epoch: 2 Global Step: 35670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:32,521-Speed 9744.38 samples/sec Loss 8.9217 LearningRate 0.0798 Epoch: 2 Global Step: 35680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:33,595-Speed 9538.10 samples/sec Loss 8.9267 LearningRate 0.0798 Epoch: 2 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:34,634-Speed 9858.48 samples/sec Loss 9.1153 LearningRate 0.0798 Epoch: 2 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:35,711-Speed 9516.49 samples/sec Loss 8.8875 LearningRate 0.0797 Epoch: 2 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:36,773-Speed 9649.83 samples/sec Loss 8.8996 LearningRate 0.0797 Epoch: 2 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:37,882-Speed 9233.60 samples/sec Loss 8.9421 LearningRate 0.0797 Epoch: 2 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:38,954-Speed 9558.73 samples/sec Loss 9.0324 LearningRate 0.0797 Epoch: 2 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:40,016-Speed 9654.24 samples/sec Loss 8.9048 LearningRate 0.0797 Epoch: 2 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 12:58:41,066-Speed 9754.08 samples/sec Loss 8.9650 LearningRate 0.0797 Epoch: 2 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:42,165-Speed 9325.69 samples/sec Loss 9.0279 LearningRate 0.0797 Epoch: 2 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:43,212-Speed 9787.10 samples/sec Loss 8.9117 LearningRate 0.0797 Epoch: 2 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:44,267-Speed 9711.07 samples/sec Loss 8.9128 LearningRate 0.0797 Epoch: 2 Global Step: 35790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:45,320-Speed 9727.55 samples/sec Loss 8.9128 LearningRate 0.0797 Epoch: 2 Global Step: 35800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:46,368-Speed 9784.65 samples/sec Loss 9.0628 LearningRate 0.0797 Epoch: 2 Global Step: 35810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:47,453-Speed 9440.23 samples/sec Loss 8.9478 LearningRate 0.0797 Epoch: 2 Global Step: 35820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:48,530-Speed 9513.07 samples/sec Loss 8.9365 LearningRate 0.0797 Epoch: 2 Global Step: 35830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:58:49,607-Speed 9517.85 samples/sec Loss 8.9108 LearningRate 0.0797 Epoch: 2 Global Step: 35840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:50,712-Speed 9273.86 samples/sec Loss 8.8955 LearningRate 0.0797 Epoch: 2 Global Step: 35850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:51,773-Speed 9651.90 samples/sec Loss 9.0093 LearningRate 0.0797 Epoch: 2 Global Step: 35860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:52,844-Speed 9567.90 samples/sec Loss 8.9248 LearningRate 0.0797 Epoch: 2 Global Step: 35870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
12:58:53,911-Speed 9605.64 samples/sec Loss 8.9815 LearningRate 0.0797 Epoch: 2 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:55,043-Speed 9048.87 samples/sec Loss 9.0243 LearningRate 0.0797 Epoch: 2 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:56,134-Speed 9392.23 samples/sec Loss 8.9424 LearningRate 0.0796 Epoch: 2 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:57,202-Speed 9596.95 samples/sec Loss 8.9432 LearningRate 0.0796 Epoch: 2 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:58,280-Speed 9502.81 samples/sec Loss 8.9727 LearningRate 0.0796 Epoch: 2 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:58:59,368-Speed 9413.12 samples/sec Loss 9.0301 LearningRate 0.0796 Epoch: 2 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 12:59:00,471-Speed 9292.11 samples/sec Loss 9.0853 LearningRate 0.0796 Epoch: 2 Global Step: 35940 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:59:01,587-Speed 9184.89 samples/sec Loss 8.8599 LearningRate 0.0796 Epoch: 2 Global Step: 35950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:59:02,657-Speed 9575.67 samples/sec Loss 9.0722 LearningRate 0.0796 Epoch: 2 Global Step: 35960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:59:03,768-Speed 9221.04 samples/sec Loss 8.9520 LearningRate 0.0796 Epoch: 2 Global Step: 35970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:59:04,886-Speed 9170.32 samples/sec Loss 8.9953 LearningRate 0.0796 Epoch: 2 Global Step: 35980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:59:05,956-Speed 9572.35 samples/sec Loss 9.0154 LearningRate 0.0796 Epoch: 2 Global Step: 35990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 12:59:07,017-Speed 9653.93 samples/sec Loss 
9.0127 LearningRate 0.0796 Epoch: 2 Global Step: 36000 Fp16 Grad Scale: 262144 Required: 12 hours
Training: 2022-04-11 12:59:28,802-[lfw][36000]XNorm: 13.217900
Training: 2022-04-11 12:59:28,802-[lfw][36000]Accuracy-Flip: 0.99233+-0.00423
Training: 2022-04-11 12:59:28,803-[lfw][36000]Accuracy-Highest: 0.99533
Training: 2022-04-11 12:59:54,051-[cfp_fp][36000]XNorm: 11.195447
Training: 2022-04-11 12:59:54,052-[cfp_fp][36000]Accuracy-Flip: 0.93986+-0.01076
Training: 2022-04-11 12:59:54,052-[cfp_fp][36000]Accuracy-Highest: 0.93986
Training: 2022-04-11 13:00:15,821-[agedb_30][36000]XNorm: 12.856884
Training: 2022-04-11 13:00:15,822-[agedb_30][36000]Accuracy-Flip: 0.94950+-0.01108
Training: 2022-04-11 13:00:15,822-[agedb_30][36000]Accuracy-Highest: 0.94950
Training: 2022-04-11 13:00:16,892-Speed 146.55 samples/sec Loss 9.0474 LearningRate 0.0796 Epoch: 2 Global Step: 36010 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 13:00:17,953-Speed 9659.55 samples/sec Loss 8.9576 LearningRate 0.0796 Epoch: 2 Global Step: 36020 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 13:00:19,042-Speed 9402.02 samples/sec Loss 8.9291 LearningRate 0.0796 Epoch: 2 Global Step: 36030 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 13:00:20,138-Speed 9353.44 samples/sec Loss 9.0817 LearningRate 0.0796 Epoch: 2 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 13:00:21,218-Speed 9490.42 samples/sec Loss 8.9079 LearningRate 0.0796 Epoch: 2 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 13:00:22,296-Speed 9504.37 samples/sec Loss 8.8977 LearningRate 0.0796 Epoch: 2 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 13:00:23,352-Speed 9708.82 samples/sec Loss 9.0590 LearningRate 0.0796 Epoch: 2 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 13:00:24,416-Speed 9628.85 samples/sec Loss 8.9535 LearningRate 0.0796 Epoch: 
2 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:25,476-Speed 9661.30 samples/sec Loss 9.1140 LearningRate 0.0795 Epoch: 2 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:26,534-Speed 9688.12 samples/sec Loss 8.9867 LearningRate 0.0795 Epoch: 2 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:27,577-Speed 9819.66 samples/sec Loss 8.9759 LearningRate 0.0795 Epoch: 2 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:28,625-Speed 9779.17 samples/sec Loss 8.9939 LearningRate 0.0795 Epoch: 2 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:29,712-Speed 9421.82 samples/sec Loss 9.0746 LearningRate 0.0795 Epoch: 2 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:30,785-Speed 9550.59 samples/sec Loss 9.0484 LearningRate 0.0795 Epoch: 2 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:31,859-Speed 9546.50 samples/sec Loss 8.9638 LearningRate 0.0795 Epoch: 2 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:32,948-Speed 9400.98 samples/sec Loss 8.9858 LearningRate 0.0795 Epoch: 2 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:34,049-Speed 9310.65 samples/sec Loss 9.0294 LearningRate 0.0795 Epoch: 2 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:35,129-Speed 9486.40 samples/sec Loss 9.0535 LearningRate 0.0795 Epoch: 2 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:36,191-Speed 9642.74 samples/sec Loss 9.0385 LearningRate 0.0795 Epoch: 2 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:37,280-Speed 9413.23 samples/sec Loss 8.9956 LearningRate 0.0795 Epoch: 2 Global Step: 36200 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-04-11 13:00:38,387-Speed 9250.33 samples/sec Loss 9.0360 LearningRate 0.0795 Epoch: 2 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:39,487-Speed 9321.59 samples/sec Loss 9.1179 LearningRate 0.0795 Epoch: 2 Global Step: 36220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:40,592-Speed 9272.37 samples/sec Loss 9.0112 LearningRate 0.0795 Epoch: 2 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:41,696-Speed 9279.43 samples/sec Loss 8.9221 LearningRate 0.0795 Epoch: 2 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:42,788-Speed 9377.72 samples/sec Loss 9.1400 LearningRate 0.0795 Epoch: 2 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:43,905-Speed 9174.35 samples/sec Loss 9.0618 LearningRate 0.0795 Epoch: 2 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:44,979-Speed 9536.91 samples/sec Loss 9.0623 LearningRate 0.0795 Epoch: 2 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:46,046-Speed 9607.97 samples/sec Loss 9.0125 LearningRate 0.0794 Epoch: 2 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:47,126-Speed 9483.98 samples/sec Loss 8.9811 LearningRate 0.0794 Epoch: 2 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:48,185-Speed 9675.59 samples/sec Loss 9.0893 LearningRate 0.0794 Epoch: 2 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:49,301-Speed 9187.76 samples/sec Loss 9.0063 LearningRate 0.0794 Epoch: 2 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:50,369-Speed 9591.61 samples/sec Loss 9.0084 LearningRate 0.0794 Epoch: 2 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 13:00:51,501-Speed 9053.02 samples/sec Loss 8.9519 LearningRate 0.0794 Epoch: 2 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:52,563-Speed 9648.84 samples/sec Loss 9.0453 LearningRate 0.0794 Epoch: 2 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:53,637-Speed 9541.29 samples/sec Loss 9.0336 LearningRate 0.0794 Epoch: 2 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:54,741-Speed 9280.96 samples/sec Loss 8.9602 LearningRate 0.0794 Epoch: 2 Global Step: 36360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:55,813-Speed 9552.63 samples/sec Loss 9.0198 LearningRate 0.0794 Epoch: 2 Global Step: 36370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:56,894-Speed 9483.00 samples/sec Loss 8.9406 LearningRate 0.0794 Epoch: 2 Global Step: 36380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:57,944-Speed 9762.22 samples/sec Loss 8.8988 LearningRate 0.0794 Epoch: 2 Global Step: 36390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:00:59,082-Speed 8999.13 samples/sec Loss 9.0030 LearningRate 0.0794 Epoch: 2 Global Step: 36400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:00,180-Speed 9330.04 samples/sec Loss 9.0084 LearningRate 0.0794 Epoch: 2 Global Step: 36410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:01,279-Speed 9329.15 samples/sec Loss 9.0028 LearningRate 0.0794 Epoch: 2 Global Step: 36420 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:02,339-Speed 9662.98 samples/sec Loss 8.9342 LearningRate 0.0794 Epoch: 2 Global Step: 36430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:03,402-Speed 9633.76 samples/sec Loss 9.0220 LearningRate 0.0794 Epoch: 2 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:04,507-Speed 9277.65 
samples/sec Loss 8.9949 LearningRate 0.0794 Epoch: 2 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:05,530-Speed 10009.92 samples/sec Loss 8.9402 LearningRate 0.0793 Epoch: 2 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:06,610-Speed 9491.02 samples/sec Loss 9.0547 LearningRate 0.0793 Epoch: 2 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:07,716-Speed 9268.28 samples/sec Loss 8.9632 LearningRate 0.0793 Epoch: 2 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:08,806-Speed 9397.54 samples/sec Loss 9.0883 LearningRate 0.0793 Epoch: 2 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:09,846-Speed 9849.54 samples/sec Loss 9.0055 LearningRate 0.0793 Epoch: 2 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:10,896-Speed 9764.15 samples/sec Loss 9.0475 LearningRate 0.0793 Epoch: 2 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:11,973-Speed 9507.03 samples/sec Loss 8.8101 LearningRate 0.0793 Epoch: 2 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:13,061-Speed 9417.06 samples/sec Loss 8.9322 LearningRate 0.0793 Epoch: 2 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:14,152-Speed 9394.13 samples/sec Loss 9.0557 LearningRate 0.0793 Epoch: 2 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:15,234-Speed 9465.90 samples/sec Loss 8.8593 LearningRate 0.0793 Epoch: 2 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:16,311-Speed 9514.86 samples/sec Loss 9.1445 LearningRate 0.0793 Epoch: 2 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:17,434-Speed 9119.24 samples/sec Loss 8.9231 LearningRate 
0.0793 Epoch: 2 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:18,510-Speed 9528.07 samples/sec Loss 8.9645 LearningRate 0.0793 Epoch: 2 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:19,559-Speed 9766.10 samples/sec Loss 8.9154 LearningRate 0.0793 Epoch: 2 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:20,653-Speed 9364.00 samples/sec Loss 9.0256 LearningRate 0.0793 Epoch: 2 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:21,775-Speed 9136.67 samples/sec Loss 9.1624 LearningRate 0.0793 Epoch: 2 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:22,890-Speed 9193.07 samples/sec Loss 9.0058 LearningRate 0.0793 Epoch: 2 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:24,249-Speed 7539.13 samples/sec Loss 9.0002 LearningRate 0.0793 Epoch: 2 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:25,316-Speed 9610.63 samples/sec Loss 9.0874 LearningRate 0.0793 Epoch: 2 Global Step: 36640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:26,407-Speed 9385.68 samples/sec Loss 9.0635 LearningRate 0.0792 Epoch: 2 Global Step: 36650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:27,509-Speed 9303.24 samples/sec Loss 9.0252 LearningRate 0.0792 Epoch: 2 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:28,579-Speed 9583.62 samples/sec Loss 9.1095 LearningRate 0.0792 Epoch: 2 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:29,674-Speed 9350.01 samples/sec Loss 9.0492 LearningRate 0.0792 Epoch: 2 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:30,772-Speed 9334.21 samples/sec Loss 8.8967 LearningRate 0.0792 Epoch: 2 Global Step: 36690 Fp16 
Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:31,839-Speed 9598.23 samples/sec Loss 9.0981 LearningRate 0.0792 Epoch: 2 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:32,914-Speed 9532.46 samples/sec Loss 8.9801 LearningRate 0.0792 Epoch: 2 Global Step: 36710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:33,998-Speed 9454.58 samples/sec Loss 9.1562 LearningRate 0.0792 Epoch: 2 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:35,086-Speed 9412.64 samples/sec Loss 9.0839 LearningRate 0.0792 Epoch: 2 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:36,165-Speed 9500.91 samples/sec Loss 9.0680 LearningRate 0.0792 Epoch: 2 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:37,232-Speed 9598.40 samples/sec Loss 9.0636 LearningRate 0.0792 Epoch: 2 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:38,353-Speed 9139.24 samples/sec Loss 9.0248 LearningRate 0.0792 Epoch: 2 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:39,433-Speed 9499.92 samples/sec Loss 9.0367 LearningRate 0.0792 Epoch: 2 Global Step: 36770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:40,570-Speed 9013.23 samples/sec Loss 9.0454 LearningRate 0.0792 Epoch: 2 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:41,697-Speed 9093.05 samples/sec Loss 9.1089 LearningRate 0.0792 Epoch: 2 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:42,783-Speed 9442.56 samples/sec Loss 8.9476 LearningRate 0.0792 Epoch: 2 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:43,830-Speed 9787.59 samples/sec Loss 9.1052 LearningRate 0.0792 Epoch: 2 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 13:01:44,892-Speed 9651.84 samples/sec Loss 9.0299 LearningRate 0.0792 Epoch: 2 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:45,910-Speed 10057.78 samples/sec Loss 8.9701 LearningRate 0.0792 Epoch: 2 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:46,950-Speed 9858.24 samples/sec Loss 9.0443 LearningRate 0.0791 Epoch: 2 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:47,987-Speed 9879.12 samples/sec Loss 8.9434 LearningRate 0.0791 Epoch: 2 Global Step: 36850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:49,032-Speed 9799.97 samples/sec Loss 9.0296 LearningRate 0.0791 Epoch: 2 Global Step: 36860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:50,088-Speed 9706.80 samples/sec Loss 9.0074 LearningRate 0.0791 Epoch: 2 Global Step: 36870 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:51,146-Speed 9689.30 samples/sec Loss 9.1142 LearningRate 0.0791 Epoch: 2 Global Step: 36880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:01:52,223-Speed 9514.48 samples/sec Loss 9.0844 LearningRate 0.0791 Epoch: 2 Global Step: 36890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:53,275-Speed 9734.00 samples/sec Loss 9.0061 LearningRate 0.0791 Epoch: 2 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:54,348-Speed 9554.13 samples/sec Loss 9.0077 LearningRate 0.0791 Epoch: 2 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:55,374-Speed 9979.93 samples/sec Loss 9.1047 LearningRate 0.0791 Epoch: 2 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:56,452-Speed 9505.76 samples/sec Loss 8.9723 LearningRate 0.0791 Epoch: 2 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:57,519-Speed 
9604.70 samples/sec Loss 9.0433 LearningRate 0.0791 Epoch: 2 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:58,586-Speed 9604.37 samples/sec Loss 9.0893 LearningRate 0.0791 Epoch: 2 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:01:59,646-Speed 9666.39 samples/sec Loss 9.0271 LearningRate 0.0791 Epoch: 2 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:00,677-Speed 9932.89 samples/sec Loss 8.8995 LearningRate 0.0791 Epoch: 2 Global Step: 36970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:01,767-Speed 9399.90 samples/sec Loss 8.9446 LearningRate 0.0791 Epoch: 2 Global Step: 36980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:02,841-Speed 9553.01 samples/sec Loss 9.0390 LearningRate 0.0791 Epoch: 2 Global Step: 36990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:02:03,916-Speed 9527.07 samples/sec Loss 8.9755 LearningRate 0.0791 Epoch: 2 Global Step: 37000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:02:04,975-Speed 9681.85 samples/sec Loss 8.9523 LearningRate 0.0791 Epoch: 2 Global Step: 37010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:02:06,054-Speed 9491.83 samples/sec Loss 9.0587 LearningRate 0.0791 Epoch: 2 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:07,119-Speed 9626.04 samples/sec Loss 9.1389 LearningRate 0.0790 Epoch: 2 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:08,194-Speed 9526.74 samples/sec Loss 9.0202 LearningRate 0.0790 Epoch: 2 Global Step: 37040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:09,257-Speed 9641.92 samples/sec Loss 8.9856 LearningRate 0.0790 Epoch: 2 Global Step: 37050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:10,410-Speed 8884.77 samples/sec Loss 9.0157 LearningRate 
0.0790 Epoch: 2 Global Step: 37060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:11,497-Speed 9430.27 samples/sec Loss 9.0765 LearningRate 0.0790 Epoch: 2 Global Step: 37070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:12,576-Speed 9494.02 samples/sec Loss 9.0667 LearningRate 0.0790 Epoch: 2 Global Step: 37080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:13,649-Speed 9556.64 samples/sec Loss 9.0701 LearningRate 0.0790 Epoch: 2 Global Step: 37090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:14,702-Speed 9725.21 samples/sec Loss 8.9451 LearningRate 0.0790 Epoch: 2 Global Step: 37100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:15,738-Speed 9890.46 samples/sec Loss 8.9523 LearningRate 0.0790 Epoch: 2 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:16,783-Speed 9803.64 samples/sec Loss 9.0650 LearningRate 0.0790 Epoch: 2 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:17,883-Speed 9314.03 samples/sec Loss 8.9989 LearningRate 0.0790 Epoch: 2 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:18,991-Speed 9246.55 samples/sec Loss 9.0423 LearningRate 0.0790 Epoch: 2 Global Step: 37140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:20,081-Speed 9397.57 samples/sec Loss 8.8333 LearningRate 0.0790 Epoch: 2 Global Step: 37150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:21,195-Speed 9201.40 samples/sec Loss 9.0161 LearningRate 0.0790 Epoch: 2 Global Step: 37160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:22,268-Speed 9560.72 samples/sec Loss 9.1452 LearningRate 0.0790 Epoch: 2 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:23,337-Speed 9585.96 samples/sec Loss 9.1845 LearningRate 0.0790 Epoch: 2 Global Step: 37180 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-04-11 13:02:24,443-Speed 9264.37 samples/sec Loss 9.0347 LearningRate 0.0790 Epoch: 2 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:25,533-Speed 9407.99 samples/sec Loss 9.0508 LearningRate 0.0790 Epoch: 2 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:26,643-Speed 9230.80 samples/sec Loss 9.0164 LearningRate 0.0789 Epoch: 2 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:27,695-Speed 9735.36 samples/sec Loss 9.0288 LearningRate 0.0789 Epoch: 2 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:28,776-Speed 9484.15 samples/sec Loss 8.9979 LearningRate 0.0789 Epoch: 2 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:29,871-Speed 9356.18 samples/sec Loss 9.1038 LearningRate 0.0789 Epoch: 2 Global Step: 37240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:02:30,948-Speed 9511.58 samples/sec Loss 9.0282 LearningRate 0.0789 Epoch: 2 Global Step: 37250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:32,019-Speed 9562.00 samples/sec Loss 9.1071 LearningRate 0.0789 Epoch: 2 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:33,091-Speed 9557.07 samples/sec Loss 8.9428 LearningRate 0.0789 Epoch: 2 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:34,139-Speed 9779.20 samples/sec Loss 9.0356 LearningRate 0.0789 Epoch: 2 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:35,220-Speed 9476.74 samples/sec Loss 9.0561 LearningRate 0.0789 Epoch: 2 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:36,269-Speed 9771.93 samples/sec Loss 8.9578 LearningRate 0.0789 Epoch: 2 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
13:02:37,340-Speed 9564.87 samples/sec Loss 8.9849 LearningRate 0.0789 Epoch: 2 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:38,415-Speed 9531.86 samples/sec Loss 8.9693 LearningRate 0.0789 Epoch: 2 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:39,521-Speed 9266.74 samples/sec Loss 9.0403 LearningRate 0.0789 Epoch: 2 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:40,626-Speed 9272.92 samples/sec Loss 8.8455 LearningRate 0.0789 Epoch: 2 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:02:41,725-Speed 9329.56 samples/sec Loss 9.0450 LearningRate 0.0789 Epoch: 2 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:42,811-Speed 9432.91 samples/sec Loss 9.0942 LearningRate 0.0789 Epoch: 2 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:43,875-Speed 9625.81 samples/sec Loss 8.9358 LearningRate 0.0789 Epoch: 2 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:44,959-Speed 9451.45 samples/sec Loss 9.0701 LearningRate 0.0789 Epoch: 2 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:46,040-Speed 9480.44 samples/sec Loss 9.0275 LearningRate 0.0789 Epoch: 2 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:47,070-Speed 9950.07 samples/sec Loss 9.0804 LearningRate 0.0788 Epoch: 2 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:48,142-Speed 9550.94 samples/sec Loss 9.1316 LearningRate 0.0788 Epoch: 2 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:49,197-Speed 9714.23 samples/sec Loss 8.9944 LearningRate 0.0788 Epoch: 2 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:50,226-Speed 9959.97 samples/sec Loss 
9.0244 LearningRate 0.0788 Epoch: 2 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:51,318-Speed 9386.20 samples/sec Loss 8.9669 LearningRate 0.0788 Epoch: 2 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:52,411-Speed 9372.99 samples/sec Loss 9.0671 LearningRate 0.0788 Epoch: 2 Global Step: 37450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:02:53,525-Speed 9198.43 samples/sec Loss 9.1388 LearningRate 0.0788 Epoch: 2 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:54,627-Speed 9292.64 samples/sec Loss 9.0757 LearningRate 0.0788 Epoch: 2 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:55,706-Speed 9497.23 samples/sec Loss 9.1355 LearningRate 0.0788 Epoch: 2 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:56,777-Speed 9569.49 samples/sec Loss 9.0546 LearningRate 0.0788 Epoch: 2 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:57,845-Speed 9592.96 samples/sec Loss 9.0698 LearningRate 0.0788 Epoch: 2 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:58,918-Speed 9553.18 samples/sec Loss 9.0766 LearningRate 0.0788 Epoch: 2 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:02:59,997-Speed 9494.40 samples/sec Loss 9.0122 LearningRate 0.0788 Epoch: 2 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:01,094-Speed 9346.46 samples/sec Loss 8.9494 LearningRate 0.0788 Epoch: 2 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:02,131-Speed 9872.58 samples/sec Loss 8.9992 LearningRate 0.0788 Epoch: 2 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:03,194-Speed 9642.05 samples/sec Loss 9.0597 LearningRate 0.0788 Epoch: 2 Global 
Step: 37550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:04,294-Speed 9313.23 samples/sec Loss 9.0213 LearningRate 0.0788 Epoch: 2 Global Step: 37560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:03:05,322-Speed 9964.83 samples/sec Loss 9.0749 LearningRate 0.0788 Epoch: 2 Global Step: 37570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:03:06,399-Speed 9514.38 samples/sec Loss 9.1401 LearningRate 0.0788 Epoch: 2 Global Step: 37580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:03:07,441-Speed 9837.21 samples/sec Loss 9.0169 LearningRate 0.0787 Epoch: 2 Global Step: 37590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:03:08,479-Speed 9868.54 samples/sec Loss 8.8961 LearningRate 0.0787 Epoch: 2 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:09,520-Speed 9839.89 samples/sec Loss 9.0085 LearningRate 0.0787 Epoch: 2 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:10,597-Speed 9522.81 samples/sec Loss 8.9871 LearningRate 0.0787 Epoch: 2 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:11,700-Speed 9282.92 samples/sec Loss 9.0576 LearningRate 0.0787 Epoch: 2 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:12,774-Speed 9538.28 samples/sec Loss 8.9760 LearningRate 0.0787 Epoch: 2 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:13,840-Speed 9616.17 samples/sec Loss 8.8450 LearningRate 0.0787 Epoch: 2 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:14,931-Speed 9388.28 samples/sec Loss 9.0729 LearningRate 0.0787 Epoch: 2 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:16,017-Speed 9433.38 samples/sec Loss 9.0746 LearningRate 0.0787 Epoch: 2 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-04-11 13:03:17,066-Speed 9765.46 samples/sec Loss 8.9919 LearningRate 0.0787 Epoch: 2 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:18,155-Speed 9416.80 samples/sec Loss 8.8136 LearningRate 0.0787 Epoch: 2 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:19,199-Speed 9812.94 samples/sec Loss 9.0042 LearningRate 0.0787 Epoch: 2 Global Step: 37700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:20,263-Speed 9625.30 samples/sec Loss 9.1071 LearningRate 0.0787 Epoch: 2 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:21,360-Speed 9343.82 samples/sec Loss 9.0399 LearningRate 0.0787 Epoch: 2 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:22,444-Speed 9455.83 samples/sec Loss 9.0127 LearningRate 0.0787 Epoch: 2 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:23,541-Speed 9340.83 samples/sec Loss 8.9573 LearningRate 0.0787 Epoch: 2 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:24,632-Speed 9392.56 samples/sec Loss 8.9842 LearningRate 0.0787 Epoch: 2 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:25,702-Speed 9572.78 samples/sec Loss 9.0717 LearningRate 0.0787 Epoch: 2 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:26,756-Speed 9719.88 samples/sec Loss 8.9239 LearningRate 0.0787 Epoch: 2 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:27,810-Speed 9725.59 samples/sec Loss 9.0303 LearningRate 0.0786 Epoch: 2 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:28,932-Speed 9130.65 samples/sec Loss 9.0989 LearningRate 0.0786 Epoch: 2 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:30,011-Speed 
9492.73 samples/sec Loss 9.1439 LearningRate 0.0786 Epoch: 2 Global Step: 37800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:03:31,043-Speed 9932.33 samples/sec Loss 8.9891 LearningRate 0.0786 Epoch: 2 Global Step: 37810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:03:32,115-Speed 9556.67 samples/sec Loss 8.9333 LearningRate 0.0786 Epoch: 2 Global Step: 37820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:03:33,194-Speed 9496.78 samples/sec Loss 8.9646 LearningRate 0.0786 Epoch: 2 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:34,288-Speed 9365.39 samples/sec Loss 8.8659 LearningRate 0.0786 Epoch: 2 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:35,387-Speed 9320.04 samples/sec Loss 8.9612 LearningRate 0.0786 Epoch: 2 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:36,507-Speed 9154.77 samples/sec Loss 9.0698 LearningRate 0.0786 Epoch: 2 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:37,582-Speed 9527.97 samples/sec Loss 8.9910 LearningRate 0.0786 Epoch: 2 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:38,676-Speed 9365.21 samples/sec Loss 9.0615 LearningRate 0.0786 Epoch: 2 Global Step: 37880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:39,751-Speed 9533.03 samples/sec Loss 9.0274 LearningRate 0.0786 Epoch: 2 Global Step: 37890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:40,790-Speed 9868.05 samples/sec Loss 9.0998 LearningRate 0.0786 Epoch: 2 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:41,900-Speed 9223.33 samples/sec Loss 9.0930 LearningRate 0.0786 Epoch: 2 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:42,957-Speed 9693.23 samples/sec Loss 8.9242 LearningRate 0.0786 
Epoch: 2 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:03:44,008-Speed 9754.53 samples/sec Loss 8.9647 LearningRate 0.0786 Epoch: 2 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:45,093-Speed 9442.34 samples/sec Loss 9.0396 LearningRate 0.0786 Epoch: 2 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:46,185-Speed 9387.87 samples/sec Loss 8.9657 LearningRate 0.0786 Epoch: 2 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:47,253-Speed 9590.78 samples/sec Loss 9.0547 LearningRate 0.0786 Epoch: 2 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:48,325-Speed 9557.80 samples/sec Loss 9.0636 LearningRate 0.0785 Epoch: 2 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:49,383-Speed 9683.73 samples/sec Loss 8.9812 LearningRate 0.0785 Epoch: 2 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:50,468-Speed 9448.88 samples/sec Loss 8.9998 LearningRate 0.0785 Epoch: 2 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:03:51,496-Speed 9967.16 samples/sec Loss 8.8745 LearningRate 0.0785 Epoch: 2 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:04:13,481-[lfw][38000]XNorm: 13.333310 Training: 2022-04-11 13:04:13,482-[lfw][38000]Accuracy-Flip: 0.99533+-0.00256 Training: 2022-04-11 13:04:13,482-[lfw][38000]Accuracy-Highest: 0.99533 Training: 2022-04-11 13:04:38,859-[cfp_fp][38000]XNorm: 11.191384 Training: 2022-04-11 13:04:38,860-[cfp_fp][38000]Accuracy-Flip: 0.93886+-0.01285 Training: 2022-04-11 13:04:38,860-[cfp_fp][38000]Accuracy-Highest: 0.93986 Training: 2022-04-11 13:05:00,667-[agedb_30][38000]XNorm: 12.882927 Training: 2022-04-11 13:05:00,668-[agedb_30][38000]Accuracy-Flip: 0.95333+-0.01140 Training: 2022-04-11 
13:05:00,669-[agedb_30][38000]Accuracy-Highest: 0.95333 Training: 2022-04-11 13:05:01,754-Speed 145.75 samples/sec Loss 8.9804 LearningRate 0.0785 Epoch: 2 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:02,806-Speed 9735.86 samples/sec Loss 8.9816 LearningRate 0.0785 Epoch: 2 Global Step: 38020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:03,852-Speed 9800.46 samples/sec Loss 8.9943 LearningRate 0.0785 Epoch: 2 Global Step: 38030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:04,884-Speed 9922.68 samples/sec Loss 9.0138 LearningRate 0.0785 Epoch: 2 Global Step: 38040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:05,926-Speed 9836.59 samples/sec Loss 8.9417 LearningRate 0.0785 Epoch: 2 Global Step: 38050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:07,002-Speed 9520.15 samples/sec Loss 9.0427 LearningRate 0.0785 Epoch: 2 Global Step: 38060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:08,079-Speed 9516.31 samples/sec Loss 9.0201 LearningRate 0.0785 Epoch: 2 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:09,183-Speed 9282.52 samples/sec Loss 9.0218 LearningRate 0.0785 Epoch: 2 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:10,231-Speed 9774.01 samples/sec Loss 8.9910 LearningRate 0.0785 Epoch: 2 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:11,289-Speed 9683.28 samples/sec Loss 9.1933 LearningRate 0.0785 Epoch: 2 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:12,336-Speed 9794.15 samples/sec Loss 9.0043 LearningRate 0.0785 Epoch: 2 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:13,362-Speed 9981.23 samples/sec Loss 9.0363 LearningRate 0.0785 Epoch: 2 Global Step: 38120 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 13:05:14,457-Speed 9356.36 samples/sec Loss 9.1038 LearningRate 0.0785 Epoch: 2 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:15,533-Speed 9522.57 samples/sec Loss 8.8659 LearningRate 0.0785 Epoch: 2 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:16,618-Speed 9443.31 samples/sec Loss 8.8252 LearningRate 0.0784 Epoch: 2 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:17,705-Speed 9421.59 samples/sec Loss 9.1835 LearningRate 0.0784 Epoch: 2 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:18,726-Speed 10049.74 samples/sec Loss 9.1246 LearningRate 0.0784 Epoch: 2 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:19,803-Speed 9511.18 samples/sec Loss 9.1013 LearningRate 0.0784 Epoch: 2 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:20,929-Speed 9106.07 samples/sec Loss 9.0300 LearningRate 0.0784 Epoch: 2 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:21,971-Speed 9826.20 samples/sec Loss 8.9327 LearningRate 0.0784 Epoch: 2 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:23,035-Speed 9636.24 samples/sec Loss 8.9470 LearningRate 0.0784 Epoch: 2 Global Step: 38210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:24,120-Speed 9442.43 samples/sec Loss 8.9805 LearningRate 0.0784 Epoch: 2 Global Step: 38220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:25,197-Speed 9514.71 samples/sec Loss 8.9078 LearningRate 0.0784 Epoch: 2 Global Step: 38230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:26,230-Speed 9917.19 samples/sec Loss 9.0737 LearningRate 0.0784 Epoch: 2 Global Step: 38240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 
13:05:27,289-Speed 9678.38 samples/sec Loss 9.0451 LearningRate 0.0784 Epoch: 2 Global Step: 38250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:28,390-Speed 9311.68 samples/sec Loss 9.0789 LearningRate 0.0784 Epoch: 2 Global Step: 38260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:29,496-Speed 9258.65 samples/sec Loss 8.9823 LearningRate 0.0784 Epoch: 2 Global Step: 38270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:30,549-Speed 9728.22 samples/sec Loss 9.0247 LearningRate 0.0784 Epoch: 2 Global Step: 38280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:31,590-Speed 9846.66 samples/sec Loss 9.0113 LearningRate 0.0784 Epoch: 2 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:32,690-Speed 9309.44 samples/sec Loss 8.9419 LearningRate 0.0784 Epoch: 2 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:33,733-Speed 9830.43 samples/sec Loss 9.1244 LearningRate 0.0784 Epoch: 2 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:34,828-Speed 9351.01 samples/sec Loss 9.0540 LearningRate 0.0784 Epoch: 2 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:35,888-Speed 9669.14 samples/sec Loss 9.0714 LearningRate 0.0784 Epoch: 2 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:36,938-Speed 9754.82 samples/sec Loss 8.9675 LearningRate 0.0783 Epoch: 2 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:38,038-Speed 9313.36 samples/sec Loss 9.0125 LearningRate 0.0783 Epoch: 2 Global Step: 38350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:39,090-Speed 9743.33 samples/sec Loss 9.0745 LearningRate 0.0783 Epoch: 2 Global Step: 38360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:40,168-Speed 9508.70 samples/sec Loss 
8.8973 LearningRate 0.0783 Epoch: 2 Global Step: 38370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:41,288-Speed 9145.66 samples/sec Loss 9.1078 LearningRate 0.0783 Epoch: 2 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:42,390-Speed 9301.19 samples/sec Loss 9.1042 LearningRate 0.0783 Epoch: 2 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:43,449-Speed 9671.03 samples/sec Loss 9.0682 LearningRate 0.0783 Epoch: 2 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:44,508-Speed 9674.85 samples/sec Loss 8.9549 LearningRate 0.0783 Epoch: 2 Global Step: 38410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:45,575-Speed 9607.62 samples/sec Loss 9.0231 LearningRate 0.0783 Epoch: 2 Global Step: 38420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:46,649-Speed 9538.45 samples/sec Loss 8.9901 LearningRate 0.0783 Epoch: 2 Global Step: 38430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:47,750-Speed 9306.62 samples/sec Loss 9.0904 LearningRate 0.0783 Epoch: 2 Global Step: 38440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:48,816-Speed 9610.91 samples/sec Loss 9.1360 LearningRate 0.0783 Epoch: 2 Global Step: 38450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:49,912-Speed 9347.16 samples/sec Loss 9.1725 LearningRate 0.0783 Epoch: 2 Global Step: 38460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:50,975-Speed 9642.25 samples/sec Loss 8.9715 LearningRate 0.0783 Epoch: 2 Global Step: 38470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:05:52,053-Speed 9507.79 samples/sec Loss 9.0778 LearningRate 0.0783 Epoch: 2 Global Step: 38480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:53,119-Speed 9608.53 samples/sec Loss 8.9594 LearningRate 0.0783 Epoch: 2 Global 
Step: 38490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:54,185-Speed 9609.83 samples/sec Loss 8.9563 LearningRate 0.0783 Epoch: 2 Global Step: 38500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:55,210-Speed 9994.23 samples/sec Loss 8.9442 LearningRate 0.0783 Epoch: 2 Global Step: 38510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:56,277-Speed 9608.12 samples/sec Loss 9.0028 LearningRate 0.0783 Epoch: 2 Global Step: 38520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:57,365-Speed 9423.01 samples/sec Loss 9.0192 LearningRate 0.0782 Epoch: 2 Global Step: 38530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:58,464-Speed 9319.16 samples/sec Loss 9.1128 LearningRate 0.0782 Epoch: 2 Global Step: 38540 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:05:59,575-Speed 9220.73 samples/sec Loss 8.9655 LearningRate 0.0782 Epoch: 2 Global Step: 38550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:00,646-Speed 9573.01 samples/sec Loss 9.0797 LearningRate 0.0782 Epoch: 2 Global Step: 38560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:01,703-Speed 9691.41 samples/sec Loss 8.9637 LearningRate 0.0782 Epoch: 2 Global Step: 38570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:02,788-Speed 9445.67 samples/sec Loss 8.9535 LearningRate 0.0782 Epoch: 2 Global Step: 38580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:03,877-Speed 9407.43 samples/sec Loss 8.9365 LearningRate 0.0782 Epoch: 2 Global Step: 38590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:04,975-Speed 9337.86 samples/sec Loss 9.0495 LearningRate 0.0782 Epoch: 2 Global Step: 38600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:06,069-Speed 9365.76 samples/sec Loss 8.8749 LearningRate 0.0782 Epoch: 2 Global Step: 38610 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 13:06:07,146-Speed 9510.06 samples/sec Loss 9.0596 LearningRate 0.0782 Epoch: 2 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:08,189-Speed 9822.76 samples/sec Loss 8.9461 LearningRate 0.0782 Epoch: 2 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:09,272-Speed 9467.67 samples/sec Loss 9.0530 LearningRate 0.0782 Epoch: 2 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:10,308-Speed 9889.60 samples/sec Loss 9.0207 LearningRate 0.0782 Epoch: 2 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:11,333-Speed 9998.50 samples/sec Loss 9.0922 LearningRate 0.0782 Epoch: 2 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:12,391-Speed 9681.47 samples/sec Loss 9.0205 LearningRate 0.0782 Epoch: 2 Global Step: 38670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:13,426-Speed 9894.67 samples/sec Loss 9.0699 LearningRate 0.0782 Epoch: 2 Global Step: 38680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:14,453-Speed 9983.05 samples/sec Loss 9.1481 LearningRate 0.0782 Epoch: 2 Global Step: 38690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:15,529-Speed 9516.60 samples/sec Loss 9.0408 LearningRate 0.0782 Epoch: 2 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:16,627-Speed 9337.15 samples/sec Loss 9.1100 LearningRate 0.0782 Epoch: 2 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:17,738-Speed 9217.76 samples/sec Loss 9.0947 LearningRate 0.0781 Epoch: 2 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:18,877-Speed 8997.11 samples/sec Loss 9.0377 LearningRate 0.0781 Epoch: 2 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
13:06:19,959-Speed 9465.71 samples/sec Loss 8.9832 LearningRate 0.0781 Epoch: 2 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:21,052-Speed 9373.38 samples/sec Loss 9.0960 LearningRate 0.0781 Epoch: 2 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:22,149-Speed 9348.24 samples/sec Loss 9.0277 LearningRate 0.0781 Epoch: 2 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:23,201-Speed 9742.97 samples/sec Loss 9.0165 LearningRate 0.0781 Epoch: 2 Global Step: 38770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:24,266-Speed 9613.47 samples/sec Loss 9.0477 LearningRate 0.0781 Epoch: 2 Global Step: 38780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:25,380-Speed 9199.94 samples/sec Loss 8.9384 LearningRate 0.0781 Epoch: 2 Global Step: 38790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:26,477-Speed 9337.49 samples/sec Loss 9.0152 LearningRate 0.0781 Epoch: 2 Global Step: 38800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:27,545-Speed 9597.08 samples/sec Loss 9.0148 LearningRate 0.0781 Epoch: 2 Global Step: 38810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:28,610-Speed 9619.74 samples/sec Loss 9.0217 LearningRate 0.0781 Epoch: 2 Global Step: 38820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:29,656-Speed 9797.38 samples/sec Loss 9.0388 LearningRate 0.0781 Epoch: 2 Global Step: 38830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:30,736-Speed 9490.15 samples/sec Loss 8.8573 LearningRate 0.0781 Epoch: 2 Global Step: 38840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:31,822-Speed 9431.17 samples/sec Loss 9.0855 LearningRate 0.0781 Epoch: 2 Global Step: 38850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:32,938-Speed 9187.70 samples/sec Loss 
8.8926 LearningRate 0.0781 Epoch: 2 Global Step: 38860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:34,009-Speed 9563.31 samples/sec Loss 8.9891 LearningRate 0.0781 Epoch: 2 Global Step: 38870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:35,086-Speed 9511.60 samples/sec Loss 9.0444 LearningRate 0.0781 Epoch: 2 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:36,155-Speed 9586.25 samples/sec Loss 9.0013 LearningRate 0.0781 Epoch: 2 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:37,221-Speed 9608.91 samples/sec Loss 8.8537 LearningRate 0.0781 Epoch: 2 Global Step: 38900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:38,294-Speed 9549.65 samples/sec Loss 8.8822 LearningRate 0.0780 Epoch: 2 Global Step: 38910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:39,336-Speed 9836.16 samples/sec Loss 8.9245 LearningRate 0.0780 Epoch: 2 Global Step: 38920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:40,402-Speed 9615.80 samples/sec Loss 8.9691 LearningRate 0.0780 Epoch: 2 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:41,509-Speed 9257.77 samples/sec Loss 8.9465 LearningRate 0.0780 Epoch: 2 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:42,605-Speed 9350.10 samples/sec Loss 9.0324 LearningRate 0.0780 Epoch: 2 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:43,683-Speed 9506.30 samples/sec Loss 8.9701 LearningRate 0.0780 Epoch: 2 Global Step: 38960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:44,714-Speed 9937.61 samples/sec Loss 8.9488 LearningRate 0.0780 Epoch: 2 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:45,775-Speed 9658.04 samples/sec Loss 9.0468 LearningRate 0.0780 Epoch: 2 Global 
Step: 38980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:46,851-Speed 9520.99 samples/sec Loss 9.0414 LearningRate 0.0780 Epoch: 2 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:47,915-Speed 9624.48 samples/sec Loss 9.0539 LearningRate 0.0780 Epoch: 2 Global Step: 39000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:48,968-Speed 9730.67 samples/sec Loss 8.9250 LearningRate 0.0780 Epoch: 2 Global Step: 39010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:50,034-Speed 9618.43 samples/sec Loss 9.0693 LearningRate 0.0780 Epoch: 2 Global Step: 39020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:51,129-Speed 9359.76 samples/sec Loss 9.0676 LearningRate 0.0780 Epoch: 2 Global Step: 39030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:52,206-Speed 9507.51 samples/sec Loss 9.0796 LearningRate 0.0780 Epoch: 2 Global Step: 39040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:06:53,342-Speed 9027.02 samples/sec Loss 9.0446 LearningRate 0.0780 Epoch: 2 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:06:54,434-Speed 9385.06 samples/sec Loss 9.0727 LearningRate 0.0780 Epoch: 2 Global Step: 39060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:06:55,507-Speed 9551.02 samples/sec Loss 9.0596 LearningRate 0.0780 Epoch: 2 Global Step: 39070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:06:56,614-Speed 9257.31 samples/sec Loss 9.0073 LearningRate 0.0780 Epoch: 2 Global Step: 39080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:06:57,723-Speed 9235.77 samples/sec Loss 8.9504 LearningRate 0.0780 Epoch: 2 Global Step: 39090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:06:58,800-Speed 9513.75 samples/sec Loss 9.0704 LearningRate 0.0779 Epoch: 2 Global Step: 39100 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-04-11 13:06:59,863-Speed 9640.40 samples/sec Loss 9.1015 LearningRate 0.0779 Epoch: 2 Global Step: 39110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:07:00,939-Speed 9529.09 samples/sec Loss 8.9928 LearningRate 0.0779 Epoch: 2 Global Step: 39120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:07:02,002-Speed 9638.19 samples/sec Loss 8.9522 LearningRate 0.0779 Epoch: 2 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:07:03,056-Speed 9719.37 samples/sec Loss 8.9065 LearningRate 0.0779 Epoch: 2 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:07:04,125-Speed 9581.29 samples/sec Loss 8.8552 LearningRate 0.0779 Epoch: 2 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:07:05,194-Speed 9585.18 samples/sec Loss 9.0344 LearningRate 0.0779 Epoch: 2 Global Step: 39160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:06,299-Speed 9270.80 samples/sec Loss 9.0266 LearningRate 0.0779 Epoch: 2 Global Step: 39170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:07,349-Speed 9759.12 samples/sec Loss 8.9799 LearningRate 0.0779 Epoch: 2 Global Step: 39180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:08,432-Speed 9460.80 samples/sec Loss 9.0874 LearningRate 0.0779 Epoch: 2 Global Step: 39190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:09,491-Speed 9674.12 samples/sec Loss 8.9767 LearningRate 0.0779 Epoch: 2 Global Step: 39200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:10,610-Speed 9161.02 samples/sec Loss 8.9860 LearningRate 0.0779 Epoch: 2 Global Step: 39210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:11,708-Speed 9332.56 samples/sec Loss 9.0072 LearningRate 0.0779 Epoch: 2 Global Step: 39220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:12,756-Speed 
9771.43 samples/sec Loss 9.0098 LearningRate 0.0779 Epoch: 2 Global Step: 39230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:13,851-Speed 9359.77 samples/sec Loss 9.0797 LearningRate 0.0779 Epoch: 2 Global Step: 39240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:14,949-Speed 9328.53 samples/sec Loss 9.0399 LearningRate 0.0779 Epoch: 2 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:16,009-Speed 9664.59 samples/sec Loss 9.0129 LearningRate 0.0779 Epoch: 2 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:17,085-Speed 9523.69 samples/sec Loss 8.9659 LearningRate 0.0779 Epoch: 2 Global Step: 39270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:18,202-Speed 9178.16 samples/sec Loss 9.0637 LearningRate 0.0779 Epoch: 2 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:19,265-Speed 9639.24 samples/sec Loss 9.0412 LearningRate 0.0778 Epoch: 2 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:20,348-Speed 9459.78 samples/sec Loss 9.0261 LearningRate 0.0778 Epoch: 2 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:21,408-Speed 9673.86 samples/sec Loss 8.8913 LearningRate 0.0778 Epoch: 2 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:22,490-Speed 9467.26 samples/sec Loss 8.9986 LearningRate 0.0778 Epoch: 2 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:23,565-Speed 9529.73 samples/sec Loss 9.0947 LearningRate 0.0778 Epoch: 2 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:24,650-Speed 9444.26 samples/sec Loss 9.0652 LearningRate 0.0778 Epoch: 2 Global Step: 39340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:25,686-Speed 9892.44 samples/sec Loss 9.0534 
LearningRate 0.0778 Epoch: 2 Global Step: 39350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:26,779-Speed 9375.48 samples/sec Loss 8.9911 LearningRate 0.0778 Epoch: 2 Global Step: 39360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:27,869-Speed 9399.17 samples/sec Loss 8.9213 LearningRate 0.0778 Epoch: 2 Global Step: 39370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:28,930-Speed 9656.82 samples/sec Loss 8.9801 LearningRate 0.0778 Epoch: 2 Global Step: 39380 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:29,999-Speed 9586.91 samples/sec Loss 8.9421 LearningRate 0.0778 Epoch: 2 Global Step: 39390 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:31,146-Speed 8934.54 samples/sec Loss 9.0456 LearningRate 0.0778 Epoch: 2 Global Step: 39400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:32,210-Speed 9633.91 samples/sec Loss 8.9966 LearningRate 0.0778 Epoch: 2 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:33,297-Speed 9425.94 samples/sec Loss 9.0230 LearningRate 0.0778 Epoch: 2 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:34,390-Speed 9368.24 samples/sec Loss 9.0225 LearningRate 0.0778 Epoch: 2 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:35,476-Speed 9439.99 samples/sec Loss 9.0912 LearningRate 0.0778 Epoch: 2 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:36,573-Speed 9345.29 samples/sec Loss 9.0754 LearningRate 0.0778 Epoch: 2 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:37,634-Speed 9654.26 samples/sec Loss 9.0054 LearningRate 0.0778 Epoch: 2 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:38,745-Speed 9222.74 samples/sec Loss 9.0311 LearningRate 0.0778 Epoch: 2 Global Step: 
39470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:39,820-Speed 9531.52 samples/sec Loss 8.9716 LearningRate 0.0777 Epoch: 2 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:40,908-Speed 9412.25 samples/sec Loss 8.9523 LearningRate 0.0777 Epoch: 2 Global Step: 39490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:43,330-Speed 9615.85 samples/sec Loss 8.9285 LearningRate 0.0777 Epoch: 2 Global Step: 39500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:45,903-Speed 9342.29 samples/sec Loss 8.9795 LearningRate 0.0777 Epoch: 2 Global Step: 39510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:47,030-Speed 9093.75 samples/sec Loss 8.9484 LearningRate 0.0777 Epoch: 2 Global Step: 39520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:48,083-Speed 9733.87 samples/sec Loss 8.9716 LearningRate 0.0777 Epoch: 2 Global Step: 39530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:07:49,166-Speed 9456.64 samples/sec Loss 8.9302 LearningRate 0.0777 Epoch: 2 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:50,293-Speed 9092.86 samples/sec Loss 9.0153 LearningRate 0.0777 Epoch: 2 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:51,368-Speed 9533.40 samples/sec Loss 8.9689 LearningRate 0.0777 Epoch: 2 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:52,471-Speed 9288.41 samples/sec Loss 8.9724 LearningRate 0.0777 Epoch: 2 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:53,544-Speed 9552.90 samples/sec Loss 8.8756 LearningRate 0.0777 Epoch: 2 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:54,647-Speed 9288.06 samples/sec Loss 8.9805 LearningRate 0.0777 Epoch: 2 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 13:07:55,756-Speed 9236.97 samples/sec Loss 8.9996 LearningRate 0.0777 Epoch: 2 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:56,853-Speed 9347.10 samples/sec Loss 8.8793 LearningRate 0.0777 Epoch: 2 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:57,930-Speed 9505.03 samples/sec Loss 9.1432 LearningRate 0.0777 Epoch: 2 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:07:59,006-Speed 9526.51 samples/sec Loss 9.0518 LearningRate 0.0777 Epoch: 2 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:00,072-Speed 9614.38 samples/sec Loss 8.9290 LearningRate 0.0777 Epoch: 2 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:01,121-Speed 9764.77 samples/sec Loss 8.9225 LearningRate 0.0777 Epoch: 2 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:02,184-Speed 9641.17 samples/sec Loss 8.9615 LearningRate 0.0777 Epoch: 2 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:03,252-Speed 9598.57 samples/sec Loss 8.9117 LearningRate 0.0776 Epoch: 2 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:04,328-Speed 9521.50 samples/sec Loss 8.8826 LearningRate 0.0776 Epoch: 2 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:05,390-Speed 9650.58 samples/sec Loss 8.9962 LearningRate 0.0776 Epoch: 2 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:06,467-Speed 9509.54 samples/sec Loss 8.8959 LearningRate 0.0776 Epoch: 2 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:07,552-Speed 9443.59 samples/sec Loss 9.0422 LearningRate 0.0776 Epoch: 2 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
13:08:08,620-Speed 9595.07 samples/sec Loss 8.9598 LearningRate 0.0776 Epoch: 2 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:09,719-Speed 9320.28 samples/sec Loss 8.9922 LearningRate 0.0776 Epoch: 2 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:10,788-Speed 9586.75 samples/sec Loss 8.9530 LearningRate 0.0776 Epoch: 2 Global Step: 39740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:08:11,862-Speed 9532.77 samples/sec Loss 8.9990 LearningRate 0.0776 Epoch: 2 Global Step: 39750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:08:12,922-Speed 9670.27 samples/sec Loss 8.9049 LearningRate 0.0776 Epoch: 2 Global Step: 39760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:08:14,008-Speed 9433.45 samples/sec Loss 8.9300 LearningRate 0.0776 Epoch: 2 Global Step: 39770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:08:15,101-Speed 9376.35 samples/sec Loss 8.9809 LearningRate 0.0776 Epoch: 2 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:16,150-Speed 9778.57 samples/sec Loss 8.8477 LearningRate 0.0776 Epoch: 2 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:17,181-Speed 9933.53 samples/sec Loss 8.9855 LearningRate 0.0776 Epoch: 2 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:18,271-Speed 9404.56 samples/sec Loss 8.9277 LearningRate 0.0776 Epoch: 2 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:19,336-Speed 9623.82 samples/sec Loss 9.0801 LearningRate 0.0776 Epoch: 2 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:20,435-Speed 9319.87 samples/sec Loss 8.9686 LearningRate 0.0776 Epoch: 2 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:21,552-Speed 9176.28 samples/sec Loss 
9.0234 LearningRate 0.0776 Epoch: 2 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:22,657-Speed 9270.56 samples/sec Loss 8.9916 LearningRate 0.0775 Epoch: 2 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:23,713-Speed 9695.65 samples/sec Loss 8.9879 LearningRate 0.0775 Epoch: 2 Global Step: 39860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:24,737-Speed 10010.89 samples/sec Loss 9.0470 LearningRate 0.0775 Epoch: 2 Global Step: 39870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:25,781-Speed 9813.76 samples/sec Loss 8.9809 LearningRate 0.0775 Epoch: 2 Global Step: 39880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:08:26,820-Speed 9862.13 samples/sec Loss 8.9463 LearningRate 0.0775 Epoch: 2 Global Step: 39890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:08:27,858-Speed 9872.13 samples/sec Loss 9.0554 LearningRate 0.0775 Epoch: 2 Global Step: 39900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:28,905-Speed 9783.06 samples/sec Loss 9.0617 LearningRate 0.0775 Epoch: 2 Global Step: 39910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:30,010-Speed 9276.91 samples/sec Loss 9.0075 LearningRate 0.0775 Epoch: 2 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:31,080-Speed 9571.13 samples/sec Loss 9.0221 LearningRate 0.0775 Epoch: 2 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:32,197-Speed 9171.85 samples/sec Loss 8.8383 LearningRate 0.0775 Epoch: 2 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:33,281-Speed 9460.49 samples/sec Loss 8.9434 LearningRate 0.0775 Epoch: 2 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:34,346-Speed 9621.54 samples/sec Loss 8.8275 LearningRate 0.0775 Epoch: 2 Global 
Step: 39960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:35,445-Speed 9321.07 samples/sec Loss 9.0798 LearningRate 0.0775 Epoch: 2 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:36,511-Speed 9612.67 samples/sec Loss 9.0086 LearningRate 0.0775 Epoch: 2 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:37,589-Speed 9505.40 samples/sec Loss 8.8911 LearningRate 0.0775 Epoch: 2 Global Step: 39990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:08:38,626-Speed 9880.14 samples/sec Loss 9.0410 LearningRate 0.0775 Epoch: 2 Global Step: 40000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:09:00,570-[lfw][40000]XNorm: 12.913431 Training: 2022-04-11 13:09:00,570-[lfw][40000]Accuracy-Flip: 0.99367+-0.00379 Training: 2022-04-11 13:09:00,571-[lfw][40000]Accuracy-Highest: 0.99533 Training: 2022-04-11 13:09:25,883-[cfp_fp][40000]XNorm: 10.887237 Training: 2022-04-11 13:09:25,884-[cfp_fp][40000]Accuracy-Flip: 0.93914+-0.01023 Training: 2022-04-11 13:09:25,884-[cfp_fp][40000]Accuracy-Highest: 0.93986 Training: 2022-04-11 13:09:47,661-[agedb_30][40000]XNorm: 12.465581 Training: 2022-04-11 13:09:47,662-[agedb_30][40000]Accuracy-Flip: 0.95083+-0.01193 Training: 2022-04-11 13:09:47,663-[agedb_30][40000]Accuracy-Highest: 0.95333 Training: 2022-04-11 13:09:48,732-Speed 146.07 samples/sec Loss 9.0025 LearningRate 0.0775 Epoch: 2 Global Step: 40010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:09:49,768-Speed 9886.22 samples/sec Loss 8.9505 LearningRate 0.0775 Epoch: 2 Global Step: 40020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:09:50,813-Speed 9808.71 samples/sec Loss 8.9868 LearningRate 0.0775 Epoch: 2 Global Step: 40030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:09:51,849-Speed 9883.28 samples/sec Loss 9.0375 LearningRate 0.0774 Epoch: 2 Global Step: 40040 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-04-11 13:09:52,886-Speed 9882.69 samples/sec Loss 9.0238 LearningRate 0.0774 Epoch: 2 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:09:53,933-Speed 9784.10 samples/sec Loss 8.9884 LearningRate 0.0774 Epoch: 2 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:09:54,983-Speed 9761.94 samples/sec Loss 9.0793 LearningRate 0.0774 Epoch: 2 Global Step: 40070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:09:56,075-Speed 9381.21 samples/sec Loss 9.0057 LearningRate 0.0774 Epoch: 2 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:09:57,156-Speed 9479.88 samples/sec Loss 8.9855 LearningRate 0.0774 Epoch: 2 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:09:58,246-Speed 9402.66 samples/sec Loss 9.0150 LearningRate 0.0774 Epoch: 2 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:09:59,323-Speed 9510.63 samples/sec Loss 8.9809 LearningRate 0.0774 Epoch: 2 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:00,405-Speed 9466.21 samples/sec Loss 8.9324 LearningRate 0.0774 Epoch: 2 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:01,467-Speed 9652.36 samples/sec Loss 9.0550 LearningRate 0.0774 Epoch: 2 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:02,549-Speed 9469.46 samples/sec Loss 8.9972 LearningRate 0.0774 Epoch: 2 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:03,613-Speed 9635.19 samples/sec Loss 8.9594 LearningRate 0.0774 Epoch: 2 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:04,653-Speed 9846.71 samples/sec Loss 9.0127 LearningRate 0.0774 Epoch: 2 Global Step: 40160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 13:10:05,704-Speed 9751.64 samples/sec Loss 8.9725 LearningRate 0.0774 Epoch: 2 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:06,804-Speed 9311.29 samples/sec Loss 8.9278 LearningRate 0.0774 Epoch: 2 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:07,830-Speed 9993.11 samples/sec Loss 8.8745 LearningRate 0.0774 Epoch: 2 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:08,904-Speed 9536.46 samples/sec Loss 8.9070 LearningRate 0.0774 Epoch: 2 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:09,998-Speed 9363.54 samples/sec Loss 8.8589 LearningRate 0.0774 Epoch: 2 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:11,044-Speed 9802.13 samples/sec Loss 9.0252 LearningRate 0.0774 Epoch: 2 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:12,110-Speed 9606.85 samples/sec Loss 9.0289 LearningRate 0.0773 Epoch: 2 Global Step: 40230 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:13,164-Speed 9720.48 samples/sec Loss 8.9332 LearningRate 0.0773 Epoch: 2 Global Step: 40240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:14,252-Speed 9416.03 samples/sec Loss 8.9437 LearningRate 0.0773 Epoch: 2 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:15,298-Speed 9804.53 samples/sec Loss 9.0073 LearningRate 0.0773 Epoch: 2 Global Step: 40260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:16,338-Speed 9850.49 samples/sec Loss 8.8719 LearningRate 0.0773 Epoch: 2 Global Step: 40270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:17,362-Speed 10009.90 samples/sec Loss 9.0210 LearningRate 0.0773 Epoch: 2 Global Step: 40280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:18,422-Speed 9664.91 
samples/sec Loss 8.8892 LearningRate 0.0773 Epoch: 2 Global Step: 40290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:19,527-Speed 9267.53 samples/sec Loss 8.8211 LearningRate 0.0773 Epoch: 2 Global Step: 40300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:20,586-Speed 9680.37 samples/sec Loss 9.0142 LearningRate 0.0773 Epoch: 2 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:21,694-Speed 9243.39 samples/sec Loss 8.9407 LearningRate 0.0773 Epoch: 2 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:22,740-Speed 9797.36 samples/sec Loss 8.9595 LearningRate 0.0773 Epoch: 2 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:23,831-Speed 9387.65 samples/sec Loss 8.9538 LearningRate 0.0773 Epoch: 2 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:24,899-Speed 9591.23 samples/sec Loss 8.9444 LearningRate 0.0773 Epoch: 2 Global Step: 40350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:25,920-Speed 10045.03 samples/sec Loss 8.9645 LearningRate 0.0773 Epoch: 2 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:26,955-Speed 9902.92 samples/sec Loss 8.9822 LearningRate 0.0773 Epoch: 2 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:28,040-Speed 9439.64 samples/sec Loss 9.0171 LearningRate 0.0773 Epoch: 2 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:29,128-Speed 9421.94 samples/sec Loss 9.0643 LearningRate 0.0773 Epoch: 2 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:30,222-Speed 9361.70 samples/sec Loss 8.9876 LearningRate 0.0773 Epoch: 2 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:31,255-Speed 9917.77 samples/sec Loss 9.1057 LearningRate 
0.0773 Epoch: 2 Global Step: 40410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:32,318-Speed 9642.54 samples/sec Loss 9.0205 LearningRate 0.0772 Epoch: 2 Global Step: 40420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:33,357-Speed 9862.73 samples/sec Loss 9.0268 LearningRate 0.0772 Epoch: 2 Global Step: 40430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:34,416-Speed 9674.42 samples/sec Loss 8.8824 LearningRate 0.0772 Epoch: 2 Global Step: 40440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:35,494-Speed 9503.19 samples/sec Loss 8.8667 LearningRate 0.0772 Epoch: 2 Global Step: 40450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:36,580-Speed 9434.28 samples/sec Loss 8.9058 LearningRate 0.0772 Epoch: 2 Global Step: 40460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:37,655-Speed 9533.58 samples/sec Loss 8.9498 LearningRate 0.0772 Epoch: 2 Global Step: 40470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:38,745-Speed 9396.96 samples/sec Loss 9.1132 LearningRate 0.0772 Epoch: 2 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:39,824-Speed 9496.66 samples/sec Loss 9.0148 LearningRate 0.0772 Epoch: 2 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:40,943-Speed 9159.93 samples/sec Loss 8.9473 LearningRate 0.0772 Epoch: 2 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:42,061-Speed 9157.19 samples/sec Loss 8.9100 LearningRate 0.0772 Epoch: 2 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:43,120-Speed 9678.46 samples/sec Loss 8.9641 LearningRate 0.0772 Epoch: 2 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:44,235-Speed 9191.40 samples/sec Loss 8.9327 LearningRate 0.0772 Epoch: 2 Global Step: 40530 Fp16 
Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:45,295-Speed 9664.63 samples/sec Loss 9.0076 LearningRate 0.0772 Epoch: 2 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:46,357-Speed 9656.28 samples/sec Loss 8.9618 LearningRate 0.0772 Epoch: 2 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:47,392-Speed 9892.95 samples/sec Loss 8.9928 LearningRate 0.0772 Epoch: 2 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:48,473-Speed 9475.59 samples/sec Loss 8.9530 LearningRate 0.0772 Epoch: 2 Global Step: 40570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:49,539-Speed 9611.23 samples/sec Loss 8.9295 LearningRate 0.0772 Epoch: 2 Global Step: 40580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:50,591-Speed 9747.61 samples/sec Loss 8.9215 LearningRate 0.0772 Epoch: 2 Global Step: 40590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:51,683-Speed 9378.06 samples/sec Loss 9.0014 LearningRate 0.0772 Epoch: 2 Global Step: 40600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:52,725-Speed 9836.27 samples/sec Loss 8.9208 LearningRate 0.0771 Epoch: 2 Global Step: 40610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:10:53,778-Speed 9730.54 samples/sec Loss 8.9924 LearningRate 0.0771 Epoch: 2 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:54,850-Speed 9551.51 samples/sec Loss 9.0865 LearningRate 0.0771 Epoch: 2 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:55,917-Speed 9601.51 samples/sec Loss 8.9878 LearningRate 0.0771 Epoch: 2 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:56,983-Speed 9617.95 samples/sec Loss 8.9752 LearningRate 0.0771 Epoch: 2 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 13:10:58,032-Speed 9761.49 samples/sec Loss 8.9023 LearningRate 0.0771 Epoch: 2 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:10:59,137-Speed 9276.00 samples/sec Loss 9.0323 LearningRate 0.0771 Epoch: 2 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:00,240-Speed 9286.48 samples/sec Loss 9.0183 LearningRate 0.0771 Epoch: 2 Global Step: 40680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:01,335-Speed 9355.60 samples/sec Loss 8.8972 LearningRate 0.0771 Epoch: 2 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:02,451-Speed 9182.79 samples/sec Loss 8.8908 LearningRate 0.0771 Epoch: 2 Global Step: 40700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:03,568-Speed 9179.53 samples/sec Loss 8.9260 LearningRate 0.0771 Epoch: 2 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:04,615-Speed 9791.14 samples/sec Loss 9.0169 LearningRate 0.0771 Epoch: 2 Global Step: 40720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:11:05,715-Speed 9315.55 samples/sec Loss 8.9767 LearningRate 0.0771 Epoch: 2 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:06,796-Speed 9475.13 samples/sec Loss 9.0509 LearningRate 0.0771 Epoch: 2 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:07,866-Speed 9574.18 samples/sec Loss 8.9618 LearningRate 0.0771 Epoch: 2 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:08,941-Speed 9531.82 samples/sec Loss 8.9539 LearningRate 0.0771 Epoch: 2 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:09,996-Speed 9706.45 samples/sec Loss 8.8462 LearningRate 0.0771 Epoch: 2 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:11,050-Speed 
9721.93 samples/sec Loss 8.9458 LearningRate 0.0771 Epoch: 2 Global Step: 40780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:12,101-Speed 9752.37 samples/sec Loss 8.9268 LearningRate 0.0771 Epoch: 2 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:13,208-Speed 9254.57 samples/sec Loss 8.8051 LearningRate 0.0770 Epoch: 2 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:14,301-Speed 9377.15 samples/sec Loss 8.9323 LearningRate 0.0770 Epoch: 2 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:15,403-Speed 9291.40 samples/sec Loss 8.9646 LearningRate 0.0770 Epoch: 2 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:16,528-Speed 9114.63 samples/sec Loss 8.9300 LearningRate 0.0770 Epoch: 2 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:17,585-Speed 9690.74 samples/sec Loss 8.9685 LearningRate 0.0770 Epoch: 2 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:18,670-Speed 9443.95 samples/sec Loss 9.0019 LearningRate 0.0770 Epoch: 2 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:19,712-Speed 9834.56 samples/sec Loss 8.9626 LearningRate 0.0770 Epoch: 2 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:20,794-Speed 9464.82 samples/sec Loss 8.8959 LearningRate 0.0770 Epoch: 2 Global Step: 40870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:21,909-Speed 9190.32 samples/sec Loss 8.9464 LearningRate 0.0770 Epoch: 2 Global Step: 40880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:11:22,986-Speed 9512.67 samples/sec Loss 8.9822 LearningRate 0.0770 Epoch: 2 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:24,073-Speed 9433.57 samples/sec Loss 8.9556 LearningRate 0.0770 
Epoch: 2 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:25,155-Speed 9468.59 samples/sec Loss 9.1168 LearningRate 0.0770 Epoch: 2 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:26,272-Speed 9172.64 samples/sec Loss 9.0688 LearningRate 0.0770 Epoch: 2 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:27,314-Speed 9833.82 samples/sec Loss 8.9701 LearningRate 0.0770 Epoch: 2 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:28,398-Speed 9450.01 samples/sec Loss 8.9163 LearningRate 0.0770 Epoch: 2 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:29,482-Speed 9455.93 samples/sec Loss 8.8712 LearningRate 0.0770 Epoch: 2 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:30,544-Speed 9644.98 samples/sec Loss 8.9502 LearningRate 0.0770 Epoch: 2 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:31,606-Speed 9643.11 samples/sec Loss 8.9774 LearningRate 0.0770 Epoch: 2 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:32,668-Speed 9654.96 samples/sec Loss 8.8506 LearningRate 0.0770 Epoch: 2 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:33,750-Speed 9469.13 samples/sec Loss 8.9521 LearningRate 0.0769 Epoch: 2 Global Step: 40990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:11:34,825-Speed 9537.21 samples/sec Loss 8.9311 LearningRate 0.0769 Epoch: 2 Global Step: 41000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:11:35,912-Speed 9420.30 samples/sec Loss 8.8973 LearningRate 0.0769 Epoch: 2 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:36,973-Speed 9667.63 samples/sec Loss 8.9156 LearningRate 0.0769 Epoch: 2 Global Step: 41020 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:38,065-Speed 9376.56 samples/sec Loss 9.0269 LearningRate 0.0769 Epoch: 2 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:39,189-Speed 9115.93 samples/sec Loss 8.9751 LearningRate 0.0769 Epoch: 2 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:40,300-Speed 9223.35 samples/sec Loss 8.8168 LearningRate 0.0769 Epoch: 2 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:41,389-Speed 9409.24 samples/sec Loss 8.8202 LearningRate 0.0769 Epoch: 2 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:42,465-Speed 9531.41 samples/sec Loss 9.0169 LearningRate 0.0769 Epoch: 2 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:43,537-Speed 9557.59 samples/sec Loss 9.0053 LearningRate 0.0769 Epoch: 2 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:44,611-Speed 9543.21 samples/sec Loss 8.9845 LearningRate 0.0769 Epoch: 2 Global Step: 41090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:45,667-Speed 9701.61 samples/sec Loss 8.9134 LearningRate 0.0769 Epoch: 2 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:46,729-Speed 9658.64 samples/sec Loss 8.9614 LearningRate 0.0769 Epoch: 2 Global Step: 41110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:11:47,825-Speed 9347.45 samples/sec Loss 8.9577 LearningRate 0.0769 Epoch: 2 Global Step: 41120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:11:48,897-Speed 9562.25 samples/sec Loss 8.9708 LearningRate 0.0769 Epoch: 2 Global Step: 41130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:11:49,958-Speed 9653.19 samples/sec Loss 8.8644 LearningRate 0.0769 Epoch: 2 Global Step: 41140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 
2022-04-11 13:11:50,980-Speed 10028.63 samples/sec Loss 8.9161 LearningRate 0.0769 Epoch: 2 Global Step: 41150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:52,030-Speed 9758.88 samples/sec Loss 9.0150 LearningRate 0.0769 Epoch: 2 Global Step: 41160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:53,088-Speed 9681.57 samples/sec Loss 8.9359 LearningRate 0.0769 Epoch: 2 Global Step: 41170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:54,183-Speed 9356.45 samples/sec Loss 8.9327 LearningRate 0.0768 Epoch: 2 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:55,250-Speed 9606.12 samples/sec Loss 9.0466 LearningRate 0.0768 Epoch: 2 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:56,363-Speed 9205.46 samples/sec Loss 8.8979 LearningRate 0.0768 Epoch: 2 Global Step: 41200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:57,454-Speed 9390.00 samples/sec Loss 9.0704 LearningRate 0.0768 Epoch: 2 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:58,506-Speed 9742.89 samples/sec Loss 8.9708 LearningRate 0.0768 Epoch: 2 Global Step: 41220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:11:59,587-Speed 9481.94 samples/sec Loss 8.9772 LearningRate 0.0768 Epoch: 2 Global Step: 41230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:00,671-Speed 9449.14 samples/sec Loss 8.9876 LearningRate 0.0768 Epoch: 2 Global Step: 41240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:01,737-Speed 9613.89 samples/sec Loss 8.8320 LearningRate 0.0768 Epoch: 2 Global Step: 41250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:02,848-Speed 9223.01 samples/sec Loss 8.9125 LearningRate 0.0768 Epoch: 2 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:03,924-Speed 9525.78 
samples/sec Loss 8.9455 LearningRate 0.0768 Epoch: 2 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:04,999-Speed 9531.90 samples/sec Loss 9.0216 LearningRate 0.0768 Epoch: 2 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:06,070-Speed 9565.74 samples/sec Loss 8.9936 LearningRate 0.0768 Epoch: 2 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:07,139-Speed 9582.17 samples/sec Loss 8.9806 LearningRate 0.0768 Epoch: 2 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:08,179-Speed 9846.05 samples/sec Loss 8.8791 LearningRate 0.0768 Epoch: 2 Global Step: 41310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:09,262-Speed 9463.24 samples/sec Loss 8.8584 LearningRate 0.0768 Epoch: 2 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:10,307-Speed 9810.50 samples/sec Loss 8.9316 LearningRate 0.0768 Epoch: 2 Global Step: 41330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:11,370-Speed 9639.18 samples/sec Loss 8.9153 LearningRate 0.0768 Epoch: 2 Global Step: 41340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:12,458-Speed 9411.65 samples/sec Loss 9.0087 LearningRate 0.0768 Epoch: 2 Global Step: 41350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:13,525-Speed 9601.60 samples/sec Loss 8.9221 LearningRate 0.0768 Epoch: 2 Global Step: 41360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:14,597-Speed 9560.19 samples/sec Loss 8.8941 LearningRate 0.0768 Epoch: 2 Global Step: 41370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:15,692-Speed 9365.76 samples/sec Loss 8.9457 LearningRate 0.0767 Epoch: 2 Global Step: 41380 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:16,790-Speed 9330.77 samples/sec Loss 8.9783 LearningRate 0.0767 
Epoch: 2 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:17,862-Speed 9552.27 samples/sec Loss 8.9267 LearningRate 0.0767 Epoch: 2 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:18,961-Speed 9327.24 samples/sec Loss 8.9561 LearningRate 0.0767 Epoch: 2 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:20,062-Speed 9304.02 samples/sec Loss 8.8721 LearningRate 0.0767 Epoch: 2 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:21,164-Speed 9298.51 samples/sec Loss 8.8592 LearningRate 0.0767 Epoch: 2 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:22,212-Speed 9774.80 samples/sec Loss 8.9494 LearningRate 0.0767 Epoch: 2 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:23,299-Speed 9428.40 samples/sec Loss 8.9212 LearningRate 0.0767 Epoch: 2 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:24,396-Speed 9337.57 samples/sec Loss 8.8786 LearningRate 0.0767 Epoch: 2 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:25,438-Speed 9830.47 samples/sec Loss 8.9473 LearningRate 0.0767 Epoch: 2 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:26,531-Speed 9383.47 samples/sec Loss 8.9208 LearningRate 0.0767 Epoch: 2 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:27,601-Speed 9570.57 samples/sec Loss 8.9469 LearningRate 0.0767 Epoch: 2 Global Step: 41490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:28,635-Speed 9907.80 samples/sec Loss 8.9257 LearningRate 0.0767 Epoch: 2 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:29,758-Speed 9124.05 samples/sec Loss 8.8882 LearningRate 0.0767 Epoch: 2 Global Step: 41510 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:30,854-Speed 9349.06 samples/sec Loss 9.0059 LearningRate 0.0767 Epoch: 2 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:31,924-Speed 9576.72 samples/sec Loss 8.9747 LearningRate 0.0767 Epoch: 2 Global Step: 41530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:32,967-Speed 9827.69 samples/sec Loss 8.9523 LearningRate 0.0767 Epoch: 2 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:34,050-Speed 9464.82 samples/sec Loss 8.8299 LearningRate 0.0767 Epoch: 2 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:35,132-Speed 9463.40 samples/sec Loss 8.9293 LearningRate 0.0767 Epoch: 2 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:36,231-Speed 9326.23 samples/sec Loss 8.9316 LearningRate 0.0766 Epoch: 2 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:37,305-Speed 9541.96 samples/sec Loss 8.8339 LearningRate 0.0766 Epoch: 2 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:38,372-Speed 9610.21 samples/sec Loss 9.0010 LearningRate 0.0766 Epoch: 2 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:39,517-Speed 8943.18 samples/sec Loss 8.9428 LearningRate 0.0766 Epoch: 2 Global Step: 41600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:40,581-Speed 9631.15 samples/sec Loss 8.9524 LearningRate 0.0766 Epoch: 2 Global Step: 41610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:41,693-Speed 9216.80 samples/sec Loss 9.0069 LearningRate 0.0766 Epoch: 2 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:42,762-Speed 9577.13 samples/sec Loss 8.7866 LearningRate 0.0766 Epoch: 2 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 13:12:43,836-Speed 9544.02 samples/sec Loss 8.8330 LearningRate 0.0766 Epoch: 2 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:44,907-Speed 9569.80 samples/sec Loss 8.8421 LearningRate 0.0766 Epoch: 2 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:45,957-Speed 9761.85 samples/sec Loss 8.7768 LearningRate 0.0766 Epoch: 2 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:47,036-Speed 9498.92 samples/sec Loss 9.1117 LearningRate 0.0766 Epoch: 2 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:48,060-Speed 10005.60 samples/sec Loss 8.9321 LearningRate 0.0766 Epoch: 2 Global Step: 41680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:49,099-Speed 9857.54 samples/sec Loss 8.9485 LearningRate 0.0766 Epoch: 2 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:50,188-Speed 9411.99 samples/sec Loss 8.9883 LearningRate 0.0766 Epoch: 2 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:51,239-Speed 9752.79 samples/sec Loss 8.9674 LearningRate 0.0766 Epoch: 2 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:52,349-Speed 9226.85 samples/sec Loss 8.9434 LearningRate 0.0766 Epoch: 2 Global Step: 41720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:53,450-Speed 9304.17 samples/sec Loss 8.8845 LearningRate 0.0766 Epoch: 2 Global Step: 41730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:54,552-Speed 9301.02 samples/sec Loss 8.9981 LearningRate 0.0766 Epoch: 2 Global Step: 41740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:55,619-Speed 9602.48 samples/sec Loss 8.8834 LearningRate 0.0766 Epoch: 2 Global Step: 41750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:56,695-Speed 9530.62 
samples/sec Loss 8.9471 LearningRate 0.0765 Epoch: 2 Global Step: 41760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:12:57,787-Speed 9380.52 samples/sec Loss 8.9676 LearningRate 0.0765 Epoch: 2 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:12:58,837-Speed 9759.18 samples/sec Loss 8.8730 LearningRate 0.0765 Epoch: 2 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:12:59,910-Speed 9548.67 samples/sec Loss 8.9118 LearningRate 0.0765 Epoch: 2 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:00,981-Speed 9562.03 samples/sec Loss 8.8607 LearningRate 0.0765 Epoch: 2 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:02,049-Speed 9600.91 samples/sec Loss 8.7937 LearningRate 0.0765 Epoch: 2 Global Step: 41810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:03,132-Speed 9461.08 samples/sec Loss 8.9442 LearningRate 0.0765 Epoch: 2 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:04,240-Speed 9244.11 samples/sec Loss 8.7461 LearningRate 0.0765 Epoch: 2 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:05,316-Speed 9520.21 samples/sec Loss 8.9309 LearningRate 0.0765 Epoch: 2 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:06,393-Speed 9516.31 samples/sec Loss 8.9232 LearningRate 0.0765 Epoch: 2 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:07,503-Speed 9232.18 samples/sec Loss 8.9837 LearningRate 0.0765 Epoch: 2 Global Step: 41860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:08,564-Speed 9662.67 samples/sec Loss 9.0172 LearningRate 0.0765 Epoch: 2 Global Step: 41870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:13:09,677-Speed 9197.94 samples/sec Loss 8.8613 LearningRate 0.0765 Epoch: 2 
Global Step: 41880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:10,766-Speed 9410.55 samples/sec Loss 9.0024 LearningRate 0.0765 Epoch: 2 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:11,839-Speed 9548.33 samples/sec Loss 8.9887 LearningRate 0.0765 Epoch: 2 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:12,917-Speed 9508.50 samples/sec Loss 8.8248 LearningRate 0.0765 Epoch: 2 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:14,013-Speed 9345.65 samples/sec Loss 8.8878 LearningRate 0.0765 Epoch: 2 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:15,090-Speed 9520.84 samples/sec Loss 9.0758 LearningRate 0.0765 Epoch: 2 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:16,165-Speed 9525.94 samples/sec Loss 8.9615 LearningRate 0.0765 Epoch: 2 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:17,208-Speed 9823.88 samples/sec Loss 8.9391 LearningRate 0.0764 Epoch: 2 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:18,240-Speed 9926.07 samples/sec Loss 8.9104 LearningRate 0.0764 Epoch: 2 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:19,306-Speed 9615.23 samples/sec Loss 8.7970 LearningRate 0.0764 Epoch: 2 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:20,405-Speed 9318.53 samples/sec Loss 8.9417 LearningRate 0.0764 Epoch: 2 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:21,466-Speed 9666.82 samples/sec Loss 9.0242 LearningRate 0.0764 Epoch: 2 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:13:22,513-Speed 9779.88 samples/sec Loss 8.8990 LearningRate 0.0764 Epoch: 2 Global Step: 42000 Fp16 Grad Scale: 131072 
Required: 12 hours
Training: 2022-04-11 13:13:44,368-[lfw][42000]XNorm: 13.008962
Training: 2022-04-11 13:13:44,368-[lfw][42000]Accuracy-Flip: 0.99450+-0.00224
Training: 2022-04-11 13:13:44,369-[lfw][42000]Accuracy-Highest: 0.99533
Training: 2022-04-11 13:14:09,693-[cfp_fp][42000]XNorm: 10.938136
Training: 2022-04-11 13:14:09,694-[cfp_fp][42000]Accuracy-Flip: 0.93857+-0.01342
Training: 2022-04-11 13:14:09,694-[cfp_fp][42000]Accuracy-Highest: 0.93986
Training: 2022-04-11 13:14:31,568-[agedb_30][42000]XNorm: 12.512479
Training: 2022-04-11 13:14:31,569-[agedb_30][42000]Accuracy-Flip: 0.95333+-0.01101
Training: 2022-04-11 13:14:31,569-[agedb_30][42000]Accuracy-Highest: 0.95333
Training: 2022-04-11 13:14:32,613-Speed 146.08 samples/sec Loss 8.9938 LearningRate 0.0764 Epoch: 2 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:33,667-Speed 9716.94 samples/sec Loss 8.9582 LearningRate 0.0764 Epoch: 2 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:34,748-Speed 9479.19 samples/sec Loss 8.8757 LearningRate 0.0764 Epoch: 2 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:35,807-Speed 9678.72 samples/sec Loss 8.8417 LearningRate 0.0764 Epoch: 2 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:36,875-Speed 9597.25 samples/sec Loss 8.9371 LearningRate 0.0764 Epoch: 2 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:37,962-Speed 9430.38 samples/sec Loss 9.0127 LearningRate 0.0764 Epoch: 2 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:39,017-Speed 9711.11 samples/sec Loss 8.9387 LearningRate 0.0764 Epoch: 2 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:40,079-Speed 9649.49 samples/sec Loss 8.8931 LearningRate 0.0764 Epoch: 2 Global Step: 42080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 
2022-04-11 13:14:41,158-Speed 9495.69 samples/sec Loss 8.9574 LearningRate 0.0764 Epoch: 2 Global Step: 42090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:14:42,222-Speed 9626.27 samples/sec Loss 8.8835 LearningRate 0.0764 Epoch: 2 Global Step: 42100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:14:43,342-Speed 9151.69 samples/sec Loss 8.9928 LearningRate 0.0764 Epoch: 2 Global Step: 42110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:14:44,394-Speed 9739.00 samples/sec Loss 8.7610 LearningRate 0.0764 Epoch: 2 Global Step: 42120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:14:45,459-Speed 9620.63 samples/sec Loss 8.9355 LearningRate 0.0764 Epoch: 2 Global Step: 42130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:46,518-Speed 9676.24 samples/sec Loss 8.9489 LearningRate 0.0763 Epoch: 2 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:47,631-Speed 9204.11 samples/sec Loss 8.9975 LearningRate 0.0763 Epoch: 2 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:48,679-Speed 9777.93 samples/sec Loss 8.8160 LearningRate 0.0763 Epoch: 2 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:49,750-Speed 9563.60 samples/sec Loss 9.0342 LearningRate 0.0763 Epoch: 2 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:50,850-Speed 9315.78 samples/sec Loss 8.8898 LearningRate 0.0763 Epoch: 2 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:51,944-Speed 9365.23 samples/sec Loss 8.8765 LearningRate 0.0763 Epoch: 2 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:53,001-Speed 9691.69 samples/sec Loss 8.9166 LearningRate 0.0763 Epoch: 2 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:54,062-Speed 9651.64 
samples/sec Loss 8.8462 LearningRate 0.0763 Epoch: 2 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:55,110-Speed 9778.08 samples/sec Loss 8.9280 LearningRate 0.0763 Epoch: 2 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:56,138-Speed 9976.57 samples/sec Loss 8.9001 LearningRate 0.0763 Epoch: 2 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:57,211-Speed 9541.78 samples/sec Loss 8.9302 LearningRate 0.0763 Epoch: 2 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:58,275-Speed 9632.70 samples/sec Loss 8.8059 LearningRate 0.0763 Epoch: 2 Global Step: 42250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:14:59,301-Speed 9985.12 samples/sec Loss 8.9019 LearningRate 0.0763 Epoch: 2 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:00,341-Speed 9851.66 samples/sec Loss 8.8879 LearningRate 0.0763 Epoch: 2 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:01,413-Speed 9557.06 samples/sec Loss 8.8690 LearningRate 0.0763 Epoch: 2 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:02,514-Speed 9306.99 samples/sec Loss 8.8674 LearningRate 0.0763 Epoch: 2 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:03,649-Speed 9031.05 samples/sec Loss 9.0666 LearningRate 0.0763 Epoch: 2 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:04,771-Speed 9132.06 samples/sec Loss 8.8948 LearningRate 0.0763 Epoch: 2 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:05,841-Speed 9572.03 samples/sec Loss 8.9039 LearningRate 0.0763 Epoch: 2 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:06,942-Speed 9307.33 samples/sec Loss 8.7921 LearningRate 0.0762 
Epoch: 2 Global Step: 42330 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:08,032-Speed 9403.07 samples/sec Loss 8.8596 LearningRate 0.0762 Epoch: 2 Global Step: 42340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:09,084-Speed 9736.11 samples/sec Loss 8.8294 LearningRate 0.0762 Epoch: 2 Global Step: 42350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:10,123-Speed 9858.59 samples/sec Loss 8.9545 LearningRate 0.0762 Epoch: 2 Global Step: 42360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:11,194-Speed 9568.70 samples/sec Loss 9.0362 LearningRate 0.0762 Epoch: 2 Global Step: 42370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:12,270-Speed 9526.01 samples/sec Loss 8.8393 LearningRate 0.0762 Epoch: 2 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:13,384-Speed 9200.39 samples/sec Loss 9.0459 LearningRate 0.0762 Epoch: 2 Global Step: 42390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:14,419-Speed 9900.51 samples/sec Loss 8.9304 LearningRate 0.0762 Epoch: 2 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:15,498-Speed 9494.01 samples/sec Loss 8.9359 LearningRate 0.0762 Epoch: 2 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:16,573-Speed 9525.88 samples/sec Loss 8.8043 LearningRate 0.0762 Epoch: 2 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:17,649-Speed 9528.43 samples/sec Loss 8.9315 LearningRate 0.0762 Epoch: 2 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:18,694-Speed 9802.23 samples/sec Loss 8.9175 LearningRate 0.0762 Epoch: 2 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:19,756-Speed 9655.49 samples/sec Loss 8.8826 LearningRate 0.0762 Epoch: 2 Global Step: 42450 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:20,858-Speed 9298.72 samples/sec Loss 8.7998 LearningRate 0.0762 Epoch: 2 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:21,945-Speed 9430.28 samples/sec Loss 9.0762 LearningRate 0.0762 Epoch: 2 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:23,032-Speed 9424.92 samples/sec Loss 8.8091 LearningRate 0.0762 Epoch: 2 Global Step: 42480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:24,123-Speed 9389.56 samples/sec Loss 8.6940 LearningRate 0.0762 Epoch: 2 Global Step: 42490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:25,190-Speed 9598.27 samples/sec Loss 8.8500 LearningRate 0.0762 Epoch: 2 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:26,282-Speed 9381.63 samples/sec Loss 8.7450 LearningRate 0.0762 Epoch: 2 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:27,360-Speed 9511.54 samples/sec Loss 8.9190 LearningRate 0.0761 Epoch: 2 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:28,414-Speed 9711.60 samples/sec Loss 8.8827 LearningRate 0.0761 Epoch: 2 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:29,485-Speed 9569.95 samples/sec Loss 8.8716 LearningRate 0.0761 Epoch: 2 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:30,586-Speed 9302.77 samples/sec Loss 8.8715 LearningRate 0.0761 Epoch: 2 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:31,696-Speed 9236.59 samples/sec Loss 8.8765 LearningRate 0.0761 Epoch: 2 Global Step: 42560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:32,811-Speed 9192.23 samples/sec Loss 8.8534 LearningRate 0.0761 Epoch: 2 Global Step: 42570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 13:15:33,893-Speed 9471.11 samples/sec Loss 8.9205 LearningRate 0.0761 Epoch: 2 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:35,002-Speed 9235.92 samples/sec Loss 8.9178 LearningRate 0.0761 Epoch: 2 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:36,114-Speed 9212.20 samples/sec Loss 8.8243 LearningRate 0.0761 Epoch: 2 Global Step: 42600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:37,179-Speed 9621.70 samples/sec Loss 8.9296 LearningRate 0.0761 Epoch: 2 Global Step: 42610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:38,272-Speed 9374.54 samples/sec Loss 8.9945 LearningRate 0.0761 Epoch: 2 Global Step: 42620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:39,349-Speed 9515.84 samples/sec Loss 8.8764 LearningRate 0.0761 Epoch: 2 Global Step: 42630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:40,438-Speed 9407.81 samples/sec Loss 8.9013 LearningRate 0.0761 Epoch: 2 Global Step: 42640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:41,505-Speed 9604.84 samples/sec Loss 8.8740 LearningRate 0.0761 Epoch: 2 Global Step: 42650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:42,576-Speed 9572.52 samples/sec Loss 8.8895 LearningRate 0.0761 Epoch: 2 Global Step: 42660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:43,667-Speed 9393.67 samples/sec Loss 8.9087 LearningRate 0.0761 Epoch: 2 Global Step: 42670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:44,715-Speed 9772.42 samples/sec Loss 8.8873 LearningRate 0.0761 Epoch: 2 Global Step: 42680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:45,777-Speed 9650.22 samples/sec Loss 8.9013 LearningRate 0.0761 Epoch: 2 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:46,879-Speed 9293.70 samples/sec 
Loss 8.8536 LearningRate 0.0761 Epoch: 2 Global Step: 42700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:15:47,947-Speed 9594.85 samples/sec Loss 8.7454 LearningRate 0.0760 Epoch: 2 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:49,002-Speed 9712.18 samples/sec Loss 8.9062 LearningRate 0.0760 Epoch: 2 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:50,079-Speed 9516.68 samples/sec Loss 8.8284 LearningRate 0.0760 Epoch: 2 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:51,132-Speed 9730.30 samples/sec Loss 8.8030 LearningRate 0.0760 Epoch: 2 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:52,175-Speed 9819.82 samples/sec Loss 8.8576 LearningRate 0.0760 Epoch: 2 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:53,193-Speed 10064.89 samples/sec Loss 8.7912 LearningRate 0.0760 Epoch: 2 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:54,279-Speed 9435.52 samples/sec Loss 8.8851 LearningRate 0.0760 Epoch: 2 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:55,343-Speed 9628.23 samples/sec Loss 8.9506 LearningRate 0.0760 Epoch: 2 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:56,430-Speed 9434.45 samples/sec Loss 8.8945 LearningRate 0.0760 Epoch: 2 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:57,497-Speed 9596.68 samples/sec Loss 8.7440 LearningRate 0.0760 Epoch: 2 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:15:58,550-Speed 9737.00 samples/sec Loss 8.8921 LearningRate 0.0760 Epoch: 2 Global Step: 42810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:15:59,603-Speed 9723.25 samples/sec Loss 8.8594 LearningRate 0.0760 Epoch: 2 
Global Step: 42820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:16:00,638-Speed 9906.20 samples/sec Loss 8.9261 LearningRate 0.0760 Epoch: 2 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:01,687-Speed 9763.42 samples/sec Loss 8.7924 LearningRate 0.0760 Epoch: 2 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:02,756-Speed 9587.82 samples/sec Loss 8.8627 LearningRate 0.0760 Epoch: 2 Global Step: 42850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:03,862-Speed 9265.32 samples/sec Loss 8.8054 LearningRate 0.0760 Epoch: 2 Global Step: 42860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:04,963-Speed 9307.23 samples/sec Loss 8.7702 LearningRate 0.0760 Epoch: 2 Global Step: 42870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:06,067-Speed 9281.83 samples/sec Loss 8.9636 LearningRate 0.0760 Epoch: 2 Global Step: 42880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:07,129-Speed 9641.91 samples/sec Loss 8.8151 LearningRate 0.0760 Epoch: 2 Global Step: 42890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:08,253-Speed 9121.85 samples/sec Loss 8.8597 LearningRate 0.0759 Epoch: 2 Global Step: 42900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:09,296-Speed 9821.74 samples/sec Loss 8.9409 LearningRate 0.0759 Epoch: 2 Global Step: 42910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:10,320-Speed 10007.75 samples/sec Loss 8.8508 LearningRate 0.0759 Epoch: 2 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:11,349-Speed 9950.03 samples/sec Loss 8.9381 LearningRate 0.0759 Epoch: 2 Global Step: 42930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:12,438-Speed 9415.37 samples/sec Loss 8.8251 LearningRate 0.0759 Epoch: 2 Global Step: 42940 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 13:16:13,512-Speed 9542.41 samples/sec Loss 8.8865 LearningRate 0.0759 Epoch: 2 Global Step: 42950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:14,562-Speed 9757.97 samples/sec Loss 8.7994 LearningRate 0.0759 Epoch: 2 Global Step: 42960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:15,671-Speed 9232.01 samples/sec Loss 8.9434 LearningRate 0.0759 Epoch: 2 Global Step: 42970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:16,718-Speed 9795.19 samples/sec Loss 8.9626 LearningRate 0.0759 Epoch: 2 Global Step: 42980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:17,817-Speed 9324.33 samples/sec Loss 8.8712 LearningRate 0.0759 Epoch: 2 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:18,872-Speed 9711.32 samples/sec Loss 8.8713 LearningRate 0.0759 Epoch: 2 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:19,909-Speed 9882.41 samples/sec Loss 8.8257 LearningRate 0.0759 Epoch: 2 Global Step: 43010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:20,941-Speed 9922.37 samples/sec Loss 8.8937 LearningRate 0.0759 Epoch: 2 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:22,004-Speed 9645.10 samples/sec Loss 8.9655 LearningRate 0.0759 Epoch: 2 Global Step: 43030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:16:23,049-Speed 9800.40 samples/sec Loss 8.8755 LearningRate 0.0759 Epoch: 2 Global Step: 43040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:16:24,137-Speed 9417.06 samples/sec Loss 8.8609 LearningRate 0.0759 Epoch: 2 Global Step: 43050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:25,262-Speed 9106.00 samples/sec Loss 8.8463 LearningRate 0.0759 Epoch: 2 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 
13:16:26,342-Speed 9484.82 samples/sec Loss 8.8893 LearningRate 0.0759 Epoch: 2 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:27,404-Speed 9648.51 samples/sec Loss 8.8228 LearningRate 0.0759 Epoch: 2 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:28,536-Speed 9050.28 samples/sec Loss 8.8805 LearningRate 0.0758 Epoch: 2 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:29,666-Speed 9068.83 samples/sec Loss 8.7809 LearningRate 0.0758 Epoch: 2 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:30,740-Speed 9543.67 samples/sec Loss 8.9697 LearningRate 0.0758 Epoch: 2 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:31,840-Speed 9317.00 samples/sec Loss 8.7979 LearningRate 0.0758 Epoch: 2 Global Step: 43120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:32,890-Speed 9758.49 samples/sec Loss 8.7077 LearningRate 0.0758 Epoch: 2 Global Step: 43130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:33,933-Speed 9815.24 samples/sec Loss 8.8087 LearningRate 0.0758 Epoch: 2 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:34,995-Speed 9652.56 samples/sec Loss 8.9307 LearningRate 0.0758 Epoch: 2 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:36,060-Speed 9623.25 samples/sec Loss 8.7851 LearningRate 0.0758 Epoch: 2 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:37,166-Speed 9268.37 samples/sec Loss 8.8124 LearningRate 0.0758 Epoch: 2 Global Step: 43170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:38,204-Speed 9867.10 samples/sec Loss 8.8673 LearningRate 0.0758 Epoch: 2 Global Step: 43180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:39,239-Speed 9898.87 samples/sec Loss 8.8979 
LearningRate 0.0758 Epoch: 2 Global Step: 43190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:40,293-Speed 9716.74 samples/sec Loss 8.8711 LearningRate 0.0758 Epoch: 2 Global Step: 43200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:41,368-Speed 9536.42 samples/sec Loss 8.9463 LearningRate 0.0758 Epoch: 2 Global Step: 43210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:42,458-Speed 9404.05 samples/sec Loss 8.8725 LearningRate 0.0758 Epoch: 2 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:43,539-Speed 9479.78 samples/sec Loss 9.0121 LearningRate 0.0758 Epoch: 2 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:44,626-Speed 9422.49 samples/sec Loss 8.7107 LearningRate 0.0758 Epoch: 2 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:45,701-Speed 9534.66 samples/sec Loss 8.8246 LearningRate 0.0758 Epoch: 2 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:46,741-Speed 9851.38 samples/sec Loss 8.7626 LearningRate 0.0758 Epoch: 2 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:47,780-Speed 9857.30 samples/sec Loss 8.7904 LearningRate 0.0758 Epoch: 2 Global Step: 43270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:48,876-Speed 9354.60 samples/sec Loss 8.8441 LearningRate 0.0758 Epoch: 2 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:49,927-Speed 9747.42 samples/sec Loss 8.8107 LearningRate 0.0757 Epoch: 2 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:50,998-Speed 9560.88 samples/sec Loss 8.7519 LearningRate 0.0757 Epoch: 2 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:52,072-Speed 9544.07 samples/sec Loss 8.9499 LearningRate 0.0757 Epoch: 2 Global Step: 43310 
Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:53,171-Speed 9320.70 samples/sec Loss 8.7900 LearningRate 0.0757 Epoch: 2 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:54,258-Speed 9427.28 samples/sec Loss 8.9822 LearningRate 0.0757 Epoch: 2 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:55,320-Speed 9652.89 samples/sec Loss 8.7893 LearningRate 0.0757 Epoch: 2 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:56,408-Speed 9421.48 samples/sec Loss 8.9293 LearningRate 0.0757 Epoch: 2 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:57,531-Speed 9123.29 samples/sec Loss 8.9651 LearningRate 0.0757 Epoch: 2 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:16:58,643-Speed 9213.41 samples/sec Loss 8.7650 LearningRate 0.0757 Epoch: 2 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:16:59,730-Speed 9421.21 samples/sec Loss 8.8084 LearningRate 0.0757 Epoch: 2 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:00,812-Speed 9470.59 samples/sec Loss 8.8171 LearningRate 0.0757 Epoch: 2 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:01,902-Speed 9409.78 samples/sec Loss 8.9647 LearningRate 0.0757 Epoch: 2 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:02,978-Speed 9516.98 samples/sec Loss 8.8870 LearningRate 0.0757 Epoch: 2 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:04,015-Speed 9881.52 samples/sec Loss 8.8677 LearningRate 0.0757 Epoch: 2 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:05,092-Speed 9511.94 samples/sec Loss 8.8100 LearningRate 0.0757 Epoch: 2 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 13:17:06,229-Speed 9013.93 samples/sec Loss 8.8202 LearningRate 0.0757 Epoch: 2 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:07,296-Speed 9604.21 samples/sec Loss 8.8344 LearningRate 0.0757 Epoch: 2 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:08,364-Speed 9596.17 samples/sec Loss 8.8320 LearningRate 0.0757 Epoch: 2 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:09,438-Speed 9539.63 samples/sec Loss 8.7737 LearningRate 0.0757 Epoch: 2 Global Step: 43470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:10,535-Speed 9334.49 samples/sec Loss 8.9993 LearningRate 0.0756 Epoch: 2 Global Step: 43480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:11,613-Speed 9503.78 samples/sec Loss 8.8626 LearningRate 0.0756 Epoch: 2 Global Step: 43490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:12,673-Speed 9677.68 samples/sec Loss 8.8459 LearningRate 0.0756 Epoch: 2 Global Step: 43500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:13,736-Speed 9640.67 samples/sec Loss 8.8466 LearningRate 0.0756 Epoch: 2 Global Step: 43510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:14,836-Speed 9310.64 samples/sec Loss 8.8513 LearningRate 0.0756 Epoch: 2 Global Step: 43520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:15,924-Speed 9420.06 samples/sec Loss 8.9812 LearningRate 0.0756 Epoch: 2 Global Step: 43530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:16,964-Speed 9844.86 samples/sec Loss 8.8493 LearningRate 0.0756 Epoch: 2 Global Step: 43540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:18,058-Speed 9370.71 samples/sec Loss 8.7925 LearningRate 0.0756 Epoch: 2 Global Step: 43550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:19,154-Speed 
9353.12 samples/sec Loss 8.7460 LearningRate 0.0756 Epoch: 2 Global Step: 43560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:20,231-Speed 9508.34 samples/sec Loss 8.8563 LearningRate 0.0756 Epoch: 2 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:21,257-Speed 9991.28 samples/sec Loss 8.8153 LearningRate 0.0756 Epoch: 2 Global Step: 43580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:22,352-Speed 9350.28 samples/sec Loss 8.9829 LearningRate 0.0756 Epoch: 2 Global Step: 43590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:23,413-Speed 9660.74 samples/sec Loss 8.8824 LearningRate 0.0756 Epoch: 2 Global Step: 43600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:24,507-Speed 9367.16 samples/sec Loss 8.7703 LearningRate 0.0756 Epoch: 2 Global Step: 43610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:25,577-Speed 9574.23 samples/sec Loss 8.8738 LearningRate 0.0756 Epoch: 2 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:26,652-Speed 9525.43 samples/sec Loss 8.9146 LearningRate 0.0756 Epoch: 2 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:27,707-Speed 9710.11 samples/sec Loss 8.9286 LearningRate 0.0756 Epoch: 2 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:28,792-Speed 9449.28 samples/sec Loss 8.8034 LearningRate 0.0756 Epoch: 2 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:29,852-Speed 9667.42 samples/sec Loss 8.7475 LearningRate 0.0756 Epoch: 2 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:30,932-Speed 9489.87 samples/sec Loss 9.0600 LearningRate 0.0755 Epoch: 2 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:32,018-Speed 9437.53 samples/sec Loss 8.9077 LearningRate 
0.0755 Epoch: 2 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:33,056-Speed 9875.00 samples/sec Loss 8.7997 LearningRate 0.0755 Epoch: 2 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:34,099-Speed 9821.15 samples/sec Loss 8.8908 LearningRate 0.0755 Epoch: 2 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:35,140-Speed 9843.95 samples/sec Loss 8.7314 LearningRate 0.0755 Epoch: 2 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:17:36,183-Speed 9819.89 samples/sec Loss 8.7923 LearningRate 0.0755 Epoch: 2 Global Step: 43720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:37,256-Speed 9555.46 samples/sec Loss 8.7669 LearningRate 0.0755 Epoch: 2 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:38,314-Speed 9676.04 samples/sec Loss 8.9514 LearningRate 0.0755 Epoch: 2 Global Step: 43740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:39,406-Speed 9389.08 samples/sec Loss 8.7852 LearningRate 0.0755 Epoch: 2 Global Step: 43750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:40,501-Speed 9358.26 samples/sec Loss 8.8011 LearningRate 0.0755 Epoch: 2 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:41,611-Speed 9228.93 samples/sec Loss 8.7557 LearningRate 0.0755 Epoch: 2 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:42,682-Speed 9568.12 samples/sec Loss 8.8144 LearningRate 0.0755 Epoch: 2 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:43,714-Speed 9924.99 samples/sec Loss 8.8859 LearningRate 0.0755 Epoch: 2 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:44,777-Speed 9638.52 samples/sec Loss 8.8308 LearningRate 0.0755 Epoch: 2 Global Step: 43800 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:45,838-Speed 9662.31 samples/sec Loss 8.9899 LearningRate 0.0755 Epoch: 2 Global Step: 43810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:46,897-Speed 9679.74 samples/sec Loss 8.7963 LearningRate 0.0755 Epoch: 2 Global Step: 43820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:47,950-Speed 9727.61 samples/sec Loss 8.8820 LearningRate 0.0755 Epoch: 2 Global Step: 43830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:49,025-Speed 9530.61 samples/sec Loss 8.7989 LearningRate 0.0755 Epoch: 2 Global Step: 43840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:50,097-Speed 9559.55 samples/sec Loss 8.7891 LearningRate 0.0755 Epoch: 2 Global Step: 43850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:51,206-Speed 9246.44 samples/sec Loss 8.8128 LearningRate 0.0754 Epoch: 2 Global Step: 43860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:17:52,280-Speed 9537.64 samples/sec Loss 8.7883 LearningRate 0.0754 Epoch: 2 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:53,308-Speed 9967.36 samples/sec Loss 8.7888 LearningRate 0.0754 Epoch: 2 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:54,417-Speed 9239.69 samples/sec Loss 8.9476 LearningRate 0.0754 Epoch: 2 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:55,453-Speed 9881.74 samples/sec Loss 8.8484 LearningRate 0.0754 Epoch: 2 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:56,507-Speed 9727.64 samples/sec Loss 8.7750 LearningRate 0.0754 Epoch: 2 Global Step: 43910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:57,540-Speed 9913.31 samples/sec Loss 8.8479 LearningRate 0.0754 Epoch: 2 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 13:17:58,569-Speed 9961.00 samples/sec Loss 8.7917 LearningRate 0.0754 Epoch: 2 Global Step: 43930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:17:59,621-Speed 9735.38 samples/sec Loss 8.8305 LearningRate 0.0754 Epoch: 2 Global Step: 43940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:18:00,707-Speed 9437.25 samples/sec Loss 8.7624 LearningRate 0.0754 Epoch: 2 Global Step: 43950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:18:01,787-Speed 9489.31 samples/sec Loss 8.7394 LearningRate 0.0754 Epoch: 2 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:18:02,874-Speed 9423.26 samples/sec Loss 8.8629 LearningRate 0.0754 Epoch: 2 Global Step: 43970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:18:03,962-Speed 9423.30 samples/sec Loss 8.7772 LearningRate 0.0754 Epoch: 2 Global Step: 43980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:18:05,090-Speed 9077.65 samples/sec Loss 8.7716 LearningRate 0.0754 Epoch: 2 Global Step: 43990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:18:06,170-Speed 9492.68 samples/sec Loss 8.6962 LearningRate 0.0754 Epoch: 2 Global Step: 44000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:18:27,953-[lfw][44000]XNorm: 13.420222 Training: 2022-04-11 13:18:27,953-[lfw][44000]Accuracy-Flip: 0.99467+-0.00233 Training: 2022-04-11 13:18:27,954-[lfw][44000]Accuracy-Highest: 0.99533 Training: 2022-04-11 13:18:53,142-[cfp_fp][44000]XNorm: 11.165973 Training: 2022-04-11 13:18:53,143-[cfp_fp][44000]Accuracy-Flip: 0.94700+-0.00946 Training: 2022-04-11 13:18:53,144-[cfp_fp][44000]Accuracy-Highest: 0.94700 Training: 2022-04-11 13:19:14,870-[agedb_30][44000]XNorm: 12.899060 Training: 2022-04-11 13:19:14,871-[agedb_30][44000]Accuracy-Flip: 0.95117+-0.01057 Training: 2022-04-11 13:19:14,872-[agedb_30][44000]Accuracy-Highest: 0.95333 Training: 2022-04-11 13:19:15,919-Speed 
146.81 samples/sec Loss 8.9209 LearningRate 0.0754 Epoch: 2 Global Step: 44010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:16,970-Speed 9749.72 samples/sec Loss 8.7271 LearningRate 0.0754 Epoch: 2 Global Step: 44020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:18,090-Speed 9146.68 samples/sec Loss 8.8448 LearningRate 0.0754 Epoch: 2 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:19,187-Speed 9334.14 samples/sec Loss 8.8155 LearningRate 0.0754 Epoch: 2 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:20,259-Speed 9555.58 samples/sec Loss 8.7834 LearningRate 0.0753 Epoch: 2 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:21,342-Speed 9460.41 samples/sec Loss 8.7854 LearningRate 0.0753 Epoch: 2 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:22,417-Speed 9531.08 samples/sec Loss 8.8808 LearningRate 0.0753 Epoch: 2 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:23,497-Speed 9488.12 samples/sec Loss 8.8473 LearningRate 0.0753 Epoch: 2 Global Step: 44080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:24,605-Speed 9251.73 samples/sec Loss 8.7933 LearningRate 0.0753 Epoch: 2 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:25,636-Speed 9935.47 samples/sec Loss 8.8326 LearningRate 0.0753 Epoch: 2 Global Step: 44100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:26,701-Speed 9621.19 samples/sec Loss 8.9323 LearningRate 0.0753 Epoch: 2 Global Step: 44110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:27,751-Speed 9756.41 samples/sec Loss 8.7485 LearningRate 0.0753 Epoch: 2 Global Step: 44120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:28,836-Speed 9448.75 samples/sec Loss 8.8605 LearningRate 
0.0753 Epoch: 2 Global Step: 44130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:29,878-Speed 9833.47 samples/sec Loss 8.9039 LearningRate 0.0753 Epoch: 2 Global Step: 44140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:30,941-Speed 9636.88 samples/sec Loss 8.7672 LearningRate 0.0753 Epoch: 2 Global Step: 44150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:32,045-Speed 9277.82 samples/sec Loss 8.7393 LearningRate 0.0753 Epoch: 2 Global Step: 44160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:33,115-Speed 9578.76 samples/sec Loss 8.8539 LearningRate 0.0753 Epoch: 2 Global Step: 44170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:34,183-Speed 9597.73 samples/sec Loss 8.7441 LearningRate 0.0753 Epoch: 2 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:35,258-Speed 9532.01 samples/sec Loss 8.7095 LearningRate 0.0753 Epoch: 2 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:36,309-Speed 9745.00 samples/sec Loss 8.7469 LearningRate 0.0753 Epoch: 2 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:37,339-Speed 9948.98 samples/sec Loss 8.7673 LearningRate 0.0753 Epoch: 2 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:38,404-Speed 9619.98 samples/sec Loss 8.8417 LearningRate 0.0753 Epoch: 2 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:39,468-Speed 9629.76 samples/sec Loss 8.7360 LearningRate 0.0753 Epoch: 2 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:40,530-Speed 9647.05 samples/sec Loss 8.8016 LearningRate 0.0753 Epoch: 2 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:41,563-Speed 9923.33 samples/sec Loss 8.7889 LearningRate 0.0752 Epoch: 2 Global Step: 44250 Fp16 
Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:42,605-Speed 9836.66 samples/sec Loss 8.8353 LearningRate 0.0752 Epoch: 2 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:43,668-Speed 9639.52 samples/sec Loss 8.9564 LearningRate 0.0752 Epoch: 2 Global Step: 44270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:44,732-Speed 9626.68 samples/sec Loss 8.7940 LearningRate 0.0752 Epoch: 2 Global Step: 44280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:45,784-Speed 9741.16 samples/sec Loss 8.9046 LearningRate 0.0752 Epoch: 2 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:46,884-Speed 9312.18 samples/sec Loss 8.8050 LearningRate 0.0752 Epoch: 2 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:48,000-Speed 9182.60 samples/sec Loss 8.7870 LearningRate 0.0752 Epoch: 2 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:49,040-Speed 9856.04 samples/sec Loss 8.9060 LearningRate 0.0752 Epoch: 2 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:50,136-Speed 9348.43 samples/sec Loss 8.9373 LearningRate 0.0752 Epoch: 2 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:51,214-Speed 9504.01 samples/sec Loss 8.8446 LearningRate 0.0752 Epoch: 2 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:52,316-Speed 9298.03 samples/sec Loss 8.7975 LearningRate 0.0752 Epoch: 2 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:53,376-Speed 9661.81 samples/sec Loss 8.8676 LearningRate 0.0752 Epoch: 2 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:54,453-Speed 9517.47 samples/sec Loss 8.8233 LearningRate 0.0752 Epoch: 2 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 13:19:55,592-Speed 8994.95 samples/sec Loss 8.8220 LearningRate 0.0752 Epoch: 2 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:56,673-Speed 9476.07 samples/sec Loss 8.7806 LearningRate 0.0752 Epoch: 2 Global Step: 44390 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:19:57,737-Speed 9639.75 samples/sec Loss 8.8816 LearningRate 0.0752 Epoch: 2 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:58,848-Speed 9219.63 samples/sec Loss 8.9045 LearningRate 0.0752 Epoch: 2 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:19:59,952-Speed 9282.08 samples/sec Loss 8.8531 LearningRate 0.0752 Epoch: 2 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:01,012-Speed 9667.49 samples/sec Loss 8.9604 LearningRate 0.0752 Epoch: 2 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:02,050-Speed 9865.46 samples/sec Loss 8.7230 LearningRate 0.0751 Epoch: 2 Global Step: 44440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:03,134-Speed 9453.44 samples/sec Loss 8.8121 LearningRate 0.0751 Epoch: 2 Global Step: 44450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:04,202-Speed 9595.80 samples/sec Loss 8.8299 LearningRate 0.0751 Epoch: 2 Global Step: 44460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:05,266-Speed 9631.67 samples/sec Loss 8.7917 LearningRate 0.0751 Epoch: 2 Global Step: 44470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:06,380-Speed 9196.98 samples/sec Loss 8.7676 LearningRate 0.0751 Epoch: 2 Global Step: 44480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:07,423-Speed 9822.86 samples/sec Loss 8.6792 LearningRate 0.0751 Epoch: 2 Global Step: 44490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:08,506-Speed 
9459.64 samples/sec Loss 8.7228 LearningRate 0.0751 Epoch: 2 Global Step: 44500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:09,608-Speed 9297.47 samples/sec Loss 8.8836 LearningRate 0.0751 Epoch: 2 Global Step: 44510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:10,696-Speed 9417.89 samples/sec Loss 8.7486 LearningRate 0.0751 Epoch: 2 Global Step: 44520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:11,738-Speed 9840.03 samples/sec Loss 8.8404 LearningRate 0.0751 Epoch: 2 Global Step: 44530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:12,827-Speed 9403.22 samples/sec Loss 8.7630 LearningRate 0.0751 Epoch: 2 Global Step: 44540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:13,922-Speed 9358.90 samples/sec Loss 8.6991 LearningRate 0.0751 Epoch: 2 Global Step: 44550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:15,000-Speed 9505.06 samples/sec Loss 8.8255 LearningRate 0.0751 Epoch: 2 Global Step: 44560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:16,078-Speed 9502.04 samples/sec Loss 8.6120 LearningRate 0.0751 Epoch: 2 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:17,180-Speed 9305.15 samples/sec Loss 8.7503 LearningRate 0.0751 Epoch: 2 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:18,232-Speed 9744.20 samples/sec Loss 8.8107 LearningRate 0.0751 Epoch: 2 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:19,299-Speed 9603.07 samples/sec Loss 8.8865 LearningRate 0.0751 Epoch: 2 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:20,380-Speed 9475.96 samples/sec Loss 8.7177 LearningRate 0.0751 Epoch: 2 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:21,438-Speed 9685.13 samples/sec Loss 8.7901 
LearningRate 0.0751 Epoch: 2 Global Step: 44620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:22,471-Speed 9917.24 samples/sec Loss 8.8259 LearningRate 0.0750 Epoch: 2 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:23,565-Speed 9368.19 samples/sec Loss 8.8098 LearningRate 0.0750 Epoch: 2 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:24,665-Speed 9313.90 samples/sec Loss 8.7540 LearningRate 0.0750 Epoch: 2 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:25,731-Speed 9613.74 samples/sec Loss 8.8423 LearningRate 0.0750 Epoch: 2 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:26,786-Speed 9712.62 samples/sec Loss 8.7207 LearningRate 0.0750 Epoch: 2 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:27,867-Speed 9472.33 samples/sec Loss 8.7951 LearningRate 0.0750 Epoch: 2 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:28,988-Speed 9146.11 samples/sec Loss 8.9003 LearningRate 0.0750 Epoch: 2 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:30,045-Speed 9685.44 samples/sec Loss 8.8927 LearningRate 0.0750 Epoch: 2 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:31,137-Speed 9386.37 samples/sec Loss 8.8176 LearningRate 0.0750 Epoch: 2 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:32,246-Speed 9234.17 samples/sec Loss 8.8114 LearningRate 0.0750 Epoch: 2 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:33,354-Speed 9248.09 samples/sec Loss 8.7559 LearningRate 0.0750 Epoch: 2 Global Step: 44730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:34,458-Speed 9287.45 samples/sec Loss 8.7448 LearningRate 0.0750 Epoch: 2 Global Step: 
44740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:35,559-Speed 9305.89 samples/sec Loss 8.7844 LearningRate 0.0750 Epoch: 2 Global Step: 44750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:36,642-Speed 9464.69 samples/sec Loss 8.6456 LearningRate 0.0750 Epoch: 2 Global Step: 44760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:37,721-Speed 9493.23 samples/sec Loss 8.7767 LearningRate 0.0750 Epoch: 2 Global Step: 44770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:38,762-Speed 9842.32 samples/sec Loss 8.7308 LearningRate 0.0750 Epoch: 2 Global Step: 44780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:39,785-Speed 10011.60 samples/sec Loss 8.8490 LearningRate 0.0750 Epoch: 2 Global Step: 44790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:40,847-Speed 9654.20 samples/sec Loss 8.6208 LearningRate 0.0750 Epoch: 2 Global Step: 44800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:41,949-Speed 9298.66 samples/sec Loss 8.7948 LearningRate 0.0750 Epoch: 2 Global Step: 44810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:43,023-Speed 9531.73 samples/sec Loss 8.8818 LearningRate 0.0749 Epoch: 2 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:44,133-Speed 9232.57 samples/sec Loss 8.9192 LearningRate 0.0749 Epoch: 2 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:45,223-Speed 9403.19 samples/sec Loss 8.7029 LearningRate 0.0749 Epoch: 2 Global Step: 44840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:46,295-Speed 9556.37 samples/sec Loss 8.8851 LearningRate 0.0749 Epoch: 2 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:47,362-Speed 9602.39 samples/sec Loss 8.7453 LearningRate 0.0749 Epoch: 2 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 13:20:48,477-Speed 9192.46 samples/sec Loss 8.9138 LearningRate 0.0749 Epoch: 2 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:20:49,536-Speed 9668.74 samples/sec Loss 8.7085 LearningRate 0.0749 Epoch: 2 Global Step: 44880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:20:50,610-Speed 9543.82 samples/sec Loss 8.7363 LearningRate 0.0749 Epoch: 2 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:51,657-Speed 9785.70 samples/sec Loss 8.7288 LearningRate 0.0749 Epoch: 2 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:52,717-Speed 9667.20 samples/sec Loss 8.6731 LearningRate 0.0749 Epoch: 2 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:53,755-Speed 9874.55 samples/sec Loss 8.6950 LearningRate 0.0749 Epoch: 2 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:54,842-Speed 9425.82 samples/sec Loss 8.8036 LearningRate 0.0749 Epoch: 2 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:55,916-Speed 9538.35 samples/sec Loss 8.8400 LearningRate 0.0749 Epoch: 2 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:56,995-Speed 9496.32 samples/sec Loss 8.8288 LearningRate 0.0749 Epoch: 2 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:58,067-Speed 9561.72 samples/sec Loss 8.7994 LearningRate 0.0749 Epoch: 2 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:20:59,166-Speed 9327.21 samples/sec Loss 8.8697 LearningRate 0.0749 Epoch: 2 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:21:00,217-Speed 9740.32 samples/sec Loss 8.9072 LearningRate 0.0749 Epoch: 2 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:21:01,250-Speed 9917.74 
samples/sec Loss 8.7040 LearningRate 0.0749 Epoch: 2 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:02,304-Speed 9724.69 samples/sec Loss 8.8022 LearningRate 0.0749 Epoch: 2 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:03,341-Speed 9878.92 samples/sec Loss 8.7983 LearningRate 0.0749 Epoch: 2 Global Step: 45010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:04,444-Speed 9295.58 samples/sec Loss 8.6992 LearningRate 0.0748 Epoch: 2 Global Step: 45020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:05,522-Speed 9503.89 samples/sec Loss 8.7396 LearningRate 0.0748 Epoch: 2 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:06,588-Speed 9610.05 samples/sec Loss 8.7864 LearningRate 0.0748 Epoch: 2 Global Step: 45040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:07,652-Speed 9623.22 samples/sec Loss 8.7108 LearningRate 0.0748 Epoch: 2 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:08,709-Speed 9699.51 samples/sec Loss 8.7316 LearningRate 0.0748 Epoch: 2 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:09,797-Speed 9414.00 samples/sec Loss 8.7570 LearningRate 0.0748 Epoch: 2 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:10,870-Speed 9553.86 samples/sec Loss 8.7483 LearningRate 0.0748 Epoch: 2 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:11,938-Speed 9592.30 samples/sec Loss 8.6949 LearningRate 0.0748 Epoch: 2 Global Step: 45090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:21:12,983-Speed 9809.54 samples/sec Loss 8.7839 LearningRate 0.0748 Epoch: 2 Global Step: 45100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:21:14,099-Speed 9179.26 samples/sec Loss 8.7972 LearningRate 0.0748 
Epoch: 2 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:15,193-Speed 9366.42 samples/sec Loss 8.6970 LearningRate 0.0748 Epoch: 2 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:16,243-Speed 9752.55 samples/sec Loss 8.7498 LearningRate 0.0748 Epoch: 2 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:17,320-Speed 9522.50 samples/sec Loss 8.6524 LearningRate 0.0748 Epoch: 2 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:18,406-Speed 9434.92 samples/sec Loss 8.7273 LearningRate 0.0748 Epoch: 2 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:19,488-Speed 9468.54 samples/sec Loss 8.6519 LearningRate 0.0748 Epoch: 2 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:20,575-Speed 9425.63 samples/sec Loss 8.8382 LearningRate 0.0748 Epoch: 2 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:21,635-Speed 9658.00 samples/sec Loss 8.8214 LearningRate 0.0748 Epoch: 2 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:22,724-Speed 9410.29 samples/sec Loss 8.8231 LearningRate 0.0748 Epoch: 2 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:23,769-Speed 9803.21 samples/sec Loss 8.8637 LearningRate 0.0748 Epoch: 2 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:24,847-Speed 9510.65 samples/sec Loss 8.8781 LearningRate 0.0747 Epoch: 2 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:25,895-Speed 9771.84 samples/sec Loss 8.8383 LearningRate 0.0747 Epoch: 2 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:27,006-Speed 9217.52 samples/sec Loss 8.6908 LearningRate 0.0747 Epoch: 2 Global Step: 45230 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:28,113-Speed 9260.03 samples/sec Loss 8.8400 LearningRate 0.0747 Epoch: 2 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:29,168-Speed 9720.72 samples/sec Loss 8.7119 LearningRate 0.0747 Epoch: 2 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:30,235-Speed 9598.49 samples/sec Loss 8.7656 LearningRate 0.0747 Epoch: 2 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:31,342-Speed 9262.26 samples/sec Loss 8.7184 LearningRate 0.0747 Epoch: 2 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:32,375-Speed 9911.25 samples/sec Loss 8.7845 LearningRate 0.0747 Epoch: 2 Global Step: 45280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:33,431-Speed 9706.28 samples/sec Loss 8.7855 LearningRate 0.0747 Epoch: 2 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:34,525-Speed 9368.83 samples/sec Loss 8.6352 LearningRate 0.0747 Epoch: 2 Global Step: 45300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:35,560-Speed 9901.41 samples/sec Loss 8.8178 LearningRate 0.0747 Epoch: 2 Global Step: 45310 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:21:36,620-Speed 9658.34 samples/sec Loss 8.6910 LearningRate 0.0747 Epoch: 2 Global Step: 45320 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:21:37,678-Speed 9690.68 samples/sec Loss 8.7396 LearningRate 0.0747 Epoch: 2 Global Step: 45330 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:21:38,739-Speed 9655.60 samples/sec Loss 8.6392 LearningRate 0.0747 Epoch: 2 Global Step: 45340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:21:39,798-Speed 9675.71 samples/sec Loss 8.7956 LearningRate 0.0747 Epoch: 2 Global Step: 45350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 
2022-04-11 13:21:40,871-Speed 9547.33 samples/sec Loss 8.7834 LearningRate 0.0747 Epoch: 2 Global Step: 45360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:21:41,973-Speed 9295.59 samples/sec Loss 8.8126 LearningRate 0.0747 Epoch: 2 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:43,016-Speed 9820.39 samples/sec Loss 8.7409 LearningRate 0.0747 Epoch: 2 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:44,069-Speed 9735.33 samples/sec Loss 8.8166 LearningRate 0.0747 Epoch: 2 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:45,120-Speed 9743.62 samples/sec Loss 8.7521 LearningRate 0.0746 Epoch: 2 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:46,161-Speed 9843.68 samples/sec Loss 8.8527 LearningRate 0.0746 Epoch: 2 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:47,183-Speed 10034.77 samples/sec Loss 8.6200 LearningRate 0.0746 Epoch: 2 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:48,219-Speed 9891.64 samples/sec Loss 8.7878 LearningRate 0.0746 Epoch: 2 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:49,279-Speed 9664.69 samples/sec Loss 8.7301 LearningRate 0.0746 Epoch: 2 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:50,381-Speed 9294.42 samples/sec Loss 8.7245 LearningRate 0.0746 Epoch: 2 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:51,484-Speed 9288.74 samples/sec Loss 8.8085 LearningRate 0.0746 Epoch: 2 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:52,549-Speed 9622.64 samples/sec Loss 8.7663 LearningRate 0.0746 Epoch: 2 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:53,624-Speed 9527.49 
samples/sec Loss 8.7546 LearningRate 0.0746 Epoch: 2 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:54,724-Speed 9317.98 samples/sec Loss 8.7000 LearningRate 0.0746 Epoch: 2 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:55,766-Speed 9835.97 samples/sec Loss 8.6527 LearningRate 0.0746 Epoch: 2 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:56,847-Speed 9473.42 samples/sec Loss 8.9544 LearningRate 0.0746 Epoch: 2 Global Step: 45510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:57,898-Speed 9749.95 samples/sec Loss 8.7188 LearningRate 0.0746 Epoch: 2 Global Step: 45520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:21:58,951-Speed 9730.14 samples/sec Loss 8.6906 LearningRate 0.0746 Epoch: 2 Global Step: 45530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:00,011-Speed 9670.28 samples/sec Loss 8.8114 LearningRate 0.0746 Epoch: 2 Global Step: 45540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:01,076-Speed 9623.36 samples/sec Loss 8.7845 LearningRate 0.0746 Epoch: 2 Global Step: 45550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:02,145-Speed 9582.74 samples/sec Loss 8.8469 LearningRate 0.0746 Epoch: 2 Global Step: 45560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:03,168-Speed 10009.02 samples/sec Loss 8.7908 LearningRate 0.0746 Epoch: 2 Global Step: 45570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:04,198-Speed 9951.26 samples/sec Loss 8.8896 LearningRate 0.0746 Epoch: 2 Global Step: 45580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:05,274-Speed 9523.44 samples/sec Loss 8.7742 LearningRate 0.0746 Epoch: 2 Global Step: 45590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:06,330-Speed 9707.82 samples/sec Loss 8.6579 LearningRate 
0.0745 Epoch: 2 Global Step: 45600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:07,364-Speed 9910.47 samples/sec Loss 8.6794 LearningRate 0.0745 Epoch: 2 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:08,483-Speed 9158.95 samples/sec Loss 8.7770 LearningRate 0.0745 Epoch: 2 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:09,548-Speed 9611.87 samples/sec Loss 8.7659 LearningRate 0.0745 Epoch: 2 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:10,622-Speed 9545.25 samples/sec Loss 8.7466 LearningRate 0.0745 Epoch: 2 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:11,693-Speed 9566.86 samples/sec Loss 8.6494 LearningRate 0.0745 Epoch: 2 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:12,741-Speed 9779.72 samples/sec Loss 8.7305 LearningRate 0.0745 Epoch: 2 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:13,773-Speed 9923.39 samples/sec Loss 8.7280 LearningRate 0.0745 Epoch: 2 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:14,839-Speed 9610.32 samples/sec Loss 8.6897 LearningRate 0.0745 Epoch: 2 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:15,885-Speed 9797.15 samples/sec Loss 8.8029 LearningRate 0.0745 Epoch: 2 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:22:16,925-Speed 9854.83 samples/sec Loss 8.8452 LearningRate 0.0745 Epoch: 2 Global Step: 45700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:17,986-Speed 9654.66 samples/sec Loss 8.6930 LearningRate 0.0745 Epoch: 2 Global Step: 45710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:19,013-Speed 9980.32 samples/sec Loss 8.5679 LearningRate 0.0745 Epoch: 2 Global Step: 45720 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:20,048-Speed 9903.65 samples/sec Loss 8.5868 LearningRate 0.0745 Epoch: 2 Global Step: 45730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:21,081-Speed 9917.35 samples/sec Loss 8.8030 LearningRate 0.0745 Epoch: 2 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:22,161-Speed 9488.83 samples/sec Loss 8.7530 LearningRate 0.0745 Epoch: 2 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:23,183-Speed 10017.97 samples/sec Loss 8.7187 LearningRate 0.0745 Epoch: 2 Global Step: 45760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:24,238-Speed 9711.65 samples/sec Loss 8.7876 LearningRate 0.0745 Epoch: 2 Global Step: 45770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:22:25,297-Speed 9678.92 samples/sec Loss 8.7119 LearningRate 0.0745 Epoch: 2 Global Step: 45780 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:22:26,337-Speed 9852.22 samples/sec Loss 8.6477 LearningRate 0.0744 Epoch: 2 Global Step: 45790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:22:27,406-Speed 9584.38 samples/sec Loss 8.6746 LearningRate 0.0744 Epoch: 2 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:28,475-Speed 9587.70 samples/sec Loss 8.7528 LearningRate 0.0744 Epoch: 2 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:29,507-Speed 9930.66 samples/sec Loss 8.7212 LearningRate 0.0744 Epoch: 2 Global Step: 45820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:30,569-Speed 9648.73 samples/sec Loss 8.7610 LearningRate 0.0744 Epoch: 2 Global Step: 45830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:31,642-Speed 9552.89 samples/sec Loss 8.7244 LearningRate 0.0744 Epoch: 2 Global Step: 45840 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 13:22:32,717-Speed 9525.60 samples/sec Loss 8.7731 LearningRate 0.0744 Epoch: 2 Global Step: 45850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:33,798-Speed 9479.02 samples/sec Loss 8.6557 LearningRate 0.0744 Epoch: 2 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:34,879-Speed 9486.23 samples/sec Loss 8.7029 LearningRate 0.0744 Epoch: 2 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:35,948-Speed 9580.87 samples/sec Loss 8.7144 LearningRate 0.0744 Epoch: 2 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:37,020-Speed 9560.51 samples/sec Loss 8.7679 LearningRate 0.0744 Epoch: 2 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:38,113-Speed 9374.64 samples/sec Loss 8.7744 LearningRate 0.0744 Epoch: 2 Global Step: 45900 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:22:39,171-Speed 9681.67 samples/sec Loss 8.7900 LearningRate 0.0744 Epoch: 2 Global Step: 45910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:22:40,197-Speed 9993.52 samples/sec Loss 8.8409 LearningRate 0.0744 Epoch: 2 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:41,261-Speed 9624.96 samples/sec Loss 8.6911 LearningRate 0.0744 Epoch: 2 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:42,303-Speed 9832.19 samples/sec Loss 8.8016 LearningRate 0.0744 Epoch: 2 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:43,409-Speed 9260.98 samples/sec Loss 8.8750 LearningRate 0.0744 Epoch: 2 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:44,478-Speed 9591.73 samples/sec Loss 8.7124 LearningRate 0.0744 Epoch: 2 Global Step: 45960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:22:45,526-Speed 
9774.32 samples/sec Loss 8.7002 LearningRate 0.0744 Epoch: 2 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 13:22:46,599-Speed 9549.02 samples/sec Loss 8.6904 LearningRate 0.0743 Epoch: 2 Global Step: 45980 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 13:22:47,649-Speed 9768.94 samples/sec Loss 8.8401 LearningRate 0.0743 Epoch: 2 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 13:22:48,715-Speed 9607.99 samples/sec Loss 8.8528 LearningRate 0.0743 Epoch: 2 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 13:23:10,955-[lfw][46000]XNorm: 13.305444
Training: 2022-04-11 13:23:10,956-[lfw][46000]Accuracy-Flip: 0.99583+-0.00171
Training: 2022-04-11 13:23:10,956-[lfw][46000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:23:36,540-[cfp_fp][46000]XNorm: 11.134653
Training: 2022-04-11 13:23:36,541-[cfp_fp][46000]Accuracy-Flip: 0.94143+-0.01166
Training: 2022-04-11 13:23:36,541-[cfp_fp][46000]Accuracy-Highest: 0.94700
Training: 2022-04-11 13:23:58,484-[agedb_30][46000]XNorm: 12.852607
Training: 2022-04-11 13:23:58,485-[agedb_30][46000]Accuracy-Flip: 0.95483+-0.01104
Training: 2022-04-11 13:23:58,485-[agedb_30][46000]Accuracy-Highest: 0.95483
Training: 2022-04-11 13:23:59,581-Speed 144.50 samples/sec Loss 8.6357 LearningRate 0.0743 Epoch: 2 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 13:24:00,631-Speed 9752.74 samples/sec Loss 8.8439 LearningRate 0.0743 Epoch: 2 Global Step: 46020 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 13:24:01,716-Speed 9443.63 samples/sec Loss 8.6035 LearningRate 0.0743 Epoch: 2 Global Step: 46030 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 13:24:02,762-Speed 9799.53 samples/sec Loss 8.6714 LearningRate 0.0743 Epoch: 2 Global Step: 46040 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 13:24:03,794-Speed 9928.76 samples/sec Loss 8.7012
LearningRate 0.0743 Epoch: 2 Global Step: 46050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:24:04,828-Speed 9910.06 samples/sec Loss 8.7175 LearningRate 0.0743 Epoch: 2 Global Step: 46060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:24:05,910-Speed 9471.92 samples/sec Loss 8.7578 LearningRate 0.0743 Epoch: 2 Global Step: 46070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:24:06,979-Speed 9581.86 samples/sec Loss 8.6313 LearningRate 0.0743 Epoch: 2 Global Step: 46080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:24:08,029-Speed 9754.47 samples/sec Loss 8.8264 LearningRate 0.0743 Epoch: 2 Global Step: 46090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 13:24:09,089-Speed 9671.34 samples/sec Loss 8.5590 LearningRate 0.0743 Epoch: 2 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:10,143-Speed 9716.89 samples/sec Loss 8.7519 LearningRate 0.0743 Epoch: 2 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:11,206-Speed 9638.85 samples/sec Loss 8.7330 LearningRate 0.0743 Epoch: 2 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:12,317-Speed 9222.97 samples/sec Loss 8.6139 LearningRate 0.0743 Epoch: 2 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:13,437-Speed 9144.43 samples/sec Loss 8.7355 LearningRate 0.0743 Epoch: 2 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:14,530-Speed 9378.56 samples/sec Loss 8.6501 LearningRate 0.0743 Epoch: 2 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:15,618-Speed 9414.86 samples/sec Loss 8.6987 LearningRate 0.0743 Epoch: 2 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:16,681-Speed 9645.79 samples/sec Loss 8.7234 LearningRate 0.0743 Epoch: 2 Global Step: 46170 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:17,752-Speed 9560.77 samples/sec Loss 8.7590 LearningRate 0.0742 Epoch: 2 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:18,846-Speed 9368.95 samples/sec Loss 8.7466 LearningRate 0.0742 Epoch: 2 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:19,929-Speed 9463.10 samples/sec Loss 8.7795 LearningRate 0.0742 Epoch: 2 Global Step: 46200 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:21,017-Speed 9413.89 samples/sec Loss 8.7236 LearningRate 0.0742 Epoch: 2 Global Step: 46210 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:22,083-Speed 9613.66 samples/sec Loss 8.6940 LearningRate 0.0742 Epoch: 2 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:23,185-Speed 9294.40 samples/sec Loss 8.7953 LearningRate 0.0742 Epoch: 2 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:24,256-Speed 9566.64 samples/sec Loss 8.7034 LearningRate 0.0742 Epoch: 2 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:25,300-Speed 9816.89 samples/sec Loss 8.7546 LearningRate 0.0742 Epoch: 2 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:26,411-Speed 9221.02 samples/sec Loss 8.5990 LearningRate 0.0742 Epoch: 2 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:27,523-Speed 9216.60 samples/sec Loss 8.7254 LearningRate 0.0742 Epoch: 2 Global Step: 46270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:28,582-Speed 9673.28 samples/sec Loss 8.8094 LearningRate 0.0742 Epoch: 2 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:29,626-Speed 9811.85 samples/sec Loss 8.7973 LearningRate 0.0742 Epoch: 2 Global Step: 46290 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 13:24:30,686-Speed 9668.01 samples/sec Loss 8.7049 LearningRate 0.0742 Epoch: 2 Global Step: 46300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:31,745-Speed 9670.88 samples/sec Loss 8.7291 LearningRate 0.0742 Epoch: 2 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:32,831-Speed 9441.02 samples/sec Loss 8.6252 LearningRate 0.0742 Epoch: 2 Global Step: 46320 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:33,887-Speed 9702.69 samples/sec Loss 8.7548 LearningRate 0.0742 Epoch: 2 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:34,968-Speed 9473.48 samples/sec Loss 8.7271 LearningRate 0.0742 Epoch: 2 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:36,073-Speed 9275.18 samples/sec Loss 8.6294 LearningRate 0.0742 Epoch: 2 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:37,138-Speed 9624.50 samples/sec Loss 8.5904 LearningRate 0.0742 Epoch: 2 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:38,180-Speed 9838.04 samples/sec Loss 8.6019 LearningRate 0.0741 Epoch: 2 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:39,224-Speed 9811.36 samples/sec Loss 8.7150 LearningRate 0.0741 Epoch: 2 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:40,303-Speed 9499.45 samples/sec Loss 8.6764 LearningRate 0.0741 Epoch: 2 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:41,384-Speed 9479.11 samples/sec Loss 8.6426 LearningRate 0.0741 Epoch: 2 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:42,503-Speed 9157.98 samples/sec Loss 8.7718 LearningRate 0.0741 Epoch: 2 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:43,553-Speed 
9755.28 samples/sec Loss 8.7260 LearningRate 0.0741 Epoch: 2 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:44,628-Speed 9535.81 samples/sec Loss 8.5450 LearningRate 0.0741 Epoch: 2 Global Step: 46430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:45,689-Speed 9657.44 samples/sec Loss 8.7284 LearningRate 0.0741 Epoch: 2 Global Step: 46440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:46,784-Speed 9353.50 samples/sec Loss 8.7257 LearningRate 0.0741 Epoch: 2 Global Step: 46450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:47,879-Speed 9357.30 samples/sec Loss 8.5912 LearningRate 0.0741 Epoch: 2 Global Step: 46460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:48,971-Speed 9382.82 samples/sec Loss 8.7290 LearningRate 0.0741 Epoch: 2 Global Step: 46470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:24:50,048-Speed 9511.17 samples/sec Loss 8.7418 LearningRate 0.0741 Epoch: 2 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:51,108-Speed 9666.24 samples/sec Loss 8.6601 LearningRate 0.0741 Epoch: 2 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:52,161-Speed 9727.27 samples/sec Loss 8.6692 LearningRate 0.0741 Epoch: 2 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:53,203-Speed 9837.24 samples/sec Loss 8.7101 LearningRate 0.0741 Epoch: 2 Global Step: 46510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:54,282-Speed 9490.56 samples/sec Loss 8.6920 LearningRate 0.0741 Epoch: 2 Global Step: 46520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:55,353-Speed 9574.27 samples/sec Loss 8.7327 LearningRate 0.0741 Epoch: 2 Global Step: 46530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:56,455-Speed 9293.61 samples/sec Loss 8.7420 
LearningRate 0.0741 Epoch: 2 Global Step: 46540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:57,518-Speed 9642.60 samples/sec Loss 8.7686 LearningRate 0.0741 Epoch: 2 Global Step: 46550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:58,592-Speed 9540.11 samples/sec Loss 8.6922 LearningRate 0.0741 Epoch: 2 Global Step: 46560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:24:59,668-Speed 9524.79 samples/sec Loss 8.8519 LearningRate 0.0740 Epoch: 2 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:00,730-Speed 9645.62 samples/sec Loss 8.7020 LearningRate 0.0740 Epoch: 2 Global Step: 46580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:01,772-Speed 9837.04 samples/sec Loss 8.6507 LearningRate 0.0740 Epoch: 2 Global Step: 46590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:02,845-Speed 9541.90 samples/sec Loss 8.6257 LearningRate 0.0740 Epoch: 2 Global Step: 46600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:03,908-Speed 9644.12 samples/sec Loss 8.7092 LearningRate 0.0740 Epoch: 2 Global Step: 46610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:04,986-Speed 9504.53 samples/sec Loss 8.7492 LearningRate 0.0740 Epoch: 2 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:06,104-Speed 9161.54 samples/sec Loss 8.6625 LearningRate 0.0740 Epoch: 2 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:07,205-Speed 9309.36 samples/sec Loss 8.6244 LearningRate 0.0740 Epoch: 2 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:08,244-Speed 9867.03 samples/sec Loss 8.8757 LearningRate 0.0740 Epoch: 2 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:09,286-Speed 9831.06 samples/sec Loss 8.5867 LearningRate 0.0740 Epoch: 2 Global Step: 
46660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:10,340-Speed 9715.17 samples/sec Loss 8.6006 LearningRate 0.0740 Epoch: 2 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:11,421-Speed 9482.80 samples/sec Loss 8.7421 LearningRate 0.0740 Epoch: 2 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:12,464-Speed 9821.45 samples/sec Loss 8.7749 LearningRate 0.0740 Epoch: 2 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:13,559-Speed 9357.83 samples/sec Loss 8.7510 LearningRate 0.0740 Epoch: 2 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:14,645-Speed 9440.15 samples/sec Loss 8.7732 LearningRate 0.0740 Epoch: 2 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:15,706-Speed 9656.29 samples/sec Loss 8.7648 LearningRate 0.0740 Epoch: 2 Global Step: 46720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:16,769-Speed 9638.02 samples/sec Loss 8.5846 LearningRate 0.0740 Epoch: 2 Global Step: 46730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:17,900-Speed 9061.75 samples/sec Loss 8.6816 LearningRate 0.0740 Epoch: 2 Global Step: 46740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:18,995-Speed 9351.44 samples/sec Loss 8.6397 LearningRate 0.0740 Epoch: 2 Global Step: 46750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:20,090-Speed 9361.24 samples/sec Loss 8.7717 LearningRate 0.0739 Epoch: 2 Global Step: 46760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:21,175-Speed 9445.25 samples/sec Loss 8.6981 LearningRate 0.0739 Epoch: 2 Global Step: 46770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-04-11 13:25:22,243-Speed 9588.34 samples/sec Loss 8.7360 LearningRate 0.0739 Epoch: 2 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 13:25:23,330-Speed 9424.98 samples/sec Loss 8.7308 LearningRate 0.0739 Epoch: 2 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:24,378-Speed 9783.66 samples/sec Loss 8.7553 LearningRate 0.0739 Epoch: 2 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:25,447-Speed 9578.62 samples/sec Loss 8.6155 LearningRate 0.0739 Epoch: 2 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:26,472-Speed 9994.49 samples/sec Loss 8.7316 LearningRate 0.0739 Epoch: 2 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:27,515-Speed 9832.38 samples/sec Loss 8.7957 LearningRate 0.0739 Epoch: 2 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:28,581-Speed 9607.85 samples/sec Loss 8.6501 LearningRate 0.0739 Epoch: 2 Global Step: 46840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:29,676-Speed 9358.31 samples/sec Loss 8.7031 LearningRate 0.0739 Epoch: 2 Global Step: 46850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 13:25:30,763-Speed 9424.28 samples/sec Loss 8.5608 LearningRate 0.0739 Epoch: 2 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:31,866-Speed 9287.54 samples/sec Loss 8.6613 LearningRate 0.0739 Epoch: 2 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:32,958-Speed 9385.59 samples/sec Loss 8.6814 LearningRate 0.0739 Epoch: 2 Global Step: 46880 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:25:34,090-Speed 9045.96 samples/sec Loss 8.5878 LearningRate 0.0739 Epoch: 2 Global Step: 46890 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:25:35,174-Speed 9457.19 samples/sec Loss 8.7130 LearningRate 0.0739 Epoch: 2 Global Step: 46900 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 
13:25:36,301-Speed 9090.59 samples/sec Loss 8.7463 LearningRate 0.0739 Epoch: 2 Global Step: 46910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:25:37,393-Speed 9390.01 samples/sec Loss 8.6399 LearningRate 0.0739 Epoch: 2 Global Step: 46920 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:25:38,450-Speed 9701.96 samples/sec Loss 8.5601 LearningRate 0.0739 Epoch: 2 Global Step: 46930 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:25:39,515-Speed 9622.70 samples/sec Loss 8.6885 LearningRate 0.0739 Epoch: 2 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:40,593-Speed 9498.59 samples/sec Loss 8.6525 LearningRate 0.0738 Epoch: 2 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:41,652-Speed 9677.36 samples/sec Loss 8.6925 LearningRate 0.0738 Epoch: 2 Global Step: 46960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:42,729-Speed 9510.48 samples/sec Loss 8.7086 LearningRate 0.0738 Epoch: 2 Global Step: 46970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:43,833-Speed 9288.14 samples/sec Loss 8.6497 LearningRate 0.0738 Epoch: 2 Global Step: 46980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:44,912-Speed 9492.20 samples/sec Loss 8.6237 LearningRate 0.0738 Epoch: 2 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:45,968-Speed 9704.06 samples/sec Loss 8.6669 LearningRate 0.0738 Epoch: 2 Global Step: 47000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:47,042-Speed 9542.92 samples/sec Loss 8.7187 LearningRate 0.0738 Epoch: 2 Global Step: 47010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:48,112-Speed 9569.79 samples/sec Loss 8.8249 LearningRate 0.0738 Epoch: 2 Global Step: 47020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:49,160-Speed 9782.20 samples/sec Loss 
8.6180 LearningRate 0.0738 Epoch: 2 Global Step: 47030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:50,222-Speed 9643.24 samples/sec Loss 8.7391 LearningRate 0.0738 Epoch: 2 Global Step: 47040 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:25:51,310-Speed 9415.76 samples/sec Loss 8.6799 LearningRate 0.0738 Epoch: 2 Global Step: 47050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:52,409-Speed 9328.43 samples/sec Loss 8.5987 LearningRate 0.0738 Epoch: 2 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:53,540-Speed 9057.40 samples/sec Loss 8.6794 LearningRate 0.0738 Epoch: 2 Global Step: 47070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:54,628-Speed 9418.13 samples/sec Loss 8.6305 LearningRate 0.0738 Epoch: 2 Global Step: 47080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:55,727-Speed 9323.16 samples/sec Loss 8.7785 LearningRate 0.0738 Epoch: 2 Global Step: 47090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:56,804-Speed 9510.59 samples/sec Loss 8.6437 LearningRate 0.0738 Epoch: 2 Global Step: 47100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:57,898-Speed 9370.64 samples/sec Loss 8.7383 LearningRate 0.0738 Epoch: 2 Global Step: 47110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:25:58,959-Speed 9655.16 samples/sec Loss 8.6786 LearningRate 0.0738 Epoch: 2 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:00,045-Speed 9440.12 samples/sec Loss 8.5937 LearningRate 0.0738 Epoch: 2 Global Step: 47130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:01,136-Speed 9387.97 samples/sec Loss 8.6676 LearningRate 0.0738 Epoch: 2 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:02,259-Speed 9121.74 samples/sec Loss 8.6650 LearningRate 0.0737 Epoch: 2 Global 
Step: 47150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:03,325-Speed 9616.21 samples/sec Loss 8.6786 LearningRate 0.0737 Epoch: 2 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:04,384-Speed 9677.04 samples/sec Loss 8.6906 LearningRate 0.0737 Epoch: 2 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:05,459-Speed 9530.02 samples/sec Loss 8.5705 LearningRate 0.0737 Epoch: 2 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:06,546-Speed 9423.98 samples/sec Loss 8.7331 LearningRate 0.0737 Epoch: 2 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:07,623-Speed 9515.46 samples/sec Loss 8.5832 LearningRate 0.0737 Epoch: 2 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:08,701-Speed 9501.41 samples/sec Loss 8.7269 LearningRate 0.0737 Epoch: 2 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:09,855-Speed 8884.19 samples/sec Loss 8.8033 LearningRate 0.0737 Epoch: 2 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:10,924-Speed 9582.13 samples/sec Loss 8.7055 LearningRate 0.0737 Epoch: 2 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:12,008-Speed 9453.53 samples/sec Loss 8.8053 LearningRate 0.0737 Epoch: 2 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:13,098-Speed 9391.98 samples/sec Loss 8.7512 LearningRate 0.0737 Epoch: 2 Global Step: 47250 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:26:14,174-Speed 9524.71 samples/sec Loss 8.7874 LearningRate 0.0737 Epoch: 2 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:15,209-Speed 9903.86 samples/sec Loss 8.7318 LearningRate 0.0737 Epoch: 2 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 
11 hours Training: 2022-04-11 13:26:16,259-Speed 9764.14 samples/sec Loss 8.6651 LearningRate 0.0737 Epoch: 2 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:17,314-Speed 9710.37 samples/sec Loss 8.7815 LearningRate 0.0737 Epoch: 2 Global Step: 47290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:18,419-Speed 9266.33 samples/sec Loss 8.5583 LearningRate 0.0737 Epoch: 2 Global Step: 47300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:19,488-Speed 9585.23 samples/sec Loss 8.7416 LearningRate 0.0737 Epoch: 2 Global Step: 47310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:20,584-Speed 9348.49 samples/sec Loss 8.7142 LearningRate 0.0737 Epoch: 2 Global Step: 47320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:21,699-Speed 9189.62 samples/sec Loss 8.6408 LearningRate 0.0737 Epoch: 2 Global Step: 47330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:22,768-Speed 9585.13 samples/sec Loss 8.7605 LearningRate 0.0736 Epoch: 2 Global Step: 47340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:23,811-Speed 9823.79 samples/sec Loss 8.7864 LearningRate 0.0736 Epoch: 2 Global Step: 47350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:24,866-Speed 9710.90 samples/sec Loss 8.7204 LearningRate 0.0736 Epoch: 2 Global Step: 47360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:25,918-Speed 9741.58 samples/sec Loss 8.6229 LearningRate 0.0736 Epoch: 2 Global Step: 47370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:26,986-Speed 9591.81 samples/sec Loss 8.7811 LearningRate 0.0736 Epoch: 2 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:28,063-Speed 9513.97 samples/sec Loss 8.7027 LearningRate 0.0736 Epoch: 2 Global Step: 47390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:29,154-Speed 
9393.47 samples/sec Loss 8.7529 LearningRate 0.0736 Epoch: 2 Global Step: 47400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:30,274-Speed 9142.25 samples/sec Loss 8.6324 LearningRate 0.0736 Epoch: 2 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:31,367-Speed 9379.66 samples/sec Loss 8.6475 LearningRate 0.0736 Epoch: 2 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:32,433-Speed 9611.67 samples/sec Loss 8.6540 LearningRate 0.0736 Epoch: 2 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:33,511-Speed 9509.10 samples/sec Loss 8.5979 LearningRate 0.0736 Epoch: 2 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:34,551-Speed 9851.39 samples/sec Loss 8.7277 LearningRate 0.0736 Epoch: 2 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:35,616-Speed 9618.48 samples/sec Loss 8.7038 LearningRate 0.0736 Epoch: 2 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:36,683-Speed 9607.86 samples/sec Loss 8.7254 LearningRate 0.0736 Epoch: 2 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:37,741-Speed 9683.29 samples/sec Loss 8.6028 LearningRate 0.0736 Epoch: 2 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:38,801-Speed 9667.72 samples/sec Loss 8.7128 LearningRate 0.0736 Epoch: 2 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:39,912-Speed 9224.70 samples/sec Loss 8.5821 LearningRate 0.0736 Epoch: 2 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:26:40,992-Speed 9488.79 samples/sec Loss 8.7421 LearningRate 0.0736 Epoch: 2 Global Step: 47510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:42,113-Speed 9134.38 samples/sec Loss 8.6554 LearningRate 0.0736 
Epoch: 2 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:43,158-Speed 9805.41 samples/sec Loss 8.6531 LearningRate 0.0736 Epoch: 2 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:44,231-Speed 9549.21 samples/sec Loss 8.7014 LearningRate 0.0735 Epoch: 2 Global Step: 47540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:45,277-Speed 9800.70 samples/sec Loss 8.5862 LearningRate 0.0735 Epoch: 2 Global Step: 47550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:46,324-Speed 9790.43 samples/sec Loss 8.6744 LearningRate 0.0735 Epoch: 2 Global Step: 47560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:47,402-Speed 9501.12 samples/sec Loss 8.5998 LearningRate 0.0735 Epoch: 2 Global Step: 47570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:48,470-Speed 9589.39 samples/sec Loss 8.7331 LearningRate 0.0735 Epoch: 2 Global Step: 47580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:49,561-Speed 9398.34 samples/sec Loss 8.6499 LearningRate 0.0735 Epoch: 2 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:50,668-Speed 9254.46 samples/sec Loss 8.7910 LearningRate 0.0735 Epoch: 2 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:51,726-Speed 9687.07 samples/sec Loss 8.6900 LearningRate 0.0735 Epoch: 2 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:52,829-Speed 9294.91 samples/sec Loss 8.6720 LearningRate 0.0735 Epoch: 2 Global Step: 47620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:53,907-Speed 9501.32 samples/sec Loss 8.5804 LearningRate 0.0735 Epoch: 2 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:54,954-Speed 9788.95 samples/sec Loss 8.7009 LearningRate 0.0735 Epoch: 2 Global Step: 47640 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:56,004-Speed 9749.33 samples/sec Loss 8.8158 LearningRate 0.0735 Epoch: 2 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:57,096-Speed 9386.12 samples/sec Loss 8.6576 LearningRate 0.0735 Epoch: 2 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:58,152-Speed 9708.38 samples/sec Loss 8.5466 LearningRate 0.0735 Epoch: 2 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:26:59,232-Speed 9485.18 samples/sec Loss 8.6743 LearningRate 0.0735 Epoch: 2 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:00,294-Speed 9642.45 samples/sec Loss 8.6471 LearningRate 0.0735 Epoch: 2 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:01,374-Speed 9486.08 samples/sec Loss 8.6964 LearningRate 0.0735 Epoch: 2 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:02,507-Speed 9044.67 samples/sec Loss 8.6115 LearningRate 0.0735 Epoch: 2 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:03,588-Speed 9482.81 samples/sec Loss 8.6674 LearningRate 0.0735 Epoch: 2 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:04,632-Speed 9810.06 samples/sec Loss 8.7321 LearningRate 0.0734 Epoch: 2 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:05,714-Speed 9465.71 samples/sec Loss 8.6510 LearningRate 0.0734 Epoch: 2 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:06,803-Speed 9409.63 samples/sec Loss 8.6085 LearningRate 0.0734 Epoch: 2 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:07,904-Speed 9305.61 samples/sec Loss 8.7034 LearningRate 0.0734 Epoch: 2 Global Step: 47760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-04-11 13:27:08,996-Speed 9394.68 samples/sec Loss 8.7727 LearningRate 0.0734 Epoch: 2 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:10,065-Speed 9587.92 samples/sec Loss 8.8193 LearningRate 0.0734 Epoch: 2 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:11,180-Speed 9190.68 samples/sec Loss 8.6929 LearningRate 0.0734 Epoch: 2 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:12,288-Speed 9245.27 samples/sec Loss 8.8012 LearningRate 0.0734 Epoch: 2 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:13,373-Speed 9448.74 samples/sec Loss 8.6952 LearningRate 0.0734 Epoch: 2 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:14,446-Speed 9541.46 samples/sec Loss 8.6531 LearningRate 0.0734 Epoch: 2 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:15,542-Speed 9357.78 samples/sec Loss 8.6714 LearningRate 0.0734 Epoch: 2 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:16,564-Speed 10023.56 samples/sec Loss 8.7735 LearningRate 0.0734 Epoch: 2 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:17,664-Speed 9310.53 samples/sec Loss 8.7448 LearningRate 0.0734 Epoch: 2 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:18,747-Speed 9460.39 samples/sec Loss 8.6661 LearningRate 0.0734 Epoch: 2 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:19,825-Speed 9508.27 samples/sec Loss 8.6612 LearningRate 0.0734 Epoch: 2 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:20,908-Speed 9460.08 samples/sec Loss 8.8066 LearningRate 0.0734 Epoch: 2 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:21,958-Speed 9759.22 samples/sec 
Loss 8.5782 LearningRate 0.0734 Epoch: 2 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:23,074-Speed 9178.10 samples/sec Loss 8.6388 LearningRate 0.0734 Epoch: 2 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:27:24,139-Speed 9625.44 samples/sec Loss 8.6433 LearningRate 0.0734 Epoch: 2 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:25,235-Speed 9345.23 samples/sec Loss 8.6256 LearningRate 0.0734 Epoch: 2 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:26,305-Speed 9577.82 samples/sec Loss 8.5434 LearningRate 0.0733 Epoch: 2 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:27,409-Speed 9279.94 samples/sec Loss 8.7116 LearningRate 0.0733 Epoch: 2 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:28,489-Speed 9493.21 samples/sec Loss 8.7112 LearningRate 0.0733 Epoch: 2 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:29,552-Speed 9645.44 samples/sec Loss 8.5132 LearningRate 0.0733 Epoch: 2 Global Step: 47960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:30,644-Speed 9382.21 samples/sec Loss 8.6563 LearningRate 0.0733 Epoch: 2 Global Step: 47970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:31,794-Speed 8909.86 samples/sec Loss 8.5490 LearningRate 0.0733 Epoch: 2 Global Step: 47980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:32,884-Speed 9395.87 samples/sec Loss 8.6384 LearningRate 0.0733 Epoch: 2 Global Step: 47990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:33,940-Speed 9703.08 samples/sec Loss 8.5644 LearningRate 0.0733 Epoch: 2 Global Step: 48000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:27:55,715-[lfw][48000]XNorm: 12.743046 Training: 2022-04-11 
13:27:55,715-[lfw][48000]Accuracy-Flip: 0.99583+-0.00291 Training: 2022-04-11 13:27:55,716-[lfw][48000]Accuracy-Highest: 0.99583 Training: 2022-04-11 13:28:20,929-[cfp_fp][48000]XNorm: 10.735440 Training: 2022-04-11 13:28:20,930-[cfp_fp][48000]Accuracy-Flip: 0.94686+-0.01185 Training: 2022-04-11 13:28:20,930-[cfp_fp][48000]Accuracy-Highest: 0.94700 Training: 2022-04-11 13:28:42,638-[agedb_30][48000]XNorm: 12.322298 Training: 2022-04-11 13:28:42,638-[agedb_30][48000]Accuracy-Flip: 0.95433+-0.00810 Training: 2022-04-11 13:28:42,639-[agedb_30][48000]Accuracy-Highest: 0.95483 Training: 2022-04-11 13:28:43,712-Speed 146.77 samples/sec Loss 8.5289 LearningRate 0.0733 Epoch: 2 Global Step: 48010 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:28:44,781-Speed 9585.18 samples/sec Loss 8.6732 LearningRate 0.0733 Epoch: 2 Global Step: 48020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:28:45,849-Speed 9592.96 samples/sec Loss 8.5902 LearningRate 0.0733 Epoch: 2 Global Step: 48030 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:28:46,891-Speed 9829.54 samples/sec Loss 8.6468 LearningRate 0.0733 Epoch: 2 Global Step: 48040 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:28:47,950-Speed 9680.07 samples/sec Loss 8.6227 LearningRate 0.0733 Epoch: 2 Global Step: 48050 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:28:49,060-Speed 9225.03 samples/sec Loss 8.6764 LearningRate 0.0733 Epoch: 2 Global Step: 48060 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:28:50,121-Speed 9662.51 samples/sec Loss 8.6311 LearningRate 0.0733 Epoch: 2 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:51,223-Speed 9300.14 samples/sec Loss 8.6428 LearningRate 0.0733 Epoch: 2 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:52,289-Speed 9611.40 samples/sec Loss 8.5692 LearningRate 0.0733 Epoch: 2 Global Step: 
48090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:53,338-Speed 9768.16 samples/sec Loss 8.6217 LearningRate 0.0733 Epoch: 2 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:54,406-Speed 9587.74 samples/sec Loss 8.5579 LearningRate 0.0733 Epoch: 2 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:55,469-Speed 9645.63 samples/sec Loss 8.6980 LearningRate 0.0732 Epoch: 2 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:56,542-Speed 9546.59 samples/sec Loss 8.6658 LearningRate 0.0732 Epoch: 2 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:57,561-Speed 10050.75 samples/sec Loss 8.5673 LearningRate 0.0732 Epoch: 2 Global Step: 48140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:58,631-Speed 9573.38 samples/sec Loss 8.6536 LearningRate 0.0732 Epoch: 2 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:28:59,697-Speed 9615.14 samples/sec Loss 8.7337 LearningRate 0.0732 Epoch: 2 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:00,803-Speed 9260.95 samples/sec Loss 8.7263 LearningRate 0.0732 Epoch: 2 Global Step: 48170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:01,876-Speed 9549.57 samples/sec Loss 8.5941 LearningRate 0.0732 Epoch: 2 Global Step: 48180 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:02,965-Speed 9416.97 samples/sec Loss 8.6580 LearningRate 0.0732 Epoch: 2 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:04,055-Speed 9396.71 samples/sec Loss 8.6893 LearningRate 0.0732 Epoch: 2 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:05,170-Speed 9189.62 samples/sec Loss 8.5824 LearningRate 0.0732 Epoch: 2 Global Step: 48210 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-11 13:29:06,215-Speed 9806.91 samples/sec Loss 8.6169 LearningRate 0.0732 Epoch: 2 Global Step: 48220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:07,300-Speed 9443.92 samples/sec Loss 8.6369 LearningRate 0.0732 Epoch: 2 Global Step: 48230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:08,366-Speed 9608.58 samples/sec Loss 8.6595 LearningRate 0.0732 Epoch: 2 Global Step: 48240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:09,455-Speed 9413.30 samples/sec Loss 8.5946 LearningRate 0.0732 Epoch: 2 Global Step: 48250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:10,529-Speed 9537.72 samples/sec Loss 8.7465 LearningRate 0.0732 Epoch: 2 Global Step: 48260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:11,613-Speed 9453.52 samples/sec Loss 8.6218 LearningRate 0.0732 Epoch: 2 Global Step: 48270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:12,674-Speed 9656.88 samples/sec Loss 8.6803 LearningRate 0.0732 Epoch: 2 Global Step: 48280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:13,737-Speed 9633.13 samples/sec Loss 8.6808 LearningRate 0.0732 Epoch: 2 Global Step: 48290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:14,852-Speed 9197.41 samples/sec Loss 8.5292 LearningRate 0.0732 Epoch: 2 Global Step: 48300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:15,952-Speed 9311.60 samples/sec Loss 8.6503 LearningRate 0.0732 Epoch: 2 Global Step: 48310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:17,043-Speed 9396.96 samples/sec Loss 8.6123 LearningRate 0.0731 Epoch: 2 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:18,185-Speed 8970.60 samples/sec Loss 8.7287 LearningRate 0.0731 Epoch: 2 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:29:19,267-Speed 9473.60 samples/sec Loss 8.5009 LearningRate 0.0731 Epoch: 2 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:20,373-Speed 9258.57 samples/sec Loss 8.6217 LearningRate 0.0731 Epoch: 2 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:21,450-Speed 9519.92 samples/sec Loss 8.7171 LearningRate 0.0731 Epoch: 2 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:22,517-Speed 9597.31 samples/sec Loss 8.6066 LearningRate 0.0731 Epoch: 2 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:23,550-Speed 9921.85 samples/sec Loss 8.4243 LearningRate 0.0731 Epoch: 2 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:24,626-Speed 9523.50 samples/sec Loss 8.5695 LearningRate 0.0731 Epoch: 2 Global Step: 48390 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:25,679-Speed 9729.11 samples/sec Loss 8.7001 LearningRate 0.0731 Epoch: 2 Global Step: 48400 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:26,734-Speed 9706.75 samples/sec Loss 8.6078 LearningRate 0.0731 Epoch: 2 Global Step: 48410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:27,851-Speed 9177.45 samples/sec Loss 8.6925 LearningRate 0.0731 Epoch: 2 Global Step: 48420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:28,931-Speed 9487.51 samples/sec Loss 8.5497 LearningRate 0.0731 Epoch: 2 Global Step: 48430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:30,018-Speed 9423.30 samples/sec Loss 8.5698 LearningRate 0.0731 Epoch: 2 Global Step: 48440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:31,091-Speed 9548.16 samples/sec Loss 8.6191 LearningRate 0.0731 Epoch: 2 Global Step: 48450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:32,124-Speed 9919.28 samples/sec Loss 
8.7051 LearningRate 0.0731 Epoch: 2 Global Step: 48460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:33,207-Speed 9457.39 samples/sec Loss 8.5098 LearningRate 0.0731 Epoch: 2 Global Step: 48470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:34,233-Speed 9989.85 samples/sec Loss 8.6481 LearningRate 0.0731 Epoch: 2 Global Step: 48480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:35,266-Speed 9923.32 samples/sec Loss 8.6776 LearningRate 0.0731 Epoch: 2 Global Step: 48490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:36,350-Speed 9449.14 samples/sec Loss 8.7051 LearningRate 0.0731 Epoch: 2 Global Step: 48500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:37,405-Speed 9708.99 samples/sec Loss 8.5729 LearningRate 0.0730 Epoch: 2 Global Step: 48510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:38,494-Speed 9421.30 samples/sec Loss 8.6538 LearningRate 0.0730 Epoch: 2 Global Step: 48520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:39,551-Speed 9693.34 samples/sec Loss 8.5190 LearningRate 0.0730 Epoch: 2 Global Step: 48530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:40,621-Speed 9576.81 samples/sec Loss 8.6359 LearningRate 0.0730 Epoch: 2 Global Step: 48540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:29:41,660-Speed 9858.95 samples/sec Loss 8.6428 LearningRate 0.0730 Epoch: 2 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:42,725-Speed 9619.81 samples/sec Loss 8.6477 LearningRate 0.0730 Epoch: 2 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:43,802-Speed 9512.88 samples/sec Loss 8.6414 LearningRate 0.0730 Epoch: 2 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:44,877-Speed 9537.09 samples/sec Loss 8.6020 LearningRate 0.0730 Epoch: 2 Global 
Step: 48580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:45,941-Speed 9624.91 samples/sec Loss 8.5658 LearningRate 0.0730 Epoch: 2 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:47,024-Speed 9465.08 samples/sec Loss 8.6597 LearningRate 0.0730 Epoch: 2 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:48,084-Speed 9666.83 samples/sec Loss 8.6671 LearningRate 0.0730 Epoch: 2 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:49,167-Speed 9461.77 samples/sec Loss 8.7540 LearningRate 0.0730 Epoch: 2 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:50,281-Speed 9193.79 samples/sec Loss 8.5777 LearningRate 0.0730 Epoch: 2 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:51,354-Speed 9549.36 samples/sec Loss 8.6982 LearningRate 0.0730 Epoch: 2 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:29:52,455-Speed 9307.72 samples/sec Loss 8.6999 LearningRate 0.0730 Epoch: 2 Global Step: 48650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:53,528-Speed 9554.09 samples/sec Loss 8.6042 LearningRate 0.0730 Epoch: 2 Global Step: 48660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:54,585-Speed 9688.23 samples/sec Loss 8.6605 LearningRate 0.0730 Epoch: 2 Global Step: 48670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:55,614-Speed 9959.80 samples/sec Loss 8.6233 LearningRate 0.0730 Epoch: 2 Global Step: 48680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:56,694-Speed 9492.56 samples/sec Loss 8.4690 LearningRate 0.0730 Epoch: 2 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:57,794-Speed 9321.27 samples/sec Loss 8.6001 LearningRate 0.0730 Epoch: 2 Global Step: 48700 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-11 13:29:58,903-Speed 9235.87 samples/sec Loss 8.7219 LearningRate 0.0729 Epoch: 2 Global Step: 48710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:29:59,997-Speed 9363.74 samples/sec Loss 8.6060 LearningRate 0.0729 Epoch: 2 Global Step: 48720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:01,089-Speed 9381.34 samples/sec Loss 8.5705 LearningRate 0.0729 Epoch: 2 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:02,194-Speed 9276.79 samples/sec Loss 8.5421 LearningRate 0.0729 Epoch: 2 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:03,254-Speed 9662.19 samples/sec Loss 8.6753 LearningRate 0.0729 Epoch: 2 Global Step: 48750 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:04,321-Speed 9603.80 samples/sec Loss 8.4803 LearningRate 0.0729 Epoch: 2 Global Step: 48760 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:05,410-Speed 9413.26 samples/sec Loss 8.5345 LearningRate 0.0729 Epoch: 2 Global Step: 48770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:06,484-Speed 9537.59 samples/sec Loss 8.6330 LearningRate 0.0729 Epoch: 2 Global Step: 48780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:07,560-Speed 9522.02 samples/sec Loss 8.5256 LearningRate 0.0729 Epoch: 2 Global Step: 48790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:08,661-Speed 9307.06 samples/sec Loss 8.5493 LearningRate 0.0729 Epoch: 2 Global Step: 48800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:09,731-Speed 9573.73 samples/sec Loss 8.6787 LearningRate 0.0729 Epoch: 2 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:10,766-Speed 9903.47 samples/sec Loss 8.6279 LearningRate 0.0729 Epoch: 2 Global Step: 48820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:30:11,820-Speed 9715.70 samples/sec Loss 8.7059 LearningRate 0.0729 Epoch: 2 Global Step: 48830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:12,859-Speed 9867.75 samples/sec Loss 8.5279 LearningRate 0.0729 Epoch: 2 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:13,957-Speed 9333.55 samples/sec Loss 8.6545 LearningRate 0.0729 Epoch: 2 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:15,024-Speed 9598.10 samples/sec Loss 8.5421 LearningRate 0.0729 Epoch: 2 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:16,085-Speed 9662.49 samples/sec Loss 8.5912 LearningRate 0.0729 Epoch: 2 Global Step: 48870 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:17,151-Speed 9607.83 samples/sec Loss 8.6583 LearningRate 0.0729 Epoch: 2 Global Step: 48880 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:18,257-Speed 9268.80 samples/sec Loss 8.6153 LearningRate 0.0729 Epoch: 2 Global Step: 48890 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:19,313-Speed 9700.57 samples/sec Loss 8.6202 LearningRate 0.0728 Epoch: 2 Global Step: 48900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:20,383-Speed 9580.99 samples/sec Loss 8.6080 LearningRate 0.0728 Epoch: 2 Global Step: 48910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:21,466-Speed 9462.23 samples/sec Loss 8.5397 LearningRate 0.0728 Epoch: 2 Global Step: 48920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:22,500-Speed 9911.00 samples/sec Loss 8.5591 LearningRate 0.0728 Epoch: 2 Global Step: 48930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:23,556-Speed 9700.43 samples/sec Loss 8.6573 LearningRate 0.0728 Epoch: 2 Global Step: 48940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:24,619-Speed 9632.48 samples/sec Loss 
8.6347 LearningRate 0.0728 Epoch: 2 Global Step: 48950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:25,671-Speed 9740.68 samples/sec Loss 8.5618 LearningRate 0.0728 Epoch: 2 Global Step: 48960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:26,819-Speed 8928.02 samples/sec Loss 8.5120 LearningRate 0.0728 Epoch: 2 Global Step: 48970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:27,931-Speed 9212.03 samples/sec Loss 8.6643 LearningRate 0.0728 Epoch: 2 Global Step: 48980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:28,982-Speed 9751.53 samples/sec Loss 8.6410 LearningRate 0.0728 Epoch: 2 Global Step: 48990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:30,066-Speed 9446.62 samples/sec Loss 8.6574 LearningRate 0.0728 Epoch: 2 Global Step: 49000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:31,155-Speed 9411.14 samples/sec Loss 8.5935 LearningRate 0.0728 Epoch: 2 Global Step: 49010 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:32,235-Speed 9484.05 samples/sec Loss 8.5682 LearningRate 0.0728 Epoch: 2 Global Step: 49020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:33,306-Speed 9573.09 samples/sec Loss 8.5277 LearningRate 0.0728 Epoch: 2 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:34,392-Speed 9436.78 samples/sec Loss 8.5329 LearningRate 0.0728 Epoch: 2 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:35,491-Speed 9322.02 samples/sec Loss 8.6214 LearningRate 0.0728 Epoch: 2 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:36,566-Speed 9535.61 samples/sec Loss 8.6183 LearningRate 0.0728 Epoch: 2 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:37,678-Speed 9212.08 samples/sec Loss 8.6525 LearningRate 0.0728 Epoch: 2 Global 
Step: 49070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:38,750-Speed 9558.05 samples/sec Loss 8.5166 LearningRate 0.0728 Epoch: 2 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:39,815-Speed 9624.91 samples/sec Loss 8.5288 LearningRate 0.0728 Epoch: 2 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:40,910-Speed 9352.59 samples/sec Loss 8.6464 LearningRate 0.0727 Epoch: 2 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:41,990-Speed 9491.65 samples/sec Loss 8.5637 LearningRate 0.0727 Epoch: 2 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:43,053-Speed 9638.25 samples/sec Loss 8.6384 LearningRate 0.0727 Epoch: 2 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:30:44,114-Speed 9653.90 samples/sec Loss 8.6716 LearningRate 0.0727 Epoch: 2 Global Step: 49130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:45,188-Speed 9541.67 samples/sec Loss 8.6559 LearningRate 0.0727 Epoch: 2 Global Step: 49140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:46,276-Speed 9421.02 samples/sec Loss 8.5534 LearningRate 0.0727 Epoch: 2 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:47,390-Speed 9199.70 samples/sec Loss 8.6470 LearningRate 0.0727 Epoch: 2 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:48,448-Speed 9675.35 samples/sec Loss 8.6038 LearningRate 0.0727 Epoch: 2 Global Step: 49170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:49,557-Speed 9245.58 samples/sec Loss 8.6068 LearningRate 0.0727 Epoch: 2 Global Step: 49180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:50,629-Speed 9558.14 samples/sec Loss 8.7646 LearningRate 0.0727 Epoch: 2 Global Step: 49190 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-11 13:30:51,677-Speed 9783.23 samples/sec Loss 8.5147 LearningRate 0.0727 Epoch: 2 Global Step: 49200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:52,713-Speed 9892.45 samples/sec Loss 8.6210 LearningRate 0.0727 Epoch: 2 Global Step: 49210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:53,788-Speed 9528.90 samples/sec Loss 8.4373 LearningRate 0.0727 Epoch: 2 Global Step: 49220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:54,805-Speed 10076.51 samples/sec Loss 8.5735 LearningRate 0.0727 Epoch: 2 Global Step: 49230 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:55,826-Speed 10034.01 samples/sec Loss 8.5748 LearningRate 0.0727 Epoch: 2 Global Step: 49240 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:30:56,876-Speed 9756.73 samples/sec Loss 8.5109 LearningRate 0.0727 Epoch: 2 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:57,899-Speed 10013.89 samples/sec Loss 8.5253 LearningRate 0.0727 Epoch: 2 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:30:58,959-Speed 9667.70 samples/sec Loss 8.7142 LearningRate 0.0727 Epoch: 2 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:00,003-Speed 9809.23 samples/sec Loss 8.6269 LearningRate 0.0727 Epoch: 2 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:01,094-Speed 9394.07 samples/sec Loss 8.6152 LearningRate 0.0726 Epoch: 2 Global Step: 49290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:02,211-Speed 9173.87 samples/sec Loss 8.5770 LearningRate 0.0726 Epoch: 2 Global Step: 49300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:03,341-Speed 9069.47 samples/sec Loss 8.5504 LearningRate 0.0726 Epoch: 2 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:31:04,392-Speed 9746.35 samples/sec Loss 8.6053 LearningRate 0.0726 Epoch: 2 Global Step: 49320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:05,466-Speed 9540.79 samples/sec Loss 8.6308 LearningRate 0.0726 Epoch: 2 Global Step: 49330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:06,525-Speed 9684.15 samples/sec Loss 8.5654 LearningRate 0.0726 Epoch: 2 Global Step: 49340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:07,619-Speed 9365.32 samples/sec Loss 8.6275 LearningRate 0.0726 Epoch: 2 Global Step: 49350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:08,691-Speed 9557.00 samples/sec Loss 8.6253 LearningRate 0.0726 Epoch: 2 Global Step: 49360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:09,733-Speed 9833.57 samples/sec Loss 8.5626 LearningRate 0.0726 Epoch: 2 Global Step: 49370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:10,825-Speed 9379.73 samples/sec Loss 8.6415 LearningRate 0.0726 Epoch: 2 Global Step: 49380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:11,928-Speed 9283.55 samples/sec Loss 8.6018 LearningRate 0.0726 Epoch: 2 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:13,020-Speed 9391.79 samples/sec Loss 8.6718 LearningRate 0.0726 Epoch: 2 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:14,079-Speed 9671.77 samples/sec Loss 8.6470 LearningRate 0.0726 Epoch: 2 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:15,153-Speed 9541.01 samples/sec Loss 8.6156 LearningRate 0.0726 Epoch: 2 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:16,243-Speed 9404.30 samples/sec Loss 8.6124 LearningRate 0.0726 Epoch: 2 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:17,313-Speed 9571.70 samples/sec Loss 
8.6623 LearningRate 0.0726 Epoch: 2 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:18,391-Speed 9503.62 samples/sec Loss 8.6222 LearningRate 0.0726 Epoch: 2 Global Step: 49450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:31:19,436-Speed 9810.72 samples/sec Loss 8.5883 LearningRate 0.0726 Epoch: 2 Global Step: 49460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:20,553-Speed 9171.16 samples/sec Loss 8.4676 LearningRate 0.0726 Epoch: 2 Global Step: 49470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:21,644-Speed 9390.44 samples/sec Loss 8.6011 LearningRate 0.0726 Epoch: 2 Global Step: 49480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:22,729-Speed 9445.39 samples/sec Loss 8.5911 LearningRate 0.0725 Epoch: 2 Global Step: 49490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:23,816-Speed 9430.90 samples/sec Loss 8.6282 LearningRate 0.0725 Epoch: 2 Global Step: 49500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:24,902-Speed 9434.56 samples/sec Loss 8.4980 LearningRate 0.0725 Epoch: 2 Global Step: 49510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:25,972-Speed 9576.67 samples/sec Loss 8.5675 LearningRate 0.0725 Epoch: 2 Global Step: 49520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:27,009-Speed 9878.75 samples/sec Loss 8.7179 LearningRate 0.0725 Epoch: 2 Global Step: 49530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:28,065-Speed 9706.53 samples/sec Loss 8.5737 LearningRate 0.0725 Epoch: 2 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:29,132-Speed 9603.87 samples/sec Loss 8.6653 LearningRate 0.0725 Epoch: 2 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:30,181-Speed 9766.85 samples/sec Loss 8.6526 LearningRate 0.0725 Epoch: 2 Global 
Step: 49560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:31,282-Speed 9301.50 samples/sec Loss 8.6555 LearningRate 0.0725 Epoch: 2 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:32,386-Speed 9283.18 samples/sec Loss 8.5450 LearningRate 0.0725 Epoch: 2 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:33,482-Speed 9354.17 samples/sec Loss 8.6426 LearningRate 0.0725 Epoch: 2 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:34,587-Speed 9271.40 samples/sec Loss 8.5823 LearningRate 0.0725 Epoch: 2 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:35,685-Speed 9329.08 samples/sec Loss 8.6485 LearningRate 0.0725 Epoch: 2 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:36,782-Speed 9342.49 samples/sec Loss 8.5508 LearningRate 0.0725 Epoch: 2 Global Step: 49620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:37,880-Speed 9330.17 samples/sec Loss 8.5255 LearningRate 0.0725 Epoch: 2 Global Step: 49630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:38,988-Speed 9253.93 samples/sec Loss 8.5603 LearningRate 0.0725 Epoch: 2 Global Step: 49640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:40,092-Speed 9281.12 samples/sec Loss 8.6701 LearningRate 0.0725 Epoch: 2 Global Step: 49650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:41,159-Speed 9601.62 samples/sec Loss 8.5656 LearningRate 0.0725 Epoch: 2 Global Step: 49660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:31:42,208-Speed 9773.47 samples/sec Loss 8.6017 LearningRate 0.0725 Epoch: 2 Global Step: 49670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:43,256-Speed 9779.31 samples/sec Loss 8.5634 LearningRate 0.0725 Epoch: 2 Global Step: 49680 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 13:31:44,319-Speed 9640.34 samples/sec Loss 8.6704 LearningRate 0.0724 Epoch: 2 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:45,367-Speed 9777.11 samples/sec Loss 8.6426 LearningRate 0.0724 Epoch: 2 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:46,442-Speed 9528.42 samples/sec Loss 8.6496 LearningRate 0.0724 Epoch: 2 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:47,535-Speed 9374.41 samples/sec Loss 8.6322 LearningRate 0.0724 Epoch: 2 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:48,572-Speed 9876.72 samples/sec Loss 8.5751 LearningRate 0.0724 Epoch: 2 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:49,638-Speed 9614.73 samples/sec Loss 8.6972 LearningRate 0.0724 Epoch: 2 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:50,678-Speed 9850.15 samples/sec Loss 8.4684 LearningRate 0.0724 Epoch: 2 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:51,757-Speed 9495.83 samples/sec Loss 8.6256 LearningRate 0.0724 Epoch: 2 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:52,823-Speed 9615.29 samples/sec Loss 8.5607 LearningRate 0.0724 Epoch: 2 Global Step: 49770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:31:53,907-Speed 9450.03 samples/sec Loss 8.6310 LearningRate 0.0724 Epoch: 2 Global Step: 49780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:54,975-Speed 9598.65 samples/sec Loss 8.6012 LearningRate 0.0724 Epoch: 2 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:56,090-Speed 9191.31 samples/sec Loss 8.6054 LearningRate 0.0724 Epoch: 2 Global Step: 49800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:31:57,144-Speed 9716.49 samples/sec Loss 8.6025 LearningRate 0.0724 Epoch: 2 Global Step: 49810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:58,223-Speed 9497.14 samples/sec Loss 8.5761 LearningRate 0.0724 Epoch: 2 Global Step: 49820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:31:59,327-Speed 9282.01 samples/sec Loss 8.6417 LearningRate 0.0724 Epoch: 2 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:00,462-Speed 9021.99 samples/sec Loss 8.5760 LearningRate 0.0724 Epoch: 2 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:01,552-Speed 9405.27 samples/sec Loss 8.4439 LearningRate 0.0724 Epoch: 2 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:02,597-Speed 9808.32 samples/sec Loss 8.5452 LearningRate 0.0724 Epoch: 2 Global Step: 49860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:03,636-Speed 9871.01 samples/sec Loss 8.4468 LearningRate 0.0724 Epoch: 2 Global Step: 49870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:04,724-Speed 9412.90 samples/sec Loss 8.5748 LearningRate 0.0723 Epoch: 2 Global Step: 49880 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:32:05,812-Speed 9425.41 samples/sec Loss 8.5480 LearningRate 0.0723 Epoch: 2 Global Step: 49890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:06,888-Speed 9518.90 samples/sec Loss 8.5140 LearningRate 0.0723 Epoch: 2 Global Step: 49900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:07,963-Speed 9528.31 samples/sec Loss 8.5696 LearningRate 0.0723 Epoch: 2 Global Step: 49910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:09,072-Speed 9241.86 samples/sec Loss 8.5518 LearningRate 0.0723 Epoch: 2 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:10,137-Speed 9619.09 samples/sec Loss 
8.5864 LearningRate 0.0723 Epoch: 2 Global Step: 49930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:11,206-Speed 9588.67 samples/sec Loss 8.5398 LearningRate 0.0723 Epoch: 2 Global Step: 49940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:12,236-Speed 9939.30 samples/sec Loss 8.5693 LearningRate 0.0723 Epoch: 2 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:13,275-Speed 9868.40 samples/sec Loss 8.5204 LearningRate 0.0723 Epoch: 2 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:14,343-Speed 9595.07 samples/sec Loss 8.6014 LearningRate 0.0723 Epoch: 2 Global Step: 49970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:15,385-Speed 9835.45 samples/sec Loss 8.5849 LearningRate 0.0723 Epoch: 2 Global Step: 49980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:32:16,434-Speed 9764.13 samples/sec Loss 8.5771 LearningRate 0.0723 Epoch: 2 Global Step: 49990 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:32:17,521-Speed 9424.60 samples/sec Loss 8.6085 LearningRate 0.0723 Epoch: 2 Global Step: 50000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:32:39,655-[lfw][50000]XNorm: 13.013867 Training: 2022-04-11 13:32:39,655-[lfw][50000]Accuracy-Flip: 0.99417+-0.00367 Training: 2022-04-11 13:32:39,656-[lfw][50000]Accuracy-Highest: 0.99583 Training: 2022-04-11 13:33:05,454-[cfp_fp][50000]XNorm: 10.992907 Training: 2022-04-11 13:33:05,454-[cfp_fp][50000]Accuracy-Flip: 0.94686+-0.01207 Training: 2022-04-11 13:33:05,455-[cfp_fp][50000]Accuracy-Highest: 0.94700 Training: 2022-04-11 13:33:27,735-[agedb_30][50000]XNorm: 12.554788 Training: 2022-04-11 13:33:27,736-[agedb_30][50000]Accuracy-Flip: 0.95467+-0.00921 Training: 2022-04-11 13:33:27,736-[agedb_30][50000]Accuracy-Highest: 0.95483 Training: 2022-04-11 13:33:28,833-Speed 143.59 samples/sec Loss 8.6626 LearningRate 0.0723 Epoch: 
2 Global Step: 50010 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:33:29,890-Speed 9695.62 samples/sec Loss 8.5304 LearningRate 0.0723 Epoch: 2 Global Step: 50020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:33:30,920-Speed 9946.00 samples/sec Loss 8.7091 LearningRate 0.0723 Epoch: 2 Global Step: 50030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:33:31,988-Speed 9591.46 samples/sec Loss 8.5207 LearningRate 0.0723 Epoch: 2 Global Step: 50040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:33:33,044-Speed 9700.96 samples/sec Loss 8.5665 LearningRate 0.0723 Epoch: 2 Global Step: 50050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:33:34,132-Speed 9424.69 samples/sec Loss 8.4573 LearningRate 0.0723 Epoch: 2 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:33:35,413-Speed 7994.61 samples/sec Loss 8.5390 LearningRate 0.0723 Epoch: 2 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:03,019-Speed 370.96 samples/sec Loss 7.9372 LearningRate 0.0722 Epoch: 3 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:04,697-Speed 6104.68 samples/sec Loss 7.7026 LearningRate 0.0722 Epoch: 3 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:05,939-Speed 8248.86 samples/sec Loss 7.7866 LearningRate 0.0722 Epoch: 3 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:07,068-Speed 9080.95 samples/sec Loss 7.6512 LearningRate 0.0722 Epoch: 3 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:08,163-Speed 9352.93 samples/sec Loss 7.7732 LearningRate 0.0722 Epoch: 3 Global Step: 50120 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:34:09,529-Speed 7502.26 samples/sec Loss 7.7649 LearningRate 0.0722 Epoch: 3 Global Step: 50130 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-04-11 13:34:10,595-Speed 9621.29 samples/sec Loss 7.6923 LearningRate 0.0722 Epoch: 3 Global Step: 50140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:11,692-Speed 9339.08 samples/sec Loss 7.8423 LearningRate 0.0722 Epoch: 3 Global Step: 50150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:12,787-Speed 9355.73 samples/sec Loss 7.7708 LearningRate 0.0722 Epoch: 3 Global Step: 50160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:13,839-Speed 9744.54 samples/sec Loss 7.7831 LearningRate 0.0722 Epoch: 3 Global Step: 50170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:14,907-Speed 9594.41 samples/sec Loss 7.7230 LearningRate 0.0722 Epoch: 3 Global Step: 50180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:16,121-Speed 8434.37 samples/sec Loss 7.7178 LearningRate 0.0722 Epoch: 3 Global Step: 50190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:17,258-Speed 9012.44 samples/sec Loss 7.7357 LearningRate 0.0722 Epoch: 3 Global Step: 50200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:18,338-Speed 9487.87 samples/sec Loss 7.7081 LearningRate 0.0722 Epoch: 3 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:19,426-Speed 9426.37 samples/sec Loss 7.7461 LearningRate 0.0722 Epoch: 3 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:20,530-Speed 9274.74 samples/sec Loss 7.6717 LearningRate 0.0722 Epoch: 3 Global Step: 50230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:21,651-Speed 9141.92 samples/sec Loss 7.8544 LearningRate 0.0722 Epoch: 3 Global Step: 50240 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:34:22,712-Speed 9659.35 samples/sec Loss 7.8009 LearningRate 0.0722 Epoch: 3 Global Step: 50250 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 
13:34:24,056-Speed 7623.55 samples/sec Loss 7.8475 LearningRate 0.0722 Epoch: 3 Global Step: 50260 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:34:25,170-Speed 9191.62 samples/sec Loss 7.6610 LearningRate 0.0721 Epoch: 3 Global Step: 50270 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:34:26,285-Speed 9196.63 samples/sec Loss 7.7167 LearningRate 0.0721 Epoch: 3 Global Step: 50280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:27,365-Speed 9489.68 samples/sec Loss 7.7954 LearningRate 0.0721 Epoch: 3 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:28,443-Speed 9504.35 samples/sec Loss 7.7362 LearningRate 0.0721 Epoch: 3 Global Step: 50300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:29,538-Speed 9352.62 samples/sec Loss 7.7372 LearningRate 0.0721 Epoch: 3 Global Step: 50310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:30,619-Speed 9477.49 samples/sec Loss 7.7471 LearningRate 0.0721 Epoch: 3 Global Step: 50320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:31,664-Speed 9805.83 samples/sec Loss 7.8190 LearningRate 0.0721 Epoch: 3 Global Step: 50330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:32,727-Speed 9638.27 samples/sec Loss 7.8527 LearningRate 0.0721 Epoch: 3 Global Step: 50340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:33,803-Speed 9520.75 samples/sec Loss 7.7960 LearningRate 0.0721 Epoch: 3 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:34,887-Speed 9449.16 samples/sec Loss 7.8814 LearningRate 0.0721 Epoch: 3 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:35,962-Speed 9530.36 samples/sec Loss 7.9161 LearningRate 0.0721 Epoch: 3 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:37,057-Speed 9363.73 samples/sec Loss 
7.7806 LearningRate 0.0721 Epoch: 3 Global Step: 50380 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:34:38,147-Speed 9397.88 samples/sec Loss 7.8523 LearningRate 0.0721 Epoch: 3 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:39,346-Speed 8549.42 samples/sec Loss 7.8130 LearningRate 0.0721 Epoch: 3 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:40,444-Speed 9324.70 samples/sec Loss 7.9039 LearningRate 0.0721 Epoch: 3 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:41,547-Speed 9292.40 samples/sec Loss 7.8687 LearningRate 0.0721 Epoch: 3 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:42,633-Speed 9440.86 samples/sec Loss 7.7654 LearningRate 0.0721 Epoch: 3 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:43,679-Speed 9796.52 samples/sec Loss 7.9068 LearningRate 0.0721 Epoch: 3 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:44,744-Speed 9612.37 samples/sec Loss 7.7492 LearningRate 0.0721 Epoch: 3 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:45,830-Speed 9440.19 samples/sec Loss 7.7499 LearningRate 0.0721 Epoch: 3 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:47,758-Speed 5312.98 samples/sec Loss 7.8558 LearningRate 0.0720 Epoch: 3 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:49,033-Speed 8032.50 samples/sec Loss 7.7369 LearningRate 0.0720 Epoch: 3 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:50,106-Speed 9555.50 samples/sec Loss 7.7753 LearningRate 0.0720 Epoch: 3 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:51,184-Speed 9500.51 samples/sec Loss 7.9901 LearningRate 0.0720 Epoch: 3 Global 
Step: 50500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:52,230-Speed 9794.65 samples/sec Loss 7.7782 LearningRate 0.0720 Epoch: 3 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:53,328-Speed 9329.29 samples/sec Loss 7.9215 LearningRate 0.0720 Epoch: 3 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:54,440-Speed 9219.50 samples/sec Loss 7.9155 LearningRate 0.0720 Epoch: 3 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:55,541-Speed 9303.90 samples/sec Loss 7.8676 LearningRate 0.0720 Epoch: 3 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:56,602-Speed 9662.27 samples/sec Loss 7.8156 LearningRate 0.0720 Epoch: 3 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:57,695-Speed 9368.04 samples/sec Loss 7.8181 LearningRate 0.0720 Epoch: 3 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:58,790-Speed 9355.36 samples/sec Loss 7.9685 LearningRate 0.0720 Epoch: 3 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:34:59,925-Speed 9031.75 samples/sec Loss 7.8473 LearningRate 0.0720 Epoch: 3 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:01,033-Speed 9251.74 samples/sec Loss 7.9339 LearningRate 0.0720 Epoch: 3 Global Step: 50590 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:02,092-Speed 9675.25 samples/sec Loss 7.9426 LearningRate 0.0720 Epoch: 3 Global Step: 50600 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:03,148-Speed 9702.23 samples/sec Loss 7.8271 LearningRate 0.0720 Epoch: 3 Global Step: 50610 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:04,300-Speed 8900.39 samples/sec Loss 7.8809 LearningRate 0.0720 Epoch: 3 Global Step: 50620 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-04-11 13:35:05,415-Speed 9191.25 samples/sec Loss 7.9798 LearningRate 0.0720 Epoch: 3 Global Step: 50630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:06,489-Speed 9537.20 samples/sec Loss 7.8743 LearningRate 0.0720 Epoch: 3 Global Step: 50640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:07,583-Speed 9372.03 samples/sec Loss 7.9252 LearningRate 0.0720 Epoch: 3 Global Step: 50650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:08,650-Speed 9598.26 samples/sec Loss 7.8309 LearningRate 0.0720 Epoch: 3 Global Step: 50660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:09,707-Speed 9692.07 samples/sec Loss 7.7595 LearningRate 0.0719 Epoch: 3 Global Step: 50670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:10,743-Speed 9893.71 samples/sec Loss 7.8112 LearningRate 0.0719 Epoch: 3 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:11,793-Speed 9756.12 samples/sec Loss 7.7976 LearningRate 0.0719 Epoch: 3 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:12,913-Speed 9152.66 samples/sec Loss 7.8680 LearningRate 0.0719 Epoch: 3 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:13,969-Speed 9699.36 samples/sec Loss 7.8249 LearningRate 0.0719 Epoch: 3 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:15,074-Speed 9273.28 samples/sec Loss 7.8844 LearningRate 0.0719 Epoch: 3 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:16,110-Speed 9882.90 samples/sec Loss 8.0126 LearningRate 0.0719 Epoch: 3 Global Step: 50730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:17,207-Speed 9343.43 samples/sec Loss 7.9024 LearningRate 0.0719 Epoch: 3 Global Step: 50740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
13:35:18,254-Speed 9792.11 samples/sec Loss 7.8770 LearningRate 0.0719 Epoch: 3 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:19,342-Speed 9417.67 samples/sec Loss 7.8336 LearningRate 0.0719 Epoch: 3 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:20,407-Speed 9615.27 samples/sec Loss 7.9026 LearningRate 0.0719 Epoch: 3 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:21,461-Speed 9727.68 samples/sec Loss 7.7711 LearningRate 0.0719 Epoch: 3 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:22,546-Speed 9437.71 samples/sec Loss 7.9148 LearningRate 0.0719 Epoch: 3 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:23,602-Speed 9705.15 samples/sec Loss 7.9416 LearningRate 0.0719 Epoch: 3 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:24,676-Speed 9539.33 samples/sec Loss 8.0119 LearningRate 0.0719 Epoch: 3 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:25,757-Speed 9485.04 samples/sec Loss 8.0232 LearningRate 0.0719 Epoch: 3 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:35:26,833-Speed 9521.25 samples/sec Loss 7.9131 LearningRate 0.0719 Epoch: 3 Global Step: 50830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:27,935-Speed 9291.09 samples/sec Loss 7.9950 LearningRate 0.0719 Epoch: 3 Global Step: 50840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:29,010-Speed 9534.44 samples/sec Loss 7.9308 LearningRate 0.0719 Epoch: 3 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:30,054-Speed 9817.11 samples/sec Loss 7.8796 LearningRate 0.0718 Epoch: 3 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:31,188-Speed 9033.52 samples/sec Loss 7.9510 
LearningRate 0.0718 Epoch: 3 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:32,252-Speed 9628.24 samples/sec Loss 7.9018 LearningRate 0.0718 Epoch: 3 Global Step: 50880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:33,313-Speed 9656.46 samples/sec Loss 7.9446 LearningRate 0.0718 Epoch: 3 Global Step: 50890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:34,360-Speed 9788.62 samples/sec Loss 8.0498 LearningRate 0.0718 Epoch: 3 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:35,476-Speed 9181.02 samples/sec Loss 7.9171 LearningRate 0.0718 Epoch: 3 Global Step: 50910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:36,592-Speed 9184.42 samples/sec Loss 7.9478 LearningRate 0.0718 Epoch: 3 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:37,623-Speed 9931.02 samples/sec Loss 7.8964 LearningRate 0.0718 Epoch: 3 Global Step: 50930 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:38,654-Speed 9941.03 samples/sec Loss 7.9857 LearningRate 0.0718 Epoch: 3 Global Step: 50940 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:35:39,692-Speed 9876.26 samples/sec Loss 7.9305 LearningRate 0.0718 Epoch: 3 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:40,737-Speed 9801.00 samples/sec Loss 8.1610 LearningRate 0.0718 Epoch: 3 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:41,785-Speed 9774.00 samples/sec Loss 8.0386 LearningRate 0.0718 Epoch: 3 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:42,890-Speed 9276.79 samples/sec Loss 7.8977 LearningRate 0.0718 Epoch: 3 Global Step: 50980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:43,953-Speed 9637.56 samples/sec Loss 7.9367 LearningRate 0.0718 Epoch: 3 Global Step: 
50990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:45,030-Speed 9509.64 samples/sec Loss 7.9879 LearningRate 0.0718 Epoch: 3 Global Step: 51000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:46,082-Speed 9738.78 samples/sec Loss 8.0358 LearningRate 0.0718 Epoch: 3 Global Step: 51010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:47,145-Speed 9643.26 samples/sec Loss 7.9259 LearningRate 0.0718 Epoch: 3 Global Step: 51020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:48,253-Speed 9242.01 samples/sec Loss 8.0252 LearningRate 0.0718 Epoch: 3 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:49,297-Speed 9820.11 samples/sec Loss 7.8942 LearningRate 0.0718 Epoch: 3 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:50,345-Speed 9779.15 samples/sec Loss 7.9136 LearningRate 0.0718 Epoch: 3 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:51,389-Speed 9816.46 samples/sec Loss 7.8881 LearningRate 0.0717 Epoch: 3 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:52,439-Speed 9754.45 samples/sec Loss 7.8833 LearningRate 0.0717 Epoch: 3 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:53,479-Speed 9850.18 samples/sec Loss 7.9122 LearningRate 0.0717 Epoch: 3 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:54,516-Speed 9881.45 samples/sec Loss 8.0642 LearningRate 0.0717 Epoch: 3 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:55,593-Speed 9519.29 samples/sec Loss 8.0843 LearningRate 0.0717 Epoch: 3 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:56,635-Speed 9836.48 samples/sec Loss 8.0955 LearningRate 0.0717 Epoch: 3 Global Step: 51110 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-11 13:35:57,690-Speed 9706.24 samples/sec Loss 7.9927 LearningRate 0.0717 Epoch: 3 Global Step: 51120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:58,785-Speed 9357.62 samples/sec Loss 7.9386 LearningRate 0.0717 Epoch: 3 Global Step: 51130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:35:59,856-Speed 9564.62 samples/sec Loss 8.0671 LearningRate 0.0717 Epoch: 3 Global Step: 51140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:00,928-Speed 9556.58 samples/sec Loss 7.9339 LearningRate 0.0717 Epoch: 3 Global Step: 51150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:01,963-Speed 9906.70 samples/sec Loss 7.9170 LearningRate 0.0717 Epoch: 3 Global Step: 51160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:03,033-Speed 9575.84 samples/sec Loss 8.1064 LearningRate 0.0717 Epoch: 3 Global Step: 51170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:04,143-Speed 9227.64 samples/sec Loss 8.0719 LearningRate 0.0717 Epoch: 3 Global Step: 51180 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:05,196-Speed 9726.54 samples/sec Loss 7.9729 LearningRate 0.0717 Epoch: 3 Global Step: 51190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:06,271-Speed 9535.36 samples/sec Loss 7.9333 LearningRate 0.0717 Epoch: 3 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:07,343-Speed 9552.23 samples/sec Loss 8.0842 LearningRate 0.0717 Epoch: 3 Global Step: 51210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:08,415-Speed 9568.70 samples/sec Loss 8.0664 LearningRate 0.0717 Epoch: 3 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:09,463-Speed 9772.40 samples/sec Loss 8.0177 LearningRate 0.0717 Epoch: 3 Global Step: 51230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:36:10,510-Speed 9788.95 samples/sec Loss 7.8352 LearningRate 0.0717 Epoch: 3 Global Step: 51240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:11,573-Speed 9641.20 samples/sec Loss 8.0327 LearningRate 0.0717 Epoch: 3 Global Step: 51250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:12,620-Speed 9780.62 samples/sec Loss 8.0293 LearningRate 0.0716 Epoch: 3 Global Step: 51260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:13,689-Speed 9581.39 samples/sec Loss 8.0703 LearningRate 0.0716 Epoch: 3 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:14,748-Speed 9678.55 samples/sec Loss 7.9514 LearningRate 0.0716 Epoch: 3 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:15,787-Speed 9859.80 samples/sec Loss 8.0732 LearningRate 0.0716 Epoch: 3 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:16,830-Speed 9824.16 samples/sec Loss 7.9161 LearningRate 0.0716 Epoch: 3 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:17,880-Speed 9755.38 samples/sec Loss 8.0554 LearningRate 0.0716 Epoch: 3 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:18,996-Speed 9189.11 samples/sec Loss 8.0119 LearningRate 0.0716 Epoch: 3 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:20,035-Speed 9861.79 samples/sec Loss 7.8894 LearningRate 0.0716 Epoch: 3 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:21,132-Speed 9332.01 samples/sec Loss 8.2109 LearningRate 0.0716 Epoch: 3 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:22,193-Speed 9663.26 samples/sec Loss 7.9923 LearningRate 0.0716 Epoch: 3 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:23,290-Speed 9332.86 samples/sec Loss 8.1016 
LearningRate 0.0716 Epoch: 3 Global Step: 51360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:24,333-Speed 9835.13 samples/sec Loss 8.1285 LearningRate 0.0716 Epoch: 3 Global Step: 51370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:25,369-Speed 9888.46 samples/sec Loss 7.9687 LearningRate 0.0716 Epoch: 3 Global Step: 51380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:26,434-Speed 9618.46 samples/sec Loss 8.0259 LearningRate 0.0716 Epoch: 3 Global Step: 51390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:27,513-Speed 9495.26 samples/sec Loss 8.1173 LearningRate 0.0716 Epoch: 3 Global Step: 51400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:28,598-Speed 9443.00 samples/sec Loss 8.0964 LearningRate 0.0716 Epoch: 3 Global Step: 51410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:29,671-Speed 9549.38 samples/sec Loss 8.0113 LearningRate 0.0716 Epoch: 3 Global Step: 51420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:30,756-Speed 9455.27 samples/sec Loss 8.0618 LearningRate 0.0716 Epoch: 3 Global Step: 51430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:31,886-Speed 9064.94 samples/sec Loss 8.1028 LearningRate 0.0716 Epoch: 3 Global Step: 51440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:32,955-Speed 9588.46 samples/sec Loss 8.1454 LearningRate 0.0716 Epoch: 3 Global Step: 51450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:34,038-Speed 9453.01 samples/sec Loss 8.2106 LearningRate 0.0715 Epoch: 3 Global Step: 51460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:35,076-Speed 9869.16 samples/sec Loss 8.1026 LearningRate 0.0715 Epoch: 3 Global Step: 51470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:36,147-Speed 9569.13 samples/sec Loss 8.0111 LearningRate 0.0715 Epoch: 3 Global Step: 
51480 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:37,273-Speed 9096.49 samples/sec Loss 8.1833 LearningRate 0.0715 Epoch: 3 Global Step: 51490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:38,338-Speed 9620.77 samples/sec Loss 8.1434 LearningRate 0.0715 Epoch: 3 Global Step: 51500 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:39,426-Speed 9420.22 samples/sec Loss 8.0253 LearningRate 0.0715 Epoch: 3 Global Step: 51510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:40,474-Speed 9779.05 samples/sec Loss 8.0582 LearningRate 0.0715 Epoch: 3 Global Step: 51520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:36:41,509-Speed 9895.17 samples/sec Loss 8.1606 LearningRate 0.0715 Epoch: 3 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:42,627-Speed 9168.27 samples/sec Loss 8.0235 LearningRate 0.0715 Epoch: 3 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:43,703-Speed 9521.25 samples/sec Loss 8.0241 LearningRate 0.0715 Epoch: 3 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:44,769-Speed 9611.39 samples/sec Loss 8.1393 LearningRate 0.0715 Epoch: 3 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:45,877-Speed 9250.26 samples/sec Loss 8.1237 LearningRate 0.0715 Epoch: 3 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:46,966-Speed 9411.84 samples/sec Loss 8.1078 LearningRate 0.0715 Epoch: 3 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:48,054-Speed 9414.97 samples/sec Loss 8.1381 LearningRate 0.0715 Epoch: 3 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:49,138-Speed 9457.10 samples/sec Loss 8.1278 LearningRate 0.0715 Epoch: 3 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-11 13:36:50,229-Speed 9388.78 samples/sec Loss 8.0882 LearningRate 0.0715 Epoch: 3 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:51,287-Speed 9684.53 samples/sec Loss 8.0886 LearningRate 0.0715 Epoch: 3 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:52,343-Speed 9702.53 samples/sec Loss 8.0056 LearningRate 0.0715 Epoch: 3 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:53,415-Speed 9556.35 samples/sec Loss 8.1355 LearningRate 0.0715 Epoch: 3 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:54,468-Speed 9735.96 samples/sec Loss 8.0754 LearningRate 0.0714 Epoch: 3 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:55,538-Speed 9578.19 samples/sec Loss 8.1559 LearningRate 0.0714 Epoch: 3 Global Step: 51660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:56,627-Speed 9401.27 samples/sec Loss 7.9985 LearningRate 0.0714 Epoch: 3 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:57,717-Speed 9399.32 samples/sec Loss 8.0194 LearningRate 0.0714 Epoch: 3 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:36:58,749-Speed 9933.51 samples/sec Loss 8.0634 LearningRate 0.0714 Epoch: 3 Global Step: 51690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:36:59,863-Speed 9193.46 samples/sec Loss 8.0883 LearningRate 0.0714 Epoch: 3 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:00,948-Speed 9448.48 samples/sec Loss 8.1016 LearningRate 0.0714 Epoch: 3 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:02,015-Speed 9600.46 samples/sec Loss 7.9962 LearningRate 0.0714 Epoch: 3 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:03,072-Speed 9695.24 
samples/sec Loss 8.0894 LearningRate 0.0714 Epoch: 3 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:04,147-Speed 9532.38 samples/sec Loss 8.1524 LearningRate 0.0714 Epoch: 3 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:05,219-Speed 9554.67 samples/sec Loss 8.0707 LearningRate 0.0714 Epoch: 3 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:06,305-Speed 9442.87 samples/sec Loss 8.1119 LearningRate 0.0714 Epoch: 3 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:07,365-Speed 9660.70 samples/sec Loss 8.0144 LearningRate 0.0714 Epoch: 3 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:08,422-Speed 9697.53 samples/sec Loss 8.0218 LearningRate 0.0714 Epoch: 3 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:37:09,483-Speed 9652.57 samples/sec Loss 8.0432 LearningRate 0.0714 Epoch: 3 Global Step: 51790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:10,575-Speed 9384.58 samples/sec Loss 8.0733 LearningRate 0.0714 Epoch: 3 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:11,633-Speed 9680.03 samples/sec Loss 8.2104 LearningRate 0.0714 Epoch: 3 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:12,693-Speed 9664.20 samples/sec Loss 8.0859 LearningRate 0.0714 Epoch: 3 Global Step: 51820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:13,763-Speed 9581.19 samples/sec Loss 8.0686 LearningRate 0.0714 Epoch: 3 Global Step: 51830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:14,821-Speed 9681.47 samples/sec Loss 8.1982 LearningRate 0.0714 Epoch: 3 Global Step: 51840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:15,904-Speed 9462.26 samples/sec Loss 8.0741 LearningRate 0.0713 
Epoch: 3 Global Step: 51850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:16,960-Speed 9704.19 samples/sec Loss 8.1649 LearningRate 0.0713 Epoch: 3 Global Step: 51860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:18,025-Speed 9622.99 samples/sec Loss 8.0866 LearningRate 0.0713 Epoch: 3 Global Step: 51870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:19,116-Speed 9384.64 samples/sec Loss 8.1190 LearningRate 0.0713 Epoch: 3 Global Step: 51880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:20,207-Speed 9393.90 samples/sec Loss 8.2586 LearningRate 0.0713 Epoch: 3 Global Step: 51890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:21,300-Speed 9371.78 samples/sec Loss 8.1380 LearningRate 0.0713 Epoch: 3 Global Step: 51900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:22,435-Speed 9029.15 samples/sec Loss 8.1153 LearningRate 0.0713 Epoch: 3 Global Step: 51910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:23,518-Speed 9461.92 samples/sec Loss 8.0372 LearningRate 0.0713 Epoch: 3 Global Step: 51920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:24,573-Speed 9718.68 samples/sec Loss 8.1527 LearningRate 0.0713 Epoch: 3 Global Step: 51930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:25,689-Speed 9180.33 samples/sec Loss 8.1530 LearningRate 0.0713 Epoch: 3 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:26,789-Speed 9309.54 samples/sec Loss 8.1775 LearningRate 0.0713 Epoch: 3 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:27,938-Speed 8922.15 samples/sec Loss 8.0289 LearningRate 0.0713 Epoch: 3 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:29,002-Speed 9622.69 samples/sec Loss 8.0509 LearningRate 0.0713 Epoch: 3 Global Step: 51970 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:30,066-Speed 9629.56 samples/sec Loss 8.1327 LearningRate 0.0713 Epoch: 3 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:37:31,179-Speed 9214.44 samples/sec Loss 8.0502 LearningRate 0.0713 Epoch: 3 Global Step: 51990 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:37:32,258-Speed 9493.36 samples/sec Loss 8.1574 LearningRate 0.0713 Epoch: 3 Global Step: 52000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:37:54,475-[lfw][52000]XNorm: 12.606832 Training: 2022-04-11 13:37:54,476-[lfw][52000]Accuracy-Flip: 0.99483+-0.00293 Training: 2022-04-11 13:37:54,476-[lfw][52000]Accuracy-Highest: 0.99583 Training: 2022-04-11 13:38:20,180-[cfp_fp][52000]XNorm: 10.611780 Training: 2022-04-11 13:38:20,180-[cfp_fp][52000]Accuracy-Flip: 0.94686+-0.01178 Training: 2022-04-11 13:38:20,181-[cfp_fp][52000]Accuracy-Highest: 0.94700 Training: 2022-04-11 13:38:42,644-[agedb_30][52000]XNorm: 12.101429 Training: 2022-04-11 13:38:42,645-[agedb_30][52000]Accuracy-Flip: 0.95383+-0.01049 Training: 2022-04-11 13:38:42,646-[agedb_30][52000]Accuracy-Highest: 0.95483 Training: 2022-04-11 13:38:43,696-Speed 143.34 samples/sec Loss 8.2263 LearningRate 0.0713 Epoch: 3 Global Step: 52010 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:38:44,769-Speed 9549.66 samples/sec Loss 8.1483 LearningRate 0.0713 Epoch: 3 Global Step: 52020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:38:45,821-Speed 9751.13 samples/sec Loss 8.0570 LearningRate 0.0713 Epoch: 3 Global Step: 52030 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:38:46,894-Speed 9552.37 samples/sec Loss 8.1761 LearningRate 0.0713 Epoch: 3 Global Step: 52040 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:38:47,975-Speed 9473.40 samples/sec Loss 8.1266 LearningRate 0.0712 Epoch: 3 Global Step: 52050 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 13:38:49,082-Speed 9257.56 samples/sec Loss 8.1055 LearningRate 0.0712 Epoch: 3 Global Step: 52060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:50,304-Speed 8387.34 samples/sec Loss 8.1777 LearningRate 0.0712 Epoch: 3 Global Step: 52070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:51,344-Speed 9854.17 samples/sec Loss 8.1970 LearningRate 0.0712 Epoch: 3 Global Step: 52080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:52,408-Speed 9625.66 samples/sec Loss 8.0860 LearningRate 0.0712 Epoch: 3 Global Step: 52090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:53,465-Speed 9698.92 samples/sec Loss 8.1673 LearningRate 0.0712 Epoch: 3 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:54,694-Speed 8336.63 samples/sec Loss 8.0418 LearningRate 0.0712 Epoch: 3 Global Step: 52110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:55,752-Speed 9687.39 samples/sec Loss 8.0985 LearningRate 0.0712 Epoch: 3 Global Step: 52120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:56,783-Speed 9935.88 samples/sec Loss 8.1774 LearningRate 0.0712 Epoch: 3 Global Step: 52130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:57,831-Speed 9775.61 samples/sec Loss 8.1756 LearningRate 0.0712 Epoch: 3 Global Step: 52140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:38:58,921-Speed 9402.02 samples/sec Loss 8.0671 LearningRate 0.0712 Epoch: 3 Global Step: 52150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:38:59,980-Speed 9675.35 samples/sec Loss 8.0896 LearningRate 0.0712 Epoch: 3 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:01,057-Speed 9522.78 samples/sec Loss 8.2286 LearningRate 0.0712 Epoch: 3 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:02,152-Speed 
9352.89 samples/sec Loss 8.0966 LearningRate 0.0712 Epoch: 3 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:03,231-Speed 9491.51 samples/sec Loss 8.1187 LearningRate 0.0712 Epoch: 3 Global Step: 52190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:04,294-Speed 9637.61 samples/sec Loss 8.1910 LearningRate 0.0712 Epoch: 3 Global Step: 52200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:05,344-Speed 9760.55 samples/sec Loss 8.1010 LearningRate 0.0712 Epoch: 3 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:06,498-Speed 8880.61 samples/sec Loss 8.1558 LearningRate 0.0712 Epoch: 3 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:07,570-Speed 9556.83 samples/sec Loss 8.2178 LearningRate 0.0712 Epoch: 3 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:08,640-Speed 9578.69 samples/sec Loss 8.0594 LearningRate 0.0712 Epoch: 3 Global Step: 52240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:09,688-Speed 9772.14 samples/sec Loss 8.0657 LearningRate 0.0711 Epoch: 3 Global Step: 52250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:10,768-Speed 9493.29 samples/sec Loss 8.0974 LearningRate 0.0711 Epoch: 3 Global Step: 52260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:11,834-Speed 9605.63 samples/sec Loss 8.0948 LearningRate 0.0711 Epoch: 3 Global Step: 52270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:12,925-Speed 9397.61 samples/sec Loss 8.1620 LearningRate 0.0711 Epoch: 3 Global Step: 52280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:14,043-Speed 9167.19 samples/sec Loss 8.0501 LearningRate 0.0711 Epoch: 3 Global Step: 52290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:15,113-Speed 9574.13 samples/sec Loss 8.1912 LearningRate 0.0711 
Epoch: 3 Global Step: 52300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:16,206-Speed 9376.21 samples/sec Loss 8.0495 LearningRate 0.0711 Epoch: 3 Global Step: 52310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:17,275-Speed 9578.39 samples/sec Loss 8.0841 LearningRate 0.0711 Epoch: 3 Global Step: 52320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:18,331-Speed 9707.68 samples/sec Loss 8.1680 LearningRate 0.0711 Epoch: 3 Global Step: 52330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 13:39:19,409-Speed 9503.11 samples/sec Loss 8.1114 LearningRate 0.0711 Epoch: 3 Global Step: 52340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:20,449-Speed 9852.64 samples/sec Loss 8.2681 LearningRate 0.0711 Epoch: 3 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:21,571-Speed 9132.47 samples/sec Loss 8.0953 LearningRate 0.0711 Epoch: 3 Global Step: 52360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:22,646-Speed 9527.17 samples/sec Loss 8.1058 LearningRate 0.0711 Epoch: 3 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:23,709-Speed 9642.85 samples/sec Loss 8.1472 LearningRate 0.0711 Epoch: 3 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:24,743-Speed 9911.34 samples/sec Loss 8.1894 LearningRate 0.0711 Epoch: 3 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:25,819-Speed 9521.49 samples/sec Loss 8.2171 LearningRate 0.0711 Epoch: 3 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:26,870-Speed 9748.08 samples/sec Loss 8.0587 LearningRate 0.0711 Epoch: 3 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:27,941-Speed 9568.95 samples/sec Loss 8.1388 LearningRate 0.0711 Epoch: 3 Global Step: 52420 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-11 13:39:29,038-Speed 9342.07 samples/sec Loss 8.0219 LearningRate 0.0711 Epoch: 3 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:39:30,149-Speed 9221.57 samples/sec Loss 8.1597 LearningRate 0.0710 Epoch: 3 Global Step: 52440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:31,229-Speed 9490.30 samples/sec Loss 8.1978 LearningRate 0.0710 Epoch: 3 Global Step: 52450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:32,298-Speed 9587.17 samples/sec Loss 8.0561 LearningRate 0.0710 Epoch: 3 Global Step: 52460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:33,363-Speed 9617.33 samples/sec Loss 8.1551 LearningRate 0.0710 Epoch: 3 Global Step: 52470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:34,438-Speed 9529.00 samples/sec Loss 8.1761 LearningRate 0.0710 Epoch: 3 Global Step: 52480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:35,506-Speed 9597.45 samples/sec Loss 8.2204 LearningRate 0.0710 Epoch: 3 Global Step: 52490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:36,588-Speed 9468.57 samples/sec Loss 8.1533 LearningRate 0.0710 Epoch: 3 Global Step: 52500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:37,647-Speed 9674.17 samples/sec Loss 8.3020 LearningRate 0.0710 Epoch: 3 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:38,736-Speed 9409.21 samples/sec Loss 8.1491 LearningRate 0.0710 Epoch: 3 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:39,835-Speed 9319.80 samples/sec Loss 8.0958 LearningRate 0.0710 Epoch: 3 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:40,886-Speed 9744.96 samples/sec Loss 8.0636 LearningRate 0.0710 Epoch: 3 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:39:41,959-Speed 9555.78 samples/sec Loss 8.1772 LearningRate 0.0710 Epoch: 3 Global Step: 52550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:43,005-Speed 9790.24 samples/sec Loss 8.1416 LearningRate 0.0710 Epoch: 3 Global Step: 52560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:44,064-Speed 9682.04 samples/sec Loss 8.1745 LearningRate 0.0710 Epoch: 3 Global Step: 52570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:45,122-Speed 9682.83 samples/sec Loss 8.1865 LearningRate 0.0710 Epoch: 3 Global Step: 52580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:46,195-Speed 9546.59 samples/sec Loss 8.1172 LearningRate 0.0710 Epoch: 3 Global Step: 52590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:47,285-Speed 9406.27 samples/sec Loss 8.1739 LearningRate 0.0710 Epoch: 3 Global Step: 52600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:48,351-Speed 9613.58 samples/sec Loss 8.2287 LearningRate 0.0710 Epoch: 3 Global Step: 52610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:49,409-Speed 9683.95 samples/sec Loss 8.1908 LearningRate 0.0710 Epoch: 3 Global Step: 52620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:50,451-Speed 9832.42 samples/sec Loss 8.1590 LearningRate 0.0710 Epoch: 3 Global Step: 52630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:39:51,512-Speed 9656.87 samples/sec Loss 8.1395 LearningRate 0.0709 Epoch: 3 Global Step: 52640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:39:52,541-Speed 9953.05 samples/sec Loss 8.2835 LearningRate 0.0709 Epoch: 3 Global Step: 52650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:39:53,644-Speed 9297.54 samples/sec Loss 8.2015 LearningRate 0.0709 Epoch: 3 Global Step: 52660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:39:54,775-Speed 9064.60 samples/sec Loss 
8.1719 LearningRate 0.0709 Epoch: 3 Global Step: 52670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:39:55,881-Speed 9258.28 samples/sec Loss 8.2039 LearningRate 0.0709 Epoch: 3 Global Step: 52680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:39:56,974-Speed 9377.23 samples/sec Loss 8.1566 LearningRate 0.0709 Epoch: 3 Global Step: 52690 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:39:58,071-Speed 9338.76 samples/sec Loss 8.2022 LearningRate 0.0709 Epoch: 3 Global Step: 52700 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:39:59,180-Speed 9240.25 samples/sec Loss 8.1622 LearningRate 0.0709 Epoch: 3 Global Step: 52710 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:40:00,239-Speed 9669.00 samples/sec Loss 8.2643 LearningRate 0.0709 Epoch: 3 Global Step: 52720 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:40:01,304-Speed 9628.37 samples/sec Loss 8.1011 LearningRate 0.0709 Epoch: 3 Global Step: 52730 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:40:02,365-Speed 9651.49 samples/sec Loss 8.2176 LearningRate 0.0709 Epoch: 3 Global Step: 52740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:03,452-Speed 9433.07 samples/sec Loss 8.1553 LearningRate 0.0709 Epoch: 3 Global Step: 52750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:04,550-Speed 9324.61 samples/sec Loss 8.1677 LearningRate 0.0709 Epoch: 3 Global Step: 52760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:05,628-Speed 9512.77 samples/sec Loss 8.1932 LearningRate 0.0709 Epoch: 3 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:06,718-Speed 9400.43 samples/sec Loss 8.1630 LearningRate 0.0709 Epoch: 3 Global Step: 52780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:07,839-Speed 9135.19 samples/sec Loss 8.1441 LearningRate 0.0709 Epoch: 3 Global 
Step: 52790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:08,908-Speed 9589.77 samples/sec Loss 8.1657 LearningRate 0.0709 Epoch: 3 Global Step: 52800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:09,959-Speed 9748.60 samples/sec Loss 8.2225 LearningRate 0.0709 Epoch: 3 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:11,023-Speed 9626.41 samples/sec Loss 8.1348 LearningRate 0.0709 Epoch: 3 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:12,104-Speed 9481.69 samples/sec Loss 8.2576 LearningRate 0.0709 Epoch: 3 Global Step: 52830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:13,240-Speed 9017.03 samples/sec Loss 8.3476 LearningRate 0.0708 Epoch: 3 Global Step: 52840 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:40:14,325-Speed 9448.69 samples/sec Loss 8.1222 LearningRate 0.0708 Epoch: 3 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:15,407-Speed 9470.85 samples/sec Loss 8.1431 LearningRate 0.0708 Epoch: 3 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:16,491-Speed 9449.38 samples/sec Loss 8.2106 LearningRate 0.0708 Epoch: 3 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:17,597-Speed 9260.57 samples/sec Loss 8.0731 LearningRate 0.0708 Epoch: 3 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:18,665-Speed 9592.23 samples/sec Loss 8.2211 LearningRate 0.0708 Epoch: 3 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:19,715-Speed 9757.87 samples/sec Loss 8.1522 LearningRate 0.0708 Epoch: 3 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:20,739-Speed 10006.59 samples/sec Loss 8.2654 LearningRate 0.0708 Epoch: 3 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-04-11 13:40:21,819-Speed 9491.88 samples/sec Loss 8.1650 LearningRate 0.0708 Epoch: 3 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:22,860-Speed 9838.29 samples/sec Loss 8.1981 LearningRate 0.0708 Epoch: 3 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:23,982-Speed 9137.50 samples/sec Loss 8.1910 LearningRate 0.0708 Epoch: 3 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:40:25,086-Speed 9280.86 samples/sec Loss 8.1791 LearningRate 0.0708 Epoch: 3 Global Step: 52950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:26,176-Speed 9398.99 samples/sec Loss 8.1356 LearningRate 0.0708 Epoch: 3 Global Step: 52960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:27,237-Speed 9659.26 samples/sec Loss 8.0749 LearningRate 0.0708 Epoch: 3 Global Step: 52970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:28,298-Speed 9655.09 samples/sec Loss 8.1959 LearningRate 0.0708 Epoch: 3 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:29,337-Speed 9865.70 samples/sec Loss 8.1255 LearningRate 0.0708 Epoch: 3 Global Step: 52990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:30,377-Speed 9846.22 samples/sec Loss 8.1814 LearningRate 0.0708 Epoch: 3 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:31,445-Speed 9603.05 samples/sec Loss 8.1961 LearningRate 0.0708 Epoch: 3 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:32,522-Speed 9506.42 samples/sec Loss 8.1726 LearningRate 0.0708 Epoch: 3 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:33,620-Speed 9335.75 samples/sec Loss 8.2505 LearningRate 0.0708 Epoch: 3 Global Step: 53030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:34,704-Speed 
9452.10 samples/sec Loss 8.2283 LearningRate 0.0707 Epoch: 3 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:35,769-Speed 9615.29 samples/sec Loss 8.0925 LearningRate 0.0707 Epoch: 3 Global Step: 53050 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:40:36,838-Speed 9583.87 samples/sec Loss 8.0925 LearningRate 0.0707 Epoch: 3 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:37,944-Speed 9261.88 samples/sec Loss 8.1183 LearningRate 0.0707 Epoch: 3 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:38,989-Speed 9810.72 samples/sec Loss 8.1994 LearningRate 0.0707 Epoch: 3 Global Step: 53080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:40,057-Speed 9588.53 samples/sec Loss 8.1935 LearningRate 0.0707 Epoch: 3 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:41,126-Speed 9591.08 samples/sec Loss 8.2505 LearningRate 0.0707 Epoch: 3 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:42,195-Speed 9582.20 samples/sec Loss 8.1925 LearningRate 0.0707 Epoch: 3 Global Step: 53110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:43,254-Speed 9678.25 samples/sec Loss 8.1964 LearningRate 0.0707 Epoch: 3 Global Step: 53120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:44,348-Speed 9364.38 samples/sec Loss 8.1828 LearningRate 0.0707 Epoch: 3 Global Step: 53130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:45,433-Speed 9444.61 samples/sec Loss 8.2002 LearningRate 0.0707 Epoch: 3 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:46,498-Speed 9623.20 samples/sec Loss 8.2899 LearningRate 0.0707 Epoch: 3 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:47,570-Speed 9556.63 samples/sec Loss 8.1920 
LearningRate 0.0707 Epoch: 3 Global Step: 53160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:40:48,638-Speed 9594.07 samples/sec Loss 8.2152 LearningRate 0.0707 Epoch: 3 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:49,717-Speed 9498.87 samples/sec Loss 8.1642 LearningRate 0.0707 Epoch: 3 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:50,811-Speed 9363.32 samples/sec Loss 8.2314 LearningRate 0.0707 Epoch: 3 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:51,856-Speed 9806.24 samples/sec Loss 8.0783 LearningRate 0.0707 Epoch: 3 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:52,924-Speed 9589.95 samples/sec Loss 8.3011 LearningRate 0.0707 Epoch: 3 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:54,059-Speed 9027.31 samples/sec Loss 8.1012 LearningRate 0.0707 Epoch: 3 Global Step: 53220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:55,102-Speed 9823.30 samples/sec Loss 8.1973 LearningRate 0.0707 Epoch: 3 Global Step: 53230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:56,174-Speed 9558.44 samples/sec Loss 8.2248 LearningRate 0.0706 Epoch: 3 Global Step: 53240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:57,249-Speed 9530.33 samples/sec Loss 8.1378 LearningRate 0.0706 Epoch: 3 Global Step: 53250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:58,340-Speed 9395.06 samples/sec Loss 8.2104 LearningRate 0.0706 Epoch: 3 Global Step: 53260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:40:59,421-Speed 9473.80 samples/sec Loss 8.1745 LearningRate 0.0706 Epoch: 3 Global Step: 53270 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:41:00,493-Speed 9559.18 samples/sec Loss 8.2149 LearningRate 0.0706 Epoch: 3 Global Step: 
53280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:01,575-Speed 9474.72 samples/sec Loss 8.3245 LearningRate 0.0706 Epoch: 3 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:02,675-Speed 9314.47 samples/sec Loss 8.2374 LearningRate 0.0706 Epoch: 3 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:03,767-Speed 9384.10 samples/sec Loss 8.3554 LearningRate 0.0706 Epoch: 3 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:04,858-Speed 9392.29 samples/sec Loss 8.2188 LearningRate 0.0706 Epoch: 3 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:05,928-Speed 9574.07 samples/sec Loss 8.2509 LearningRate 0.0706 Epoch: 3 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:06,964-Speed 9892.50 samples/sec Loss 8.2309 LearningRate 0.0706 Epoch: 3 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:08,048-Speed 9449.84 samples/sec Loss 8.1764 LearningRate 0.0706 Epoch: 3 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:09,170-Speed 9127.70 samples/sec Loss 8.3361 LearningRate 0.0706 Epoch: 3 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:10,244-Speed 9543.08 samples/sec Loss 8.1817 LearningRate 0.0706 Epoch: 3 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:11,312-Speed 9594.68 samples/sec Loss 8.3211 LearningRate 0.0706 Epoch: 3 Global Step: 53380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:12,381-Speed 9581.58 samples/sec Loss 8.2553 LearningRate 0.0706 Epoch: 3 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:13,475-Speed 9365.82 samples/sec Loss 8.1901 LearningRate 0.0706 Epoch: 3 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 13:41:14,569-Speed 9369.52 samples/sec Loss 8.3465 LearningRate 0.0706 Epoch: 3 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:15,657-Speed 9415.63 samples/sec Loss 8.1435 LearningRate 0.0706 Epoch: 3 Global Step: 53420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:16,728-Speed 9564.95 samples/sec Loss 8.3571 LearningRate 0.0706 Epoch: 3 Global Step: 53430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:17,832-Speed 9280.51 samples/sec Loss 8.2376 LearningRate 0.0705 Epoch: 3 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:18,900-Speed 9596.01 samples/sec Loss 8.1365 LearningRate 0.0705 Epoch: 3 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:20,012-Speed 9217.58 samples/sec Loss 8.2522 LearningRate 0.0705 Epoch: 3 Global Step: 53460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:21,124-Speed 9217.90 samples/sec Loss 8.2699 LearningRate 0.0705 Epoch: 3 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:22,160-Speed 9885.74 samples/sec Loss 8.1410 LearningRate 0.0705 Epoch: 3 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:23,206-Speed 9798.29 samples/sec Loss 8.2233 LearningRate 0.0705 Epoch: 3 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:24,298-Speed 9388.67 samples/sec Loss 8.2908 LearningRate 0.0705 Epoch: 3 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:25,398-Speed 9310.50 samples/sec Loss 8.1779 LearningRate 0.0705 Epoch: 3 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:26,482-Speed 9455.45 samples/sec Loss 8.2930 LearningRate 0.0705 Epoch: 3 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:27,543-Speed 
9660.05 samples/sec Loss 8.3384 LearningRate 0.0705 Epoch: 3 Global Step: 53530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:28,652-Speed 9238.57 samples/sec Loss 8.1971 LearningRate 0.0705 Epoch: 3 Global Step: 53540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:29,762-Speed 9228.46 samples/sec Loss 8.2311 LearningRate 0.0705 Epoch: 3 Global Step: 53550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:30,858-Speed 9345.54 samples/sec Loss 8.1460 LearningRate 0.0705 Epoch: 3 Global Step: 53560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:31,924-Speed 9616.26 samples/sec Loss 8.1873 LearningRate 0.0705 Epoch: 3 Global Step: 53570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:32,996-Speed 9552.13 samples/sec Loss 8.1961 LearningRate 0.0705 Epoch: 3 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:34,142-Speed 8943.68 samples/sec Loss 8.1966 LearningRate 0.0705 Epoch: 3 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:35,197-Speed 9706.64 samples/sec Loss 8.1617 LearningRate 0.0705 Epoch: 3 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:36,285-Speed 9417.87 samples/sec Loss 8.3019 LearningRate 0.0705 Epoch: 3 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:37,373-Speed 9418.59 samples/sec Loss 8.1877 LearningRate 0.0705 Epoch: 3 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:41:38,464-Speed 9398.14 samples/sec Loss 8.2598 LearningRate 0.0704 Epoch: 3 Global Step: 53630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:39,546-Speed 9470.50 samples/sec Loss 8.2452 LearningRate 0.0704 Epoch: 3 Global Step: 53640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:40,670-Speed 9117.05 samples/sec Loss 8.3087 LearningRate 0.0704 
Epoch: 3 Global Step: 53650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:41,726-Speed 9701.22 samples/sec Loss 8.1612 LearningRate 0.0704 Epoch: 3 Global Step: 53660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:42,818-Speed 9377.09 samples/sec Loss 8.1409 LearningRate 0.0704 Epoch: 3 Global Step: 53670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:43,914-Speed 9349.33 samples/sec Loss 8.3211 LearningRate 0.0704 Epoch: 3 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:45,006-Speed 9388.46 samples/sec Loss 8.1447 LearningRate 0.0704 Epoch: 3 Global Step: 53690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:46,088-Speed 9472.53 samples/sec Loss 8.2956 LearningRate 0.0704 Epoch: 3 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:47,163-Speed 9531.72 samples/sec Loss 8.3247 LearningRate 0.0704 Epoch: 3 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:48,234-Speed 9563.72 samples/sec Loss 8.2513 LearningRate 0.0704 Epoch: 3 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:49,349-Speed 9192.23 samples/sec Loss 8.1626 LearningRate 0.0704 Epoch: 3 Global Step: 53730 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:41:50,405-Speed 9702.20 samples/sec Loss 8.1666 LearningRate 0.0704 Epoch: 3 Global Step: 53740 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:41:51,484-Speed 9490.38 samples/sec Loss 8.1644 LearningRate 0.0704 Epoch: 3 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:52,556-Speed 9558.55 samples/sec Loss 8.2693 LearningRate 0.0704 Epoch: 3 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:53,659-Speed 9292.23 samples/sec Loss 8.1331 LearningRate 0.0704 Epoch: 3 Global Step: 53770 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:54,730-Speed 9574.45 samples/sec Loss 8.1036 LearningRate 0.0704 Epoch: 3 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:55,815-Speed 9442.36 samples/sec Loss 8.1174 LearningRate 0.0704 Epoch: 3 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:56,888-Speed 9544.57 samples/sec Loss 8.2055 LearningRate 0.0704 Epoch: 3 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:57,980-Speed 9384.14 samples/sec Loss 8.2306 LearningRate 0.0704 Epoch: 3 Global Step: 53810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:41:59,097-Speed 9175.95 samples/sec Loss 8.2002 LearningRate 0.0704 Epoch: 3 Global Step: 53820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:00,162-Speed 9613.84 samples/sec Loss 8.2248 LearningRate 0.0703 Epoch: 3 Global Step: 53830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:01,222-Speed 9668.85 samples/sec Loss 8.1618 LearningRate 0.0703 Epoch: 3 Global Step: 53840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:02,304-Speed 9477.19 samples/sec Loss 8.2953 LearningRate 0.0703 Epoch: 3 Global Step: 53850 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:42:03,361-Speed 9693.14 samples/sec Loss 8.1653 LearningRate 0.0703 Epoch: 3 Global Step: 53860 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:42:04,389-Speed 9964.99 samples/sec Loss 8.1264 LearningRate 0.0703 Epoch: 3 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:05,478-Speed 9406.90 samples/sec Loss 8.1782 LearningRate 0.0703 Epoch: 3 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:06,518-Speed 9849.80 samples/sec Loss 8.1850 LearningRate 0.0703 Epoch: 3 Global Step: 53890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-04-11 13:42:07,593-Speed 9528.16 samples/sec Loss 8.1535 LearningRate 0.0703 Epoch: 3 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:08,657-Speed 9632.29 samples/sec Loss 8.2027 LearningRate 0.0703 Epoch: 3 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:09,742-Speed 9444.21 samples/sec Loss 8.2432 LearningRate 0.0703 Epoch: 3 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:10,810-Speed 9596.40 samples/sec Loss 8.2155 LearningRate 0.0703 Epoch: 3 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:11,892-Speed 9465.62 samples/sec Loss 8.1439 LearningRate 0.0703 Epoch: 3 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:12,952-Speed 9670.93 samples/sec Loss 8.1168 LearningRate 0.0703 Epoch: 3 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:14,046-Speed 9371.62 samples/sec Loss 8.1308 LearningRate 0.0703 Epoch: 3 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:15,113-Speed 9599.31 samples/sec Loss 8.2446 LearningRate 0.0703 Epoch: 3 Global Step: 53970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:42:16,154-Speed 9839.25 samples/sec Loss 8.2080 LearningRate 0.0703 Epoch: 3 Global Step: 53980 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:42:17,217-Speed 9639.70 samples/sec Loss 8.2194 LearningRate 0.0703 Epoch: 3 Global Step: 53990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:18,300-Speed 9465.92 samples/sec Loss 8.1913 LearningRate 0.0703 Epoch: 3 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:42:40,487-[lfw][54000]XNorm: 12.649266 Training: 2022-04-11 13:42:40,488-[lfw][54000]Accuracy-Flip: 0.99467+-0.00256 Training: 2022-04-11 13:42:40,488-[lfw][54000]Accuracy-Highest: 0.99583 
Training: 2022-04-11 13:43:06,183-[cfp_fp][54000]XNorm: 10.701333 Training: 2022-04-11 13:43:06,184-[cfp_fp][54000]Accuracy-Flip: 0.94943+-0.01430 Training: 2022-04-11 13:43:06,184-[cfp_fp][54000]Accuracy-Highest: 0.94943 Training: 2022-04-11 13:43:28,587-[agedb_30][54000]XNorm: 12.216305 Training: 2022-04-11 13:43:28,588-[agedb_30][54000]Accuracy-Flip: 0.95400+-0.01188 Training: 2022-04-11 13:43:28,588-[agedb_30][54000]Accuracy-Highest: 0.95483 Training: 2022-04-11 13:43:29,703-Speed 143.41 samples/sec Loss 8.1593 LearningRate 0.0703 Epoch: 3 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:30,794-Speed 9387.95 samples/sec Loss 8.2810 LearningRate 0.0703 Epoch: 3 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:31,857-Speed 9637.82 samples/sec Loss 8.2453 LearningRate 0.0702 Epoch: 3 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:32,978-Speed 9142.86 samples/sec Loss 8.2523 LearningRate 0.0702 Epoch: 3 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:34,053-Speed 9531.68 samples/sec Loss 8.2273 LearningRate 0.0702 Epoch: 3 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:35,150-Speed 9335.04 samples/sec Loss 8.2847 LearningRate 0.0702 Epoch: 3 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:36,195-Speed 9812.66 samples/sec Loss 8.0513 LearningRate 0.0702 Epoch: 3 Global Step: 54070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:37,242-Speed 9780.69 samples/sec Loss 8.2570 LearningRate 0.0702 Epoch: 3 Global Step: 54080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:38,297-Speed 9711.16 samples/sec Loss 8.2688 LearningRate 0.0702 Epoch: 3 Global Step: 54090 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:43:39,346-Speed 9771.54 samples/sec Loss 8.2621 
LearningRate 0.0702 Epoch: 3 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:40,482-Speed 9017.46 samples/sec Loss 8.2609 LearningRate 0.0702 Epoch: 3 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:41,534-Speed 9738.28 samples/sec Loss 8.1630 LearningRate 0.0702 Epoch: 3 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:42,628-Speed 9372.10 samples/sec Loss 8.3370 LearningRate 0.0702 Epoch: 3 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:43,712-Speed 9447.98 samples/sec Loss 8.3074 LearningRate 0.0702 Epoch: 3 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:44,792-Speed 9492.72 samples/sec Loss 8.2015 LearningRate 0.0702 Epoch: 3 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:45,861-Speed 9580.03 samples/sec Loss 8.2472 LearningRate 0.0702 Epoch: 3 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:46,945-Speed 9458.34 samples/sec Loss 8.2168 LearningRate 0.0702 Epoch: 3 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:48,033-Speed 9410.29 samples/sec Loss 8.2702 LearningRate 0.0702 Epoch: 3 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:49,174-Speed 8985.49 samples/sec Loss 8.2058 LearningRate 0.0702 Epoch: 3 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:50,235-Speed 9652.16 samples/sec Loss 8.1362 LearningRate 0.0702 Epoch: 3 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:51,302-Speed 9608.50 samples/sec Loss 8.2032 LearningRate 0.0702 Epoch: 3 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:43:52,370-Speed 9591.90 samples/sec Loss 8.2934 LearningRate 0.0702 Epoch: 3 Global Step: 54220 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:53,415-Speed 9806.13 samples/sec Loss 8.3101 LearningRate 0.0701 Epoch: 3 Global Step: 54230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:54,514-Speed 9324.95 samples/sec Loss 8.1376 LearningRate 0.0701 Epoch: 3 Global Step: 54240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:55,606-Speed 9379.42 samples/sec Loss 8.1973 LearningRate 0.0701 Epoch: 3 Global Step: 54250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:56,657-Speed 9744.76 samples/sec Loss 8.2702 LearningRate 0.0701 Epoch: 3 Global Step: 54260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:57,756-Speed 9326.19 samples/sec Loss 8.2137 LearningRate 0.0701 Epoch: 3 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:58,828-Speed 9558.54 samples/sec Loss 8.2548 LearningRate 0.0701 Epoch: 3 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:43:59,898-Speed 9575.38 samples/sec Loss 8.1448 LearningRate 0.0701 Epoch: 3 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:00,974-Speed 9515.74 samples/sec Loss 8.2836 LearningRate 0.0701 Epoch: 3 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:02,033-Speed 9682.10 samples/sec Loss 8.1851 LearningRate 0.0701 Epoch: 3 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:03,087-Speed 9718.35 samples/sec Loss 8.2715 LearningRate 0.0701 Epoch: 3 Global Step: 54320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:04,122-Speed 9906.45 samples/sec Loss 8.2447 LearningRate 0.0701 Epoch: 3 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:05,192-Speed 9568.09 samples/sec Loss 8.2169 LearningRate 0.0701 Epoch: 3 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 13:44:06,331-Speed 9002.42 samples/sec Loss 8.2179 LearningRate 0.0701 Epoch: 3 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:07,428-Speed 9341.84 samples/sec Loss 8.1644 LearningRate 0.0701 Epoch: 3 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:08,530-Speed 9294.81 samples/sec Loss 8.2863 LearningRate 0.0701 Epoch: 3 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:09,601-Speed 9571.20 samples/sec Loss 8.2568 LearningRate 0.0701 Epoch: 3 Global Step: 54380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:10,643-Speed 9825.84 samples/sec Loss 8.2781 LearningRate 0.0701 Epoch: 3 Global Step: 54390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:11,714-Speed 9568.40 samples/sec Loss 8.2323 LearningRate 0.0701 Epoch: 3 Global Step: 54400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:12,748-Speed 9916.89 samples/sec Loss 8.2184 LearningRate 0.0701 Epoch: 3 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:13,809-Speed 9649.75 samples/sec Loss 8.3204 LearningRate 0.0701 Epoch: 3 Global Step: 54420 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:14,897-Speed 9422.35 samples/sec Loss 8.1797 LearningRate 0.0700 Epoch: 3 Global Step: 54430 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:16,000-Speed 9289.05 samples/sec Loss 8.1874 LearningRate 0.0700 Epoch: 3 Global Step: 54440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:17,109-Speed 9236.13 samples/sec Loss 8.2270 LearningRate 0.0700 Epoch: 3 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:18,207-Speed 9335.52 samples/sec Loss 8.1297 LearningRate 0.0700 Epoch: 3 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:19,283-Speed 
9524.98 samples/sec Loss 8.2907 LearningRate 0.0700 Epoch: 3 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:20,360-Speed 9507.89 samples/sec Loss 8.3008 LearningRate 0.0700 Epoch: 3 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:21,428-Speed 9600.47 samples/sec Loss 8.2268 LearningRate 0.0700 Epoch: 3 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:22,518-Speed 9399.99 samples/sec Loss 8.1561 LearningRate 0.0700 Epoch: 3 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:23,590-Speed 9551.98 samples/sec Loss 8.2581 LearningRate 0.0700 Epoch: 3 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:24,657-Speed 9604.95 samples/sec Loss 8.1601 LearningRate 0.0700 Epoch: 3 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:25,733-Speed 9524.40 samples/sec Loss 8.2678 LearningRate 0.0700 Epoch: 3 Global Step: 54530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:26,826-Speed 9378.48 samples/sec Loss 8.1538 LearningRate 0.0700 Epoch: 3 Global Step: 54540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:27,916-Speed 9398.49 samples/sec Loss 8.1910 LearningRate 0.0700 Epoch: 3 Global Step: 54550 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:28,951-Speed 9897.45 samples/sec Loss 8.3511 LearningRate 0.0700 Epoch: 3 Global Step: 54560 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:29,992-Speed 9847.32 samples/sec Loss 8.2830 LearningRate 0.0700 Epoch: 3 Global Step: 54570 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:31,103-Speed 9217.18 samples/sec Loss 8.2647 LearningRate 0.0700 Epoch: 3 Global Step: 54580 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:32,192-Speed 9412.53 samples/sec Loss 8.1839 
LearningRate 0.0700 Epoch: 3 Global Step: 54590 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:33,295-Speed 9280.68 samples/sec Loss 8.3049 LearningRate 0.0700 Epoch: 3 Global Step: 54600 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:34,364-Speed 9595.15 samples/sec Loss 8.3536 LearningRate 0.0700 Epoch: 3 Global Step: 54610 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:35,441-Speed 9512.77 samples/sec Loss 8.2795 LearningRate 0.0700 Epoch: 3 Global Step: 54620 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:36,475-Speed 9904.55 samples/sec Loss 8.2459 LearningRate 0.0699 Epoch: 3 Global Step: 54630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:37,544-Speed 9584.20 samples/sec Loss 8.2192 LearningRate 0.0699 Epoch: 3 Global Step: 54640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:38,593-Speed 9766.27 samples/sec Loss 8.1944 LearningRate 0.0699 Epoch: 3 Global Step: 54650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:39,666-Speed 9552.83 samples/sec Loss 8.1121 LearningRate 0.0699 Epoch: 3 Global Step: 54660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:44:40,727-Speed 9658.23 samples/sec Loss 8.2042 LearningRate 0.0699 Epoch: 3 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:41,820-Speed 9367.40 samples/sec Loss 8.2266 LearningRate 0.0699 Epoch: 3 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:42,884-Speed 9638.43 samples/sec Loss 8.2756 LearningRate 0.0699 Epoch: 3 Global Step: 54690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:43,942-Speed 9683.33 samples/sec Loss 8.2216 LearningRate 0.0699 Epoch: 3 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:45,014-Speed 9626.25 samples/sec Loss 8.1866 LearningRate 0.0699 Epoch: 3 Global Step: 
54710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:46,060-Speed 9798.46 samples/sec Loss 8.1668 LearningRate 0.0699 Epoch: 3 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:47,137-Speed 9505.76 samples/sec Loss 8.2870 LearningRate 0.0699 Epoch: 3 Global Step: 54730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:48,185-Speed 9778.73 samples/sec Loss 8.3115 LearningRate 0.0699 Epoch: 3 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:49,271-Speed 9434.10 samples/sec Loss 8.2150 LearningRate 0.0699 Epoch: 3 Global Step: 54750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:50,316-Speed 9800.66 samples/sec Loss 8.2640 LearningRate 0.0699 Epoch: 3 Global Step: 54760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:51,359-Speed 9833.75 samples/sec Loss 8.2032 LearningRate 0.0699 Epoch: 3 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:52,436-Speed 9513.94 samples/sec Loss 8.2904 LearningRate 0.0699 Epoch: 3 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:53,516-Speed 9485.65 samples/sec Loss 8.1529 LearningRate 0.0699 Epoch: 3 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:54,578-Speed 9645.21 samples/sec Loss 8.3394 LearningRate 0.0699 Epoch: 3 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:44:55,617-Speed 9865.49 samples/sec Loss 8.1529 LearningRate 0.0699 Epoch: 3 Global Step: 54810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:56,694-Speed 9513.52 samples/sec Loss 8.2656 LearningRate 0.0699 Epoch: 3 Global Step: 54820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:57,821-Speed 9088.74 samples/sec Loss 8.1323 LearningRate 0.0698 Epoch: 3 Global Step: 54830 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 13:44:58,889-Speed 9590.68 samples/sec Loss 8.1642 LearningRate 0.0698 Epoch: 3 Global Step: 54840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:44:59,964-Speed 9533.51 samples/sec Loss 8.2490 LearningRate 0.0698 Epoch: 3 Global Step: 54850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:01,048-Speed 9456.11 samples/sec Loss 8.1615 LearningRate 0.0698 Epoch: 3 Global Step: 54860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:02,121-Speed 9544.98 samples/sec Loss 8.2762 LearningRate 0.0698 Epoch: 3 Global Step: 54870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:03,175-Speed 9720.93 samples/sec Loss 8.2611 LearningRate 0.0698 Epoch: 3 Global Step: 54880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:04,221-Speed 9807.28 samples/sec Loss 8.1443 LearningRate 0.0698 Epoch: 3 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:05,285-Speed 9632.64 samples/sec Loss 8.1568 LearningRate 0.0698 Epoch: 3 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:06,366-Speed 9484.70 samples/sec Loss 8.1623 LearningRate 0.0698 Epoch: 3 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:07,450-Speed 9454.70 samples/sec Loss 8.1600 LearningRate 0.0698 Epoch: 3 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:08,511-Speed 9654.22 samples/sec Loss 8.2425 LearningRate 0.0698 Epoch: 3 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:09,607-Speed 9341.63 samples/sec Loss 8.1786 LearningRate 0.0698 Epoch: 3 Global Step: 54940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:10,701-Speed 9371.15 samples/sec Loss 8.2613 LearningRate 0.0698 Epoch: 3 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:11,762-Speed 
9650.13 samples/sec Loss 8.2149 LearningRate 0.0698 Epoch: 3 Global Step: 54960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:12,829-Speed 9607.14 samples/sec Loss 8.2566 LearningRate 0.0698 Epoch: 3 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:13,905-Speed 9518.30 samples/sec Loss 8.3345 LearningRate 0.0698 Epoch: 3 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:14,979-Speed 9550.27 samples/sec Loss 8.3482 LearningRate 0.0698 Epoch: 3 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:16,091-Speed 9213.84 samples/sec Loss 8.1742 LearningRate 0.0698 Epoch: 3 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:17,202-Speed 9217.58 samples/sec Loss 8.2265 LearningRate 0.0698 Epoch: 3 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:18,281-Speed 9493.01 samples/sec Loss 8.1072 LearningRate 0.0698 Epoch: 3 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:19,378-Speed 9344.03 samples/sec Loss 8.2418 LearningRate 0.0697 Epoch: 3 Global Step: 55030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:45:20,465-Speed 9429.81 samples/sec Loss 8.1556 LearningRate 0.0697 Epoch: 3 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:21,630-Speed 8794.10 samples/sec Loss 8.1550 LearningRate 0.0697 Epoch: 3 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:22,745-Speed 9192.64 samples/sec Loss 8.2624 LearningRate 0.0697 Epoch: 3 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:23,836-Speed 9395.10 samples/sec Loss 8.1905 LearningRate 0.0697 Epoch: 3 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:24,919-Speed 9462.98 samples/sec Loss 8.2953 LearningRate 0.0697 
Epoch: 3 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:25,943-Speed 9999.66 samples/sec Loss 8.2893 LearningRate 0.0697 Epoch: 3 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:26,996-Speed 9726.45 samples/sec Loss 8.4008 LearningRate 0.0697 Epoch: 3 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:28,057-Speed 9662.30 samples/sec Loss 8.3523 LearningRate 0.0697 Epoch: 3 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:29,148-Speed 9391.80 samples/sec Loss 8.2298 LearningRate 0.0697 Epoch: 3 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:30,203-Speed 9705.30 samples/sec Loss 8.3522 LearningRate 0.0697 Epoch: 3 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:31,312-Speed 9240.98 samples/sec Loss 8.2204 LearningRate 0.0697 Epoch: 3 Global Step: 55140 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:45:32,413-Speed 9308.12 samples/sec Loss 8.1523 LearningRate 0.0697 Epoch: 3 Global Step: 55150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:45:33,507-Speed 9364.87 samples/sec Loss 8.2908 LearningRate 0.0697 Epoch: 3 Global Step: 55160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:34,575-Speed 9597.42 samples/sec Loss 8.2348 LearningRate 0.0697 Epoch: 3 Global Step: 55170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:35,663-Speed 9417.90 samples/sec Loss 8.1691 LearningRate 0.0697 Epoch: 3 Global Step: 55180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:36,737-Speed 9537.69 samples/sec Loss 8.2878 LearningRate 0.0697 Epoch: 3 Global Step: 55190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:37,787-Speed 9762.01 samples/sec Loss 8.2933 LearningRate 0.0697 Epoch: 3 Global Step: 55200 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:38,835-Speed 9773.06 samples/sec Loss 8.3158 LearningRate 0.0697 Epoch: 3 Global Step: 55210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:39,905-Speed 9569.92 samples/sec Loss 8.2417 LearningRate 0.0697 Epoch: 3 Global Step: 55220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:41,000-Speed 9360.41 samples/sec Loss 8.2263 LearningRate 0.0696 Epoch: 3 Global Step: 55230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:42,059-Speed 9680.92 samples/sec Loss 8.3120 LearningRate 0.0696 Epoch: 3 Global Step: 55240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:43,142-Speed 9461.22 samples/sec Loss 8.2692 LearningRate 0.0696 Epoch: 3 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:44,201-Speed 9674.72 samples/sec Loss 8.2786 LearningRate 0.0696 Epoch: 3 Global Step: 55260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:45,240-Speed 9860.05 samples/sec Loss 8.2122 LearningRate 0.0696 Epoch: 3 Global Step: 55270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:46,323-Speed 9466.11 samples/sec Loss 8.1666 LearningRate 0.0696 Epoch: 3 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:47,448-Speed 9106.36 samples/sec Loss 8.2822 LearningRate 0.0696 Epoch: 3 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:48,521-Speed 9544.17 samples/sec Loss 8.3066 LearningRate 0.0696 Epoch: 3 Global Step: 55300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:49,631-Speed 9233.21 samples/sec Loss 8.3007 LearningRate 0.0696 Epoch: 3 Global Step: 55310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:50,731-Speed 9314.14 samples/sec Loss 8.3097 LearningRate 0.0696 Epoch: 3 Global Step: 55320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-04-11 13:45:51,805-Speed 9547.61 samples/sec Loss 8.3224 LearningRate 0.0696 Epoch: 3 Global Step: 55330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:52,872-Speed 9599.57 samples/sec Loss 8.3417 LearningRate 0.0696 Epoch: 3 Global Step: 55340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:53,925-Speed 9727.39 samples/sec Loss 8.3079 LearningRate 0.0696 Epoch: 3 Global Step: 55350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:55,010-Speed 9444.78 samples/sec Loss 8.2579 LearningRate 0.0696 Epoch: 3 Global Step: 55360 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:45:56,053-Speed 9821.43 samples/sec Loss 8.2357 LearningRate 0.0696 Epoch: 3 Global Step: 55370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:57,118-Speed 9627.12 samples/sec Loss 8.2656 LearningRate 0.0696 Epoch: 3 Global Step: 55380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:58,182-Speed 9629.05 samples/sec Loss 8.2648 LearningRate 0.0696 Epoch: 3 Global Step: 55390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:45:59,257-Speed 9526.78 samples/sec Loss 8.1061 LearningRate 0.0696 Epoch: 3 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:00,342-Speed 9442.13 samples/sec Loss 8.2697 LearningRate 0.0696 Epoch: 3 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:01,474-Speed 9056.53 samples/sec Loss 8.1595 LearningRate 0.0696 Epoch: 3 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:02,518-Speed 9817.09 samples/sec Loss 8.2704 LearningRate 0.0695 Epoch: 3 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:03,597-Speed 9492.83 samples/sec Loss 8.3163 LearningRate 0.0695 Epoch: 3 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:04,715-Speed 9170.11 
samples/sec Loss 8.2302 LearningRate 0.0695 Epoch: 3 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:05,819-Speed 9277.13 samples/sec Loss 8.3101 LearningRate 0.0695 Epoch: 3 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:06,890-Speed 9562.97 samples/sec Loss 8.2888 LearningRate 0.0695 Epoch: 3 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:07,982-Speed 9380.42 samples/sec Loss 8.1523 LearningRate 0.0695 Epoch: 3 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:09,056-Speed 9543.20 samples/sec Loss 8.2959 LearningRate 0.0695 Epoch: 3 Global Step: 55490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:10,144-Speed 9418.11 samples/sec Loss 8.2013 LearningRate 0.0695 Epoch: 3 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:11,208-Speed 9629.81 samples/sec Loss 8.2706 LearningRate 0.0695 Epoch: 3 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:12,263-Speed 9707.35 samples/sec Loss 8.1558 LearningRate 0.0695 Epoch: 3 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:13,338-Speed 9538.48 samples/sec Loss 8.1635 LearningRate 0.0695 Epoch: 3 Global Step: 55530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:14,414-Speed 9513.59 samples/sec Loss 8.2313 LearningRate 0.0695 Epoch: 3 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:15,473-Speed 9679.24 samples/sec Loss 8.3965 LearningRate 0.0695 Epoch: 3 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:16,562-Speed 9415.33 samples/sec Loss 8.2489 LearningRate 0.0695 Epoch: 3 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:17,669-Speed 9254.24 samples/sec Loss 8.3517 LearningRate 0.0695 Epoch: 
3 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:18,724-Speed 9712.19 samples/sec Loss 8.1804 LearningRate 0.0695 Epoch: 3 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:19,765-Speed 9845.46 samples/sec Loss 8.3627 LearningRate 0.0695 Epoch: 3 Global Step: 55590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:20,877-Speed 9216.77 samples/sec Loss 8.3075 LearningRate 0.0695 Epoch: 3 Global Step: 55600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:21,986-Speed 9233.95 samples/sec Loss 8.2339 LearningRate 0.0695 Epoch: 3 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:23,108-Speed 9132.91 samples/sec Loss 8.1566 LearningRate 0.0695 Epoch: 3 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:24,199-Speed 9393.47 samples/sec Loss 8.2822 LearningRate 0.0694 Epoch: 3 Global Step: 55630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:25,289-Speed 9396.66 samples/sec Loss 8.3542 LearningRate 0.0694 Epoch: 3 Global Step: 55640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:26,429-Speed 8987.23 samples/sec Loss 8.2432 LearningRate 0.0694 Epoch: 3 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:27,508-Speed 9497.51 samples/sec Loss 8.1570 LearningRate 0.0694 Epoch: 3 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:28,551-Speed 9821.38 samples/sec Loss 8.2208 LearningRate 0.0694 Epoch: 3 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:29,590-Speed 9861.09 samples/sec Loss 8.1997 LearningRate 0.0694 Epoch: 3 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:30,655-Speed 9622.32 samples/sec Loss 8.2155 LearningRate 0.0694 Epoch: 3 Global Step: 55690 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-11 13:46:31,726-Speed 9562.43 samples/sec Loss 8.1751 LearningRate 0.0694 Epoch: 3 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:32,772-Speed 9802.21 samples/sec Loss 8.2925 LearningRate 0.0694 Epoch: 3 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:33,853-Speed 9473.37 samples/sec Loss 8.2803 LearningRate 0.0694 Epoch: 3 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:34,942-Speed 9416.25 samples/sec Loss 8.2522 LearningRate 0.0694 Epoch: 3 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:46:36,014-Speed 9553.32 samples/sec Loss 8.2135 LearningRate 0.0694 Epoch: 3 Global Step: 55740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:37,106-Speed 9385.66 samples/sec Loss 8.2079 LearningRate 0.0694 Epoch: 3 Global Step: 55750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:38,187-Speed 9476.36 samples/sec Loss 8.2347 LearningRate 0.0694 Epoch: 3 Global Step: 55760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:39,284-Speed 9337.84 samples/sec Loss 8.3358 LearningRate 0.0694 Epoch: 3 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:40,367-Speed 9464.17 samples/sec Loss 8.2169 LearningRate 0.0694 Epoch: 3 Global Step: 55780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:41,467-Speed 9317.97 samples/sec Loss 8.2930 LearningRate 0.0694 Epoch: 3 Global Step: 55790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:42,527-Speed 9664.45 samples/sec Loss 8.2882 LearningRate 0.0694 Epoch: 3 Global Step: 55800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:43,627-Speed 9308.91 samples/sec Loss 8.2009 LearningRate 0.0694 Epoch: 3 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:46:44,735-Speed 9251.98 samples/sec Loss 8.1982 LearningRate 0.0694 Epoch: 3 Global Step: 55820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:45,794-Speed 9671.49 samples/sec Loss 8.2371 LearningRate 0.0693 Epoch: 3 Global Step: 55830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:46,890-Speed 9351.82 samples/sec Loss 8.2420 LearningRate 0.0693 Epoch: 3 Global Step: 55840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:47,976-Speed 9433.82 samples/sec Loss 8.1583 LearningRate 0.0693 Epoch: 3 Global Step: 55850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:49,094-Speed 9161.05 samples/sec Loss 8.2111 LearningRate 0.0693 Epoch: 3 Global Step: 55860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:50,158-Speed 9635.64 samples/sec Loss 8.3429 LearningRate 0.0693 Epoch: 3 Global Step: 55870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:51,242-Speed 9446.49 samples/sec Loss 8.2537 LearningRate 0.0693 Epoch: 3 Global Step: 55880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:52,353-Speed 9225.18 samples/sec Loss 8.1808 LearningRate 0.0693 Epoch: 3 Global Step: 55890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:53,414-Speed 9655.32 samples/sec Loss 8.1541 LearningRate 0.0693 Epoch: 3 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:54,463-Speed 9769.34 samples/sec Loss 8.3049 LearningRate 0.0693 Epoch: 3 Global Step: 55910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:55,499-Speed 9895.61 samples/sec Loss 8.1911 LearningRate 0.0693 Epoch: 3 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:56,581-Speed 9471.03 samples/sec Loss 8.2246 LearningRate 0.0693 Epoch: 3 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:57,673-Speed 9382.79 samples/sec Loss 
8.2258 LearningRate 0.0693 Epoch: 3 Global Step: 55940 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:46:58,763-Speed 9396.17 samples/sec Loss 8.2825 LearningRate 0.0693 Epoch: 3 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:46:59,844-Speed 9479.17 samples/sec Loss 8.2862 LearningRate 0.0693 Epoch: 3 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:47:00,903-Speed 9671.68 samples/sec Loss 8.2384 LearningRate 0.0693 Epoch: 3 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:47:01,952-Speed 9771.01 samples/sec Loss 8.3176 LearningRate 0.0693 Epoch: 3 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:47:03,078-Speed 9101.04 samples/sec Loss 8.2950 LearningRate 0.0693 Epoch: 3 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:47:04,162-Speed 9454.38 samples/sec Loss 8.2364 LearningRate 0.0693 Epoch: 3 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:47:26,217-[lfw][56000]XNorm: 12.584653 Training: 2022-04-11 13:47:26,218-[lfw][56000]Accuracy-Flip: 0.99583+-0.00250 Training: 2022-04-11 13:47:26,218-[lfw][56000]Accuracy-Highest: 0.99583 Training: 2022-04-11 13:47:51,659-[cfp_fp][56000]XNorm: 10.578777 Training: 2022-04-11 13:47:51,660-[cfp_fp][56000]Accuracy-Flip: 0.95157+-0.00902 Training: 2022-04-11 13:47:51,660-[cfp_fp][56000]Accuracy-Highest: 0.95157 Training: 2022-04-11 13:48:13,659-[agedb_30][56000]XNorm: 12.140642 Training: 2022-04-11 13:48:13,659-[agedb_30][56000]Accuracy-Flip: 0.95767+-0.01146 Training: 2022-04-11 13:48:13,659-[agedb_30][56000]Accuracy-Highest: 0.95767 Training: 2022-04-11 13:48:14,722-Speed 145.13 samples/sec Loss 8.2175 LearningRate 0.0693 Epoch: 3 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:15,856-Speed 9036.18 samples/sec Loss 8.3754 LearningRate 0.0693 Epoch: 
3 Global Step: 56020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:16,967-Speed 9222.54 samples/sec Loss 8.2073 LearningRate 0.0692 Epoch: 3 Global Step: 56030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:18,027-Speed 9671.62 samples/sec Loss 8.2540 LearningRate 0.0692 Epoch: 3 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:19,084-Speed 9693.01 samples/sec Loss 8.2931 LearningRate 0.0692 Epoch: 3 Global Step: 56050 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:20,154-Speed 9570.17 samples/sec Loss 8.1814 LearningRate 0.0692 Epoch: 3 Global Step: 56060 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:21,227-Speed 9548.65 samples/sec Loss 8.2391 LearningRate 0.0692 Epoch: 3 Global Step: 56070 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:22,288-Speed 9666.62 samples/sec Loss 8.1901 LearningRate 0.0692 Epoch: 3 Global Step: 56080 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:23,344-Speed 9703.54 samples/sec Loss 8.3303 LearningRate 0.0692 Epoch: 3 Global Step: 56090 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:24,415-Speed 9562.66 samples/sec Loss 8.3603 LearningRate 0.0692 Epoch: 3 Global Step: 56100 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:25,516-Speed 9309.70 samples/sec Loss 8.3050 LearningRate 0.0692 Epoch: 3 Global Step: 56110 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:26,552-Speed 9890.35 samples/sec Loss 8.2943 LearningRate 0.0692 Epoch: 3 Global Step: 56120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:27,631-Speed 9489.56 samples/sec Loss 8.2912 LearningRate 0.0692 Epoch: 3 Global Step: 56130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:28,708-Speed 9516.76 samples/sec Loss 8.2772 LearningRate 0.0692 Epoch: 3 Global Step: 56140 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-04-11 13:48:29,780-Speed 9557.89 samples/sec Loss 8.1796 LearningRate 0.0692 Epoch: 3 Global Step: 56150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:30,863-Speed 9455.02 samples/sec Loss 8.2259 LearningRate 0.0692 Epoch: 3 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:31,917-Speed 9726.69 samples/sec Loss 8.0919 LearningRate 0.0692 Epoch: 3 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:32,982-Speed 9615.60 samples/sec Loss 8.1671 LearningRate 0.0692 Epoch: 3 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:34,047-Speed 9621.53 samples/sec Loss 8.2387 LearningRate 0.0692 Epoch: 3 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:35,076-Speed 9963.15 samples/sec Loss 8.2407 LearningRate 0.0692 Epoch: 3 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:36,128-Speed 9737.52 samples/sec Loss 8.3123 LearningRate 0.0692 Epoch: 3 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:37,199-Speed 9568.44 samples/sec Loss 8.3494 LearningRate 0.0692 Epoch: 3 Global Step: 56220 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:48:38,289-Speed 9402.57 samples/sec Loss 8.2229 LearningRate 0.0691 Epoch: 3 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:39,349-Speed 9662.47 samples/sec Loss 8.3374 LearningRate 0.0691 Epoch: 3 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:40,437-Speed 9415.76 samples/sec Loss 8.1772 LearningRate 0.0691 Epoch: 3 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:41,526-Speed 9405.69 samples/sec Loss 8.2230 LearningRate 0.0691 Epoch: 3 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-04-11 13:48:42,620-Speed 9370.35 samples/sec Loss 8.1382 LearningRate 0.0691 Epoch: 3 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:43,707-Speed 9422.24 samples/sec Loss 8.2392 LearningRate 0.0691 Epoch: 3 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:44,773-Speed 9613.34 samples/sec Loss 8.2373 LearningRate 0.0691 Epoch: 3 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:45,856-Speed 9467.54 samples/sec Loss 8.2678 LearningRate 0.0691 Epoch: 3 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:46,904-Speed 9779.22 samples/sec Loss 8.2167 LearningRate 0.0691 Epoch: 3 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:47,966-Speed 9644.00 samples/sec Loss 8.3252 LearningRate 0.0691 Epoch: 3 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:49,064-Speed 9331.13 samples/sec Loss 8.2782 LearningRate 0.0691 Epoch: 3 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:50,131-Speed 9599.78 samples/sec Loss 8.2287 LearningRate 0.0691 Epoch: 3 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:51,194-Speed 9638.74 samples/sec Loss 8.2681 LearningRate 0.0691 Epoch: 3 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:52,298-Speed 9282.50 samples/sec Loss 8.1541 LearningRate 0.0691 Epoch: 3 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:53,389-Speed 9395.06 samples/sec Loss 8.2121 LearningRate 0.0691 Epoch: 3 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:48:54,441-Speed 9739.34 samples/sec Loss 8.2134 LearningRate 0.0691 Epoch: 3 Global Step: 56380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:55,477-Speed 9887.49 samples/sec 
Loss 8.3420 LearningRate 0.0691 Epoch: 3 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:56,517-Speed 9850.58 samples/sec Loss 8.1431 LearningRate 0.0691 Epoch: 3 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:57,657-Speed 8992.75 samples/sec Loss 8.3403 LearningRate 0.0691 Epoch: 3 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:58,730-Speed 9545.76 samples/sec Loss 8.2620 LearningRate 0.0691 Epoch: 3 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:48:59,837-Speed 9256.22 samples/sec Loss 8.3387 LearningRate 0.0690 Epoch: 3 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:00,895-Speed 9682.54 samples/sec Loss 8.1751 LearningRate 0.0690 Epoch: 3 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:01,959-Speed 9631.07 samples/sec Loss 8.2234 LearningRate 0.0690 Epoch: 3 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:03,004-Speed 9806.20 samples/sec Loss 8.3616 LearningRate 0.0690 Epoch: 3 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:04,087-Speed 9458.42 samples/sec Loss 8.2580 LearningRate 0.0690 Epoch: 3 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:05,130-Speed 9827.60 samples/sec Loss 8.1728 LearningRate 0.0690 Epoch: 3 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:06,183-Speed 9730.00 samples/sec Loss 8.3297 LearningRate 0.0690 Epoch: 3 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:07,267-Speed 9456.45 samples/sec Loss 8.1411 LearningRate 0.0690 Epoch: 3 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:08,398-Speed 9053.72 samples/sec Loss 8.1026 LearningRate 0.0690 Epoch: 3 Global 
Step: 56510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:09,488-Speed 9399.91 samples/sec Loss 8.2663 LearningRate 0.0690 Epoch: 3 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:10,570-Speed 9470.15 samples/sec Loss 8.2314 LearningRate 0.0690 Epoch: 3 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:11,688-Speed 9162.75 samples/sec Loss 8.2626 LearningRate 0.0690 Epoch: 3 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:12,776-Speed 9416.88 samples/sec Loss 8.3499 LearningRate 0.0690 Epoch: 3 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:13,883-Speed 9256.44 samples/sec Loss 8.2857 LearningRate 0.0690 Epoch: 3 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:14,957-Speed 9540.78 samples/sec Loss 8.2273 LearningRate 0.0690 Epoch: 3 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:16,067-Speed 9238.23 samples/sec Loss 8.2749 LearningRate 0.0690 Epoch: 3 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:17,151-Speed 9451.74 samples/sec Loss 8.2050 LearningRate 0.0690 Epoch: 3 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:18,215-Speed 9632.76 samples/sec Loss 8.2375 LearningRate 0.0690 Epoch: 3 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:19,322-Speed 9254.80 samples/sec Loss 8.1025 LearningRate 0.0690 Epoch: 3 Global Step: 56610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:20,431-Speed 9240.96 samples/sec Loss 8.1848 LearningRate 0.0690 Epoch: 3 Global Step: 56620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:21,527-Speed 9345.29 samples/sec Loss 8.2327 LearningRate 0.0689 Epoch: 3 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-11 13:49:22,637-Speed 9239.36 samples/sec Loss 8.2814 LearningRate 0.0689 Epoch: 3 Global Step: 56640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:23,696-Speed 9671.46 samples/sec Loss 8.4072 LearningRate 0.0689 Epoch: 3 Global Step: 56650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:24,733-Speed 9888.82 samples/sec Loss 8.2530 LearningRate 0.0689 Epoch: 3 Global Step: 56660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:49:25,775-Speed 9827.25 samples/sec Loss 8.1619 LearningRate 0.0689 Epoch: 3 Global Step: 56670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:26,843-Speed 9598.48 samples/sec Loss 8.1726 LearningRate 0.0689 Epoch: 3 Global Step: 56680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:27,895-Speed 9737.98 samples/sec Loss 8.3184 LearningRate 0.0689 Epoch: 3 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:28,975-Speed 9480.03 samples/sec Loss 8.2833 LearningRate 0.0689 Epoch: 3 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:30,070-Speed 9362.63 samples/sec Loss 8.2683 LearningRate 0.0689 Epoch: 3 Global Step: 56710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:31,176-Speed 9258.92 samples/sec Loss 8.3372 LearningRate 0.0689 Epoch: 3 Global Step: 56720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:32,277-Speed 9310.49 samples/sec Loss 8.1456 LearningRate 0.0689 Epoch: 3 Global Step: 56730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:33,400-Speed 9119.14 samples/sec Loss 8.1361 LearningRate 0.0689 Epoch: 3 Global Step: 56740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:34,541-Speed 8981.71 samples/sec Loss 8.1742 LearningRate 0.0689 Epoch: 3 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:49:35,614-Speed 9552.64 samples/sec Loss 8.3604 LearningRate 0.0689 Epoch: 3 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:36,661-Speed 9789.07 samples/sec Loss 8.2959 LearningRate 0.0689 Epoch: 3 Global Step: 56770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:49:37,752-Speed 9391.50 samples/sec Loss 8.2062 LearningRate 0.0689 Epoch: 3 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:38,812-Speed 9661.11 samples/sec Loss 8.2344 LearningRate 0.0689 Epoch: 3 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:39,887-Speed 9535.13 samples/sec Loss 8.3294 LearningRate 0.0689 Epoch: 3 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:40,977-Speed 9403.72 samples/sec Loss 8.2467 LearningRate 0.0689 Epoch: 3 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:42,038-Speed 9650.44 samples/sec Loss 8.1932 LearningRate 0.0689 Epoch: 3 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:43,144-Speed 9263.35 samples/sec Loss 8.3676 LearningRate 0.0688 Epoch: 3 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:44,244-Speed 9315.34 samples/sec Loss 8.1714 LearningRate 0.0688 Epoch: 3 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:45,319-Speed 9539.85 samples/sec Loss 8.2324 LearningRate 0.0688 Epoch: 3 Global Step: 56850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:46,406-Speed 9425.47 samples/sec Loss 8.3718 LearningRate 0.0688 Epoch: 3 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:47,501-Speed 9360.63 samples/sec Loss 8.3005 LearningRate 0.0688 Epoch: 3 Global Step: 56870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:48,551-Speed 9755.79 samples/sec Loss 
8.1982 LearningRate 0.0688 Epoch: 3 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:49,623-Speed 9558.68 samples/sec Loss 8.1985 LearningRate 0.0688 Epoch: 3 Global Step: 56890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:50,689-Speed 9606.28 samples/sec Loss 8.3011 LearningRate 0.0688 Epoch: 3 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:51,754-Speed 9628.01 samples/sec Loss 8.1984 LearningRate 0.0688 Epoch: 3 Global Step: 56910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:52,787-Speed 9916.72 samples/sec Loss 8.3061 LearningRate 0.0688 Epoch: 3 Global Step: 56920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:53,897-Speed 9227.65 samples/sec Loss 8.2753 LearningRate 0.0688 Epoch: 3 Global Step: 56930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:55,007-Speed 9238.65 samples/sec Loss 8.1945 LearningRate 0.0688 Epoch: 3 Global Step: 56940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:56,033-Speed 9984.03 samples/sec Loss 8.2925 LearningRate 0.0688 Epoch: 3 Global Step: 56950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:57,166-Speed 9037.73 samples/sec Loss 8.3263 LearningRate 0.0688 Epoch: 3 Global Step: 56960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:49:58,252-Speed 9438.41 samples/sec Loss 8.2449 LearningRate 0.0688 Epoch: 3 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:49:59,348-Speed 9351.96 samples/sec Loss 8.1806 LearningRate 0.0688 Epoch: 3 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:00,431-Speed 9462.00 samples/sec Loss 8.2993 LearningRate 0.0688 Epoch: 3 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:01,477-Speed 9789.50 samples/sec Loss 8.2252 LearningRate 0.0688 Epoch: 3 Global 
Step: 57000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:02,547-Speed 9573.82 samples/sec Loss 8.2488 LearningRate 0.0688 Epoch: 3 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:03,676-Speed 9078.53 samples/sec Loss 8.2208 LearningRate 0.0688 Epoch: 3 Global Step: 57020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:04,736-Speed 9671.25 samples/sec Loss 8.2910 LearningRate 0.0688 Epoch: 3 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:05,785-Speed 9771.96 samples/sec Loss 8.2857 LearningRate 0.0687 Epoch: 3 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:06,874-Speed 9400.18 samples/sec Loss 8.2635 LearningRate 0.0687 Epoch: 3 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:07,953-Speed 9501.53 samples/sec Loss 8.1876 LearningRate 0.0687 Epoch: 3 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:50:09,015-Speed 9647.41 samples/sec Loss 8.3294 LearningRate 0.0687 Epoch: 3 Global Step: 57070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:10,108-Speed 9374.41 samples/sec Loss 8.2449 LearningRate 0.0687 Epoch: 3 Global Step: 57080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:11,256-Speed 8924.46 samples/sec Loss 8.2571 LearningRate 0.0687 Epoch: 3 Global Step: 57090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:12,353-Speed 9340.21 samples/sec Loss 8.2616 LearningRate 0.0687 Epoch: 3 Global Step: 57100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:13,457-Speed 9274.47 samples/sec Loss 8.2397 LearningRate 0.0687 Epoch: 3 Global Step: 57110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:14,559-Speed 9295.78 samples/sec Loss 8.3236 LearningRate 0.0687 Epoch: 3 Global Step: 57120 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-11 13:50:15,625-Speed 9618.46 samples/sec Loss 8.2456 LearningRate 0.0687 Epoch: 3 Global Step: 57130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:16,670-Speed 9799.45 samples/sec Loss 8.2730 LearningRate 0.0687 Epoch: 3 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:17,799-Speed 9082.65 samples/sec Loss 8.2903 LearningRate 0.0687 Epoch: 3 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:18,908-Speed 9240.52 samples/sec Loss 8.2137 LearningRate 0.0687 Epoch: 3 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:20,001-Speed 9369.06 samples/sec Loss 8.2945 LearningRate 0.0687 Epoch: 3 Global Step: 57170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:50:21,128-Speed 9094.87 samples/sec Loss 8.2309 LearningRate 0.0687 Epoch: 3 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:22,264-Speed 9023.27 samples/sec Loss 8.3334 LearningRate 0.0687 Epoch: 3 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:23,328-Speed 9629.40 samples/sec Loss 8.1894 LearningRate 0.0687 Epoch: 3 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:24,385-Speed 9688.58 samples/sec Loss 8.2600 LearningRate 0.0687 Epoch: 3 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:25,457-Speed 9562.93 samples/sec Loss 8.2992 LearningRate 0.0687 Epoch: 3 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:26,539-Speed 9469.30 samples/sec Loss 8.2048 LearningRate 0.0687 Epoch: 3 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:27,603-Speed 9623.42 samples/sec Loss 8.1825 LearningRate 0.0686 Epoch: 3 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:50:28,673-Speed 9576.60 samples/sec Loss 8.1107 LearningRate 0.0686 Epoch: 3 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:29,748-Speed 9532.42 samples/sec Loss 8.1410 LearningRate 0.0686 Epoch: 3 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:30,825-Speed 9515.91 samples/sec Loss 8.1710 LearningRate 0.0686 Epoch: 3 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:31,894-Speed 9582.97 samples/sec Loss 8.2946 LearningRate 0.0686 Epoch: 3 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:32,975-Speed 9475.14 samples/sec Loss 8.2709 LearningRate 0.0686 Epoch: 3 Global Step: 57290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:34,043-Speed 9600.47 samples/sec Loss 8.1833 LearningRate 0.0686 Epoch: 3 Global Step: 57300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:35,082-Speed 9860.05 samples/sec Loss 8.2734 LearningRate 0.0686 Epoch: 3 Global Step: 57310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:36,186-Speed 9283.01 samples/sec Loss 8.3258 LearningRate 0.0686 Epoch: 3 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:37,240-Speed 9722.67 samples/sec Loss 8.2243 LearningRate 0.0686 Epoch: 3 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:38,320-Speed 9486.42 samples/sec Loss 8.3106 LearningRate 0.0686 Epoch: 3 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:39,377-Speed 9697.09 samples/sec Loss 8.3143 LearningRate 0.0686 Epoch: 3 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:40,469-Speed 9376.11 samples/sec Loss 8.1914 LearningRate 0.0686 Epoch: 3 Global Step: 57360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:41,566-Speed 9346.62 samples/sec Loss 
8.2268 LearningRate 0.0686 Epoch: 3 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:42,613-Speed 9783.75 samples/sec Loss 8.1525 LearningRate 0.0686 Epoch: 3 Global Step: 57380 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:50:43,692-Speed 9494.64 samples/sec Loss 8.1967 LearningRate 0.0686 Epoch: 3 Global Step: 57390 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:50:44,732-Speed 9849.15 samples/sec Loss 8.1981 LearningRate 0.0686 Epoch: 3 Global Step: 57400 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:50:45,777-Speed 9811.61 samples/sec Loss 8.2706 LearningRate 0.0686 Epoch: 3 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:46,820-Speed 9826.60 samples/sec Loss 8.1864 LearningRate 0.0686 Epoch: 3 Global Step: 57420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:47,901-Speed 9472.84 samples/sec Loss 8.1682 LearningRate 0.0686 Epoch: 3 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:48,983-Speed 9470.02 samples/sec Loss 8.3930 LearningRate 0.0685 Epoch: 3 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:50,068-Speed 9441.60 samples/sec Loss 8.0900 LearningRate 0.0685 Epoch: 3 Global Step: 57450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:51,168-Speed 9315.68 samples/sec Loss 8.3599 LearningRate 0.0685 Epoch: 3 Global Step: 57460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:52,246-Speed 9506.64 samples/sec Loss 8.1893 LearningRate 0.0685 Epoch: 3 Global Step: 57470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:53,308-Speed 9651.66 samples/sec Loss 8.2281 LearningRate 0.0685 Epoch: 3 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:54,407-Speed 9321.41 samples/sec Loss 8.2125 LearningRate 0.0685 Epoch: 3 Global 
Step: 57490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:55,461-Speed 9727.89 samples/sec Loss 8.1828 LearningRate 0.0685 Epoch: 3 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:50:56,557-Speed 9348.90 samples/sec Loss 8.1656 LearningRate 0.0685 Epoch: 3 Global Step: 57510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:50:57,634-Speed 9508.44 samples/sec Loss 8.2597 LearningRate 0.0685 Epoch: 3 Global Step: 57520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:50:58,712-Speed 9504.47 samples/sec Loss 8.1175 LearningRate 0.0685 Epoch: 3 Global Step: 57530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:50:59,751-Speed 9860.36 samples/sec Loss 8.2035 LearningRate 0.0685 Epoch: 3 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:00,834-Speed 9459.57 samples/sec Loss 8.1739 LearningRate 0.0685 Epoch: 3 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:01,912-Speed 9505.14 samples/sec Loss 8.2602 LearningRate 0.0685 Epoch: 3 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:03,004-Speed 9382.03 samples/sec Loss 8.2983 LearningRate 0.0685 Epoch: 3 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:04,092-Speed 9419.29 samples/sec Loss 8.2384 LearningRate 0.0685 Epoch: 3 Global Step: 57580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:05,159-Speed 9606.39 samples/sec Loss 8.2910 LearningRate 0.0685 Epoch: 3 Global Step: 57590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:06,242-Speed 9461.82 samples/sec Loss 8.3047 LearningRate 0.0685 Epoch: 3 Global Step: 57600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:07,330-Speed 9410.03 samples/sec Loss 8.1520 LearningRate 0.0685 Epoch: 3 Global Step: 57610 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 13:51:08,421-Speed 9393.94 samples/sec Loss 8.1481 LearningRate 0.0685 Epoch: 3 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:09,509-Speed 9418.23 samples/sec Loss 8.2881 LearningRate 0.0685 Epoch: 3 Global Step: 57630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:10,580-Speed 9571.30 samples/sec Loss 8.2558 LearningRate 0.0684 Epoch: 3 Global Step: 57640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:51:11,645-Speed 9618.10 samples/sec Loss 8.2118 LearningRate 0.0684 Epoch: 3 Global Step: 57650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:51:12,692-Speed 9783.79 samples/sec Loss 8.1958 LearningRate 0.0684 Epoch: 3 Global Step: 57660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:51:13,755-Speed 9637.02 samples/sec Loss 8.2653 LearningRate 0.0684 Epoch: 3 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:14,818-Speed 9643.61 samples/sec Loss 8.1787 LearningRate 0.0684 Epoch: 3 Global Step: 57680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:15,901-Speed 9466.75 samples/sec Loss 8.2074 LearningRate 0.0684 Epoch: 3 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:16,955-Speed 9716.06 samples/sec Loss 8.2620 LearningRate 0.0684 Epoch: 3 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:18,009-Speed 9723.25 samples/sec Loss 8.2156 LearningRate 0.0684 Epoch: 3 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:19,067-Speed 9686.64 samples/sec Loss 8.1959 LearningRate 0.0684 Epoch: 3 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:20,120-Speed 9723.27 samples/sec Loss 8.4180 LearningRate 0.0684 Epoch: 3 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
13:51:21,153-Speed 9921.38 samples/sec Loss 8.3363 LearningRate 0.0684 Epoch: 3 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:22,197-Speed 9820.91 samples/sec Loss 8.2204 LearningRate 0.0684 Epoch: 3 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:23,259-Speed 9646.21 samples/sec Loss 8.2027 LearningRate 0.0684 Epoch: 3 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:24,364-Speed 9273.07 samples/sec Loss 8.2323 LearningRate 0.0684 Epoch: 3 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:25,412-Speed 9779.46 samples/sec Loss 8.2760 LearningRate 0.0684 Epoch: 3 Global Step: 57780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:26,444-Speed 9928.83 samples/sec Loss 8.2825 LearningRate 0.0684 Epoch: 3 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:27,528-Speed 9452.03 samples/sec Loss 8.1519 LearningRate 0.0684 Epoch: 3 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:28,577-Speed 9760.30 samples/sec Loss 8.1819 LearningRate 0.0684 Epoch: 3 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:29,651-Speed 9545.94 samples/sec Loss 8.1758 LearningRate 0.0684 Epoch: 3 Global Step: 57820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:30,744-Speed 9367.96 samples/sec Loss 8.2411 LearningRate 0.0684 Epoch: 3 Global Step: 57830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:31,802-Speed 9687.51 samples/sec Loss 8.3019 LearningRate 0.0683 Epoch: 3 Global Step: 57840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:32,874-Speed 9558.14 samples/sec Loss 8.2496 LearningRate 0.0683 Epoch: 3 Global Step: 57850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:33,927-Speed 9727.26 samples/sec Loss 8.1924 
LearningRate 0.0683 Epoch: 3 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:34,994-Speed 9609.84 samples/sec Loss 8.2486 LearningRate 0.0683 Epoch: 3 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:36,082-Speed 9420.99 samples/sec Loss 8.1880 LearningRate 0.0683 Epoch: 3 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:37,153-Speed 9560.84 samples/sec Loss 8.2408 LearningRate 0.0683 Epoch: 3 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:38,240-Speed 9433.11 samples/sec Loss 8.2585 LearningRate 0.0683 Epoch: 3 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:39,315-Speed 9527.43 samples/sec Loss 8.2087 LearningRate 0.0683 Epoch: 3 Global Step: 57910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:51:40,405-Speed 9397.29 samples/sec Loss 8.1691 LearningRate 0.0683 Epoch: 3 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:41,480-Speed 9533.96 samples/sec Loss 8.2250 LearningRate 0.0683 Epoch: 3 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:42,526-Speed 9791.80 samples/sec Loss 8.2069 LearningRate 0.0683 Epoch: 3 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:43,639-Speed 9203.16 samples/sec Loss 8.1621 LearningRate 0.0683 Epoch: 3 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:44,695-Speed 9707.89 samples/sec Loss 8.1693 LearningRate 0.0683 Epoch: 3 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:45,756-Speed 9653.64 samples/sec Loss 8.1566 LearningRate 0.0683 Epoch: 3 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:46,830-Speed 9544.88 samples/sec Loss 8.2533 LearningRate 0.0683 Epoch: 3 Global Step: 57980 
Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:47,932-Speed 9293.37 samples/sec Loss 8.1696 LearningRate 0.0683 Epoch: 3 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:51:49,004-Speed 9556.06 samples/sec Loss 8.1338 LearningRate 0.0683 Epoch: 3 Global Step: 58000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:52:11,094-[lfw][58000]XNorm: 12.571057 Training: 2022-04-11 13:52:11,095-[lfw][58000]Accuracy-Flip: 0.99417+-0.00344 Training: 2022-04-11 13:52:11,095-[lfw][58000]Accuracy-Highest: 0.99583 Training: 2022-04-11 13:52:36,691-[cfp_fp][58000]XNorm: 10.607200 Training: 2022-04-11 13:52:36,692-[cfp_fp][58000]Accuracy-Flip: 0.94529+-0.00963 Training: 2022-04-11 13:52:36,692-[cfp_fp][58000]Accuracy-Highest: 0.95157 Training: 2022-04-11 13:52:58,682-[agedb_30][58000]XNorm: 12.151300 Training: 2022-04-11 13:52:58,682-[agedb_30][58000]Accuracy-Flip: 0.95550+-0.00860 Training: 2022-04-11 13:52:58,683-[agedb_30][58000]Accuracy-Highest: 0.95767 Training: 2022-04-11 13:52:59,750-Speed 144.75 samples/sec Loss 8.2142 LearningRate 0.0683 Epoch: 3 Global Step: 58010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:00,812-Speed 9650.21 samples/sec Loss 8.2860 LearningRate 0.0683 Epoch: 3 Global Step: 58020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:53:01,930-Speed 9158.60 samples/sec Loss 8.1844 LearningRate 0.0683 Epoch: 3 Global Step: 58030 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:53:03,004-Speed 9538.73 samples/sec Loss 8.2519 LearningRate 0.0682 Epoch: 3 Global Step: 58040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:04,072-Speed 9596.62 samples/sec Loss 8.1966 LearningRate 0.0682 Epoch: 3 Global Step: 58050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:05,141-Speed 9584.67 samples/sec Loss 8.2792 LearningRate 0.0682 Epoch: 3 Global Step: 58060 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-04-11 13:53:06,193-Speed 9740.97 samples/sec Loss 8.3177 LearningRate 0.0682 Epoch: 3 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:07,257-Speed 9629.85 samples/sec Loss 8.3574 LearningRate 0.0682 Epoch: 3 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:08,292-Speed 9900.13 samples/sec Loss 8.2578 LearningRate 0.0682 Epoch: 3 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:09,377-Speed 9440.31 samples/sec Loss 8.3774 LearningRate 0.0682 Epoch: 3 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:10,450-Speed 9548.23 samples/sec Loss 8.3156 LearningRate 0.0682 Epoch: 3 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:11,533-Speed 9466.35 samples/sec Loss 8.2913 LearningRate 0.0682 Epoch: 3 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:12,636-Speed 9288.11 samples/sec Loss 8.2234 LearningRate 0.0682 Epoch: 3 Global Step: 58130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:13,752-Speed 9179.40 samples/sec Loss 8.2083 LearningRate 0.0682 Epoch: 3 Global Step: 58140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:14,843-Speed 9391.65 samples/sec Loss 8.1485 LearningRate 0.0682 Epoch: 3 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:15,943-Speed 9318.59 samples/sec Loss 8.3310 LearningRate 0.0682 Epoch: 3 Global Step: 58160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:17,044-Speed 9303.84 samples/sec Loss 8.1786 LearningRate 0.0682 Epoch: 3 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:18,132-Speed 9418.03 samples/sec Loss 8.1466 LearningRate 0.0682 Epoch: 3 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:19,176-Speed 
9806.20 samples/sec Loss 8.2116 LearningRate 0.0682 Epoch: 3 Global Step: 58190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:20,256-Speed 9492.74 samples/sec Loss 8.3131 LearningRate 0.0682 Epoch: 3 Global Step: 58200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:21,356-Speed 9316.08 samples/sec Loss 8.1196 LearningRate 0.0682 Epoch: 3 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:22,449-Speed 9373.26 samples/sec Loss 8.2109 LearningRate 0.0682 Epoch: 3 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:23,534-Speed 9442.76 samples/sec Loss 8.2694 LearningRate 0.0682 Epoch: 3 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:24,563-Speed 9955.39 samples/sec Loss 8.2508 LearningRate 0.0682 Epoch: 3 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:25,624-Speed 9658.99 samples/sec Loss 8.1522 LearningRate 0.0681 Epoch: 3 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:26,700-Speed 9525.60 samples/sec Loss 8.1460 LearningRate 0.0681 Epoch: 3 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:27,761-Speed 9669.90 samples/sec Loss 8.1902 LearningRate 0.0681 Epoch: 3 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:28,821-Speed 9668.09 samples/sec Loss 8.1854 LearningRate 0.0681 Epoch: 3 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:53:29,862-Speed 9837.42 samples/sec Loss 8.1874 LearningRate 0.0681 Epoch: 3 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:30,971-Speed 9240.35 samples/sec Loss 8.2396 LearningRate 0.0681 Epoch: 3 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:32,061-Speed 9397.72 samples/sec Loss 8.3257 LearningRate 0.0681 
Epoch: 3 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:33,109-Speed 9780.51 samples/sec Loss 8.2743 LearningRate 0.0681 Epoch: 3 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:34,148-Speed 9856.46 samples/sec Loss 8.1804 LearningRate 0.0681 Epoch: 3 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:35,194-Speed 9798.08 samples/sec Loss 8.3136 LearningRate 0.0681 Epoch: 3 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:36,245-Speed 9747.80 samples/sec Loss 8.2070 LearningRate 0.0681 Epoch: 3 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:37,315-Speed 9576.12 samples/sec Loss 8.1999 LearningRate 0.0681 Epoch: 3 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:38,407-Speed 9385.25 samples/sec Loss 8.1329 LearningRate 0.0681 Epoch: 3 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:39,448-Speed 9841.47 samples/sec Loss 8.2773 LearningRate 0.0681 Epoch: 3 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:40,481-Speed 9917.78 samples/sec Loss 8.2536 LearningRate 0.0681 Epoch: 3 Global Step: 58390 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:53:41,595-Speed 9196.87 samples/sec Loss 8.1078 LearningRate 0.0681 Epoch: 3 Global Step: 58400 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:53:42,665-Speed 9576.62 samples/sec Loss 8.2312 LearningRate 0.0681 Epoch: 3 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:43,732-Speed 9609.93 samples/sec Loss 8.2873 LearningRate 0.0681 Epoch: 3 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:44,820-Speed 9416.80 samples/sec Loss 8.2808 LearningRate 0.0681 Epoch: 3 Global Step: 58430 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:45,870-Speed 9755.04 samples/sec Loss 8.2142 LearningRate 0.0681 Epoch: 3 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:46,988-Speed 9164.56 samples/sec Loss 8.2592 LearningRate 0.0680 Epoch: 3 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:48,059-Speed 9567.50 samples/sec Loss 8.1102 LearningRate 0.0680 Epoch: 3 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:49,143-Speed 9455.22 samples/sec Loss 8.1939 LearningRate 0.0680 Epoch: 3 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:50,251-Speed 9244.73 samples/sec Loss 8.2053 LearningRate 0.0680 Epoch: 3 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:51,318-Speed 9611.95 samples/sec Loss 8.1196 LearningRate 0.0680 Epoch: 3 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:52,365-Speed 9780.29 samples/sec Loss 8.2036 LearningRate 0.0680 Epoch: 3 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:53,410-Speed 9811.74 samples/sec Loss 8.2012 LearningRate 0.0680 Epoch: 3 Global Step: 58510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:53:54,484-Speed 9537.72 samples/sec Loss 8.1310 LearningRate 0.0680 Epoch: 3 Global Step: 58520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:53:55,543-Speed 9670.47 samples/sec Loss 8.2756 LearningRate 0.0680 Epoch: 3 Global Step: 58530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:53:56,606-Speed 9643.75 samples/sec Loss 8.2380 LearningRate 0.0680 Epoch: 3 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:57,703-Speed 9346.03 samples/sec Loss 8.2105 LearningRate 0.0680 Epoch: 3 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-04-11 13:53:58,803-Speed 9315.10 samples/sec Loss 8.2918 LearningRate 0.0680 Epoch: 3 Global Step: 58560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:53:59,881-Speed 9502.46 samples/sec Loss 8.2034 LearningRate 0.0680 Epoch: 3 Global Step: 58570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:00,937-Speed 9703.32 samples/sec Loss 8.2036 LearningRate 0.0680 Epoch: 3 Global Step: 58580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:02,024-Speed 9417.47 samples/sec Loss 8.2412 LearningRate 0.0680 Epoch: 3 Global Step: 58590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:03,047-Speed 10019.63 samples/sec Loss 8.1909 LearningRate 0.0680 Epoch: 3 Global Step: 58600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:04,097-Speed 9761.10 samples/sec Loss 8.2324 LearningRate 0.0680 Epoch: 3 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:05,217-Speed 9155.15 samples/sec Loss 8.1736 LearningRate 0.0680 Epoch: 3 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:06,290-Speed 9547.82 samples/sec Loss 8.0931 LearningRate 0.0680 Epoch: 3 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:07,333-Speed 9822.17 samples/sec Loss 8.1754 LearningRate 0.0680 Epoch: 3 Global Step: 58640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:54:08,385-Speed 9742.47 samples/sec Loss 8.3889 LearningRate 0.0679 Epoch: 3 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:09,418-Speed 9914.85 samples/sec Loss 8.2307 LearningRate 0.0679 Epoch: 3 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:10,489-Speed 9572.90 samples/sec Loss 8.1945 LearningRate 0.0679 Epoch: 3 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:11,565-Speed 9526.42 
samples/sec Loss 8.3482 LearningRate 0.0679 Epoch: 3 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:12,638-Speed 9547.06 samples/sec Loss 8.2956 LearningRate 0.0679 Epoch: 3 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:13,686-Speed 9773.24 samples/sec Loss 8.2432 LearningRate 0.0679 Epoch: 3 Global Step: 58700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:14,791-Speed 9270.71 samples/sec Loss 8.2989 LearningRate 0.0679 Epoch: 3 Global Step: 58710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:15,862-Speed 9571.02 samples/sec Loss 8.2022 LearningRate 0.0679 Epoch: 3 Global Step: 58720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:16,905-Speed 9819.22 samples/sec Loss 8.1376 LearningRate 0.0679 Epoch: 3 Global Step: 58730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:18,069-Speed 8806.91 samples/sec Loss 8.1356 LearningRate 0.0679 Epoch: 3 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:19,137-Speed 9589.90 samples/sec Loss 8.2520 LearningRate 0.0679 Epoch: 3 Global Step: 58750 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:54:20,167-Speed 9946.17 samples/sec Loss 8.1845 LearningRate 0.0679 Epoch: 3 Global Step: 58760 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:54:21,242-Speed 9536.69 samples/sec Loss 8.3549 LearningRate 0.0679 Epoch: 3 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:22,330-Speed 9419.43 samples/sec Loss 8.2401 LearningRate 0.0679 Epoch: 3 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:23,428-Speed 9332.05 samples/sec Loss 8.1098 LearningRate 0.0679 Epoch: 3 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:24,528-Speed 9316.41 samples/sec Loss 8.2585 LearningRate 0.0679 
Epoch: 3 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:25,637-Speed 9237.09 samples/sec Loss 8.2660 LearningRate 0.0679 Epoch: 3 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:26,675-Speed 9872.97 samples/sec Loss 8.2171 LearningRate 0.0679 Epoch: 3 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:27,773-Speed 9330.70 samples/sec Loss 8.1113 LearningRate 0.0679 Epoch: 3 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:28,846-Speed 9546.30 samples/sec Loss 8.0477 LearningRate 0.0679 Epoch: 3 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:29,903-Speed 9700.49 samples/sec Loss 8.1923 LearningRate 0.0678 Epoch: 3 Global Step: 58850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:30,953-Speed 9751.67 samples/sec Loss 8.2329 LearningRate 0.0678 Epoch: 3 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:32,030-Speed 9514.01 samples/sec Loss 8.1654 LearningRate 0.0678 Epoch: 3 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:33,117-Speed 9428.89 samples/sec Loss 8.1751 LearningRate 0.0678 Epoch: 3 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:34,180-Speed 9640.47 samples/sec Loss 8.2501 LearningRate 0.0678 Epoch: 3 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:35,243-Speed 9633.30 samples/sec Loss 8.0985 LearningRate 0.0678 Epoch: 3 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:36,290-Speed 9791.79 samples/sec Loss 8.1078 LearningRate 0.0678 Epoch: 3 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:37,356-Speed 9604.70 samples/sec Loss 8.3030 LearningRate 0.0678 Epoch: 3 Global Step: 58920 Fp16 Grad Scale: 
65536 Required: 11 hours Training: 2022-04-11 13:54:38,476-Speed 9151.38 samples/sec Loss 8.2042 LearningRate 0.0678 Epoch: 3 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:39,594-Speed 9160.57 samples/sec Loss 8.2453 LearningRate 0.0678 Epoch: 3 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:40,654-Speed 9674.83 samples/sec Loss 8.2320 LearningRate 0.0678 Epoch: 3 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:41,716-Speed 9644.71 samples/sec Loss 8.1395 LearningRate 0.0678 Epoch: 3 Global Step: 58960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:42,757-Speed 9841.65 samples/sec Loss 8.1925 LearningRate 0.0678 Epoch: 3 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:43,806-Speed 9768.06 samples/sec Loss 8.2020 LearningRate 0.0678 Epoch: 3 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:44,856-Speed 9759.97 samples/sec Loss 8.2834 LearningRate 0.0678 Epoch: 3 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:45,930-Speed 9544.54 samples/sec Loss 8.2290 LearningRate 0.0678 Epoch: 3 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:47,055-Speed 9109.17 samples/sec Loss 8.2753 LearningRate 0.0678 Epoch: 3 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:48,134-Speed 9498.31 samples/sec Loss 8.2023 LearningRate 0.0678 Epoch: 3 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:49,224-Speed 9396.53 samples/sec Loss 8.1716 LearningRate 0.0678 Epoch: 3 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:50,311-Speed 9427.64 samples/sec Loss 8.3371 LearningRate 0.0678 Epoch: 3 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
13:54:51,384-Speed 9545.73 samples/sec Loss 8.1204 LearningRate 0.0678 Epoch: 3 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:52,472-Speed 9417.81 samples/sec Loss 8.2890 LearningRate 0.0677 Epoch: 3 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:54:53,550-Speed 9504.73 samples/sec Loss 8.2348 LearningRate 0.0677 Epoch: 3 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:54,616-Speed 9609.24 samples/sec Loss 8.2282 LearningRate 0.0677 Epoch: 3 Global Step: 59080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:55,688-Speed 9557.45 samples/sec Loss 8.3381 LearningRate 0.0677 Epoch: 3 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:56,810-Speed 9134.36 samples/sec Loss 8.0720 LearningRate 0.0677 Epoch: 3 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:57,914-Speed 9281.71 samples/sec Loss 8.1578 LearningRate 0.0677 Epoch: 3 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:54:59,013-Speed 9320.50 samples/sec Loss 8.2260 LearningRate 0.0677 Epoch: 3 Global Step: 59120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:00,084-Speed 9569.43 samples/sec Loss 8.3199 LearningRate 0.0677 Epoch: 3 Global Step: 59130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:01,195-Speed 9224.56 samples/sec Loss 8.0697 LearningRate 0.0677 Epoch: 3 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:02,277-Speed 9468.78 samples/sec Loss 8.1813 LearningRate 0.0677 Epoch: 3 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:03,384-Speed 9259.93 samples/sec Loss 8.2070 LearningRate 0.0677 Epoch: 3 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:04,415-Speed 9933.83 samples/sec Loss 
8.1996 LearningRate 0.0677 Epoch: 3 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:05,457-Speed 9835.92 samples/sec Loss 8.2304 LearningRate 0.0677 Epoch: 3 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:06,511-Speed 9722.56 samples/sec Loss 8.2512 LearningRate 0.0677 Epoch: 3 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:07,558-Speed 9785.60 samples/sec Loss 8.2318 LearningRate 0.0677 Epoch: 3 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:08,644-Speed 9434.07 samples/sec Loss 8.2829 LearningRate 0.0677 Epoch: 3 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:09,693-Speed 9765.39 samples/sec Loss 8.2146 LearningRate 0.0677 Epoch: 3 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:10,782-Speed 9408.64 samples/sec Loss 8.2648 LearningRate 0.0677 Epoch: 3 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:11,882-Speed 9312.07 samples/sec Loss 8.2018 LearningRate 0.0677 Epoch: 3 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:12,970-Speed 9418.52 samples/sec Loss 8.2239 LearningRate 0.0677 Epoch: 3 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:14,061-Speed 9390.37 samples/sec Loss 8.2146 LearningRate 0.0676 Epoch: 3 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:15,136-Speed 9531.31 samples/sec Loss 8.3030 LearningRate 0.0676 Epoch: 3 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:16,183-Speed 9789.53 samples/sec Loss 8.3157 LearningRate 0.0676 Epoch: 3 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:17,243-Speed 9666.89 samples/sec Loss 8.2096 LearningRate 0.0676 Epoch: 3 Global Step: 
59290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:55:18,288-Speed 9804.28 samples/sec Loss 8.1299 LearningRate 0.0676 Epoch: 3 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:19,350-Speed 9655.42 samples/sec Loss 8.2640 LearningRate 0.0676 Epoch: 3 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:20,446-Speed 9341.66 samples/sec Loss 8.1626 LearningRate 0.0676 Epoch: 3 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:21,496-Speed 9763.28 samples/sec Loss 8.2417 LearningRate 0.0676 Epoch: 3 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:22,577-Speed 9474.98 samples/sec Loss 8.1994 LearningRate 0.0676 Epoch: 3 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:23,655-Speed 9505.28 samples/sec Loss 8.1792 LearningRate 0.0676 Epoch: 3 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:24,719-Speed 9629.20 samples/sec Loss 8.1219 LearningRate 0.0676 Epoch: 3 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:25,775-Speed 9712.87 samples/sec Loss 8.1714 LearningRate 0.0676 Epoch: 3 Global Step: 59370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:26,850-Speed 9522.41 samples/sec Loss 8.2284 LearningRate 0.0676 Epoch: 3 Global Step: 59380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:27,917-Speed 9608.13 samples/sec Loss 8.2127 LearningRate 0.0676 Epoch: 3 Global Step: 59390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:28,949-Speed 9923.51 samples/sec Loss 8.2608 LearningRate 0.0676 Epoch: 3 Global Step: 59400 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:55:29,986-Speed 9878.09 samples/sec Loss 8.2129 LearningRate 0.0676 Epoch: 3 Global Step: 59410 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-04-11 13:55:31,098-Speed 9214.47 samples/sec Loss 8.1594 LearningRate 0.0676 Epoch: 3 Global Step: 59420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:32,178-Speed 9489.67 samples/sec Loss 8.2327 LearningRate 0.0676 Epoch: 3 Global Step: 59430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:33,269-Speed 9393.74 samples/sec Loss 8.3130 LearningRate 0.0676 Epoch: 3 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:34,397-Speed 9089.54 samples/sec Loss 8.2503 LearningRate 0.0676 Epoch: 3 Global Step: 59450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:35,519-Speed 9129.08 samples/sec Loss 8.0988 LearningRate 0.0675 Epoch: 3 Global Step: 59460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:36,630-Speed 9223.18 samples/sec Loss 8.2605 LearningRate 0.0675 Epoch: 3 Global Step: 59470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:37,704-Speed 9540.60 samples/sec Loss 8.0923 LearningRate 0.0675 Epoch: 3 Global Step: 59480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:38,832-Speed 9079.72 samples/sec Loss 8.1701 LearningRate 0.0675 Epoch: 3 Global Step: 59490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:39,910-Speed 9510.43 samples/sec Loss 8.3050 LearningRate 0.0675 Epoch: 3 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:40,996-Speed 9435.39 samples/sec Loss 8.4186 LearningRate 0.0675 Epoch: 3 Global Step: 59510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:55:42,054-Speed 9682.25 samples/sec Loss 8.1598 LearningRate 0.0675 Epoch: 3 Global Step: 59520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:55:43,127-Speed 9543.46 samples/sec Loss 8.2410 LearningRate 0.0675 Epoch: 3 Global Step: 59530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:55:44,150-Speed 10024.30 samples/sec Loss 8.1485 LearningRate 0.0675 Epoch: 3 Global Step: 59540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:45,232-Speed 9466.40 samples/sec Loss 8.1954 LearningRate 0.0675 Epoch: 3 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:46,323-Speed 9391.79 samples/sec Loss 8.1898 LearningRate 0.0675 Epoch: 3 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:47,381-Speed 9684.76 samples/sec Loss 8.2012 LearningRate 0.0675 Epoch: 3 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:48,465-Speed 9450.72 samples/sec Loss 8.2984 LearningRate 0.0675 Epoch: 3 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:49,556-Speed 9392.61 samples/sec Loss 8.1415 LearningRate 0.0675 Epoch: 3 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:50,679-Speed 9122.60 samples/sec Loss 8.2184 LearningRate 0.0675 Epoch: 3 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:51,758-Speed 9501.08 samples/sec Loss 8.2612 LearningRate 0.0675 Epoch: 3 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:52,829-Speed 9563.46 samples/sec Loss 8.0902 LearningRate 0.0675 Epoch: 3 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:53,947-Speed 9165.15 samples/sec Loss 8.2069 LearningRate 0.0675 Epoch: 3 Global Step: 59630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:55:55,046-Speed 9319.85 samples/sec Loss 8.1649 LearningRate 0.0675 Epoch: 3 Global Step: 59640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:55:56,139-Speed 9380.22 samples/sec Loss 8.2938 LearningRate 0.0675 Epoch: 3 Global Step: 59650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:55:57,259-Speed 9144.41 samples/sec 
Loss 8.1878 LearningRate 0.0675 Epoch: 3 Global Step: 59660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:58,318-Speed 9680.99 samples/sec Loss 8.0856 LearningRate 0.0674 Epoch: 3 Global Step: 59670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:55:59,402-Speed 9454.37 samples/sec Loss 8.2644 LearningRate 0.0674 Epoch: 3 Global Step: 59680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:00,463-Speed 9653.46 samples/sec Loss 8.3345 LearningRate 0.0674 Epoch: 3 Global Step: 59690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:01,510-Speed 9789.77 samples/sec Loss 8.1195 LearningRate 0.0674 Epoch: 3 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:02,597-Speed 9429.33 samples/sec Loss 8.1911 LearningRate 0.0674 Epoch: 3 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:03,623-Speed 9982.97 samples/sec Loss 8.1607 LearningRate 0.0674 Epoch: 3 Global Step: 59720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:04,740-Speed 9177.76 samples/sec Loss 8.0736 LearningRate 0.0674 Epoch: 3 Global Step: 59730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:05,843-Speed 9287.69 samples/sec Loss 8.0594 LearningRate 0.0674 Epoch: 3 Global Step: 59740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:06,972-Speed 9069.91 samples/sec Loss 8.2433 LearningRate 0.0674 Epoch: 3 Global Step: 59750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:08,048-Speed 9524.94 samples/sec Loss 8.2186 LearningRate 0.0674 Epoch: 3 Global Step: 59760 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:56:09,161-Speed 9206.12 samples/sec Loss 8.1919 LearningRate 0.0674 Epoch: 3 Global Step: 59770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:10,251-Speed 9397.33 samples/sec Loss 8.1611 LearningRate 0.0674 Epoch: 3 
Global Step: 59780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:11,357-Speed 9264.46 samples/sec Loss 8.3718 LearningRate 0.0674 Epoch: 3 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:12,460-Speed 9292.52 samples/sec Loss 8.2131 LearningRate 0.0674 Epoch: 3 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:13,565-Speed 9270.31 samples/sec Loss 8.2430 LearningRate 0.0674 Epoch: 3 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:14,606-Speed 9841.78 samples/sec Loss 8.2228 LearningRate 0.0674 Epoch: 3 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:15,683-Speed 9519.49 samples/sec Loss 8.2522 LearningRate 0.0674 Epoch: 3 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:16,794-Speed 9219.73 samples/sec Loss 8.1686 LearningRate 0.0674 Epoch: 3 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:17,871-Speed 9518.16 samples/sec Loss 8.2845 LearningRate 0.0674 Epoch: 3 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:18,955-Speed 9451.99 samples/sec Loss 8.2407 LearningRate 0.0674 Epoch: 3 Global Step: 59860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:20,023-Speed 9590.92 samples/sec Loss 8.1989 LearningRate 0.0673 Epoch: 3 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:21,076-Speed 9731.42 samples/sec Loss 8.1332 LearningRate 0.0673 Epoch: 3 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:22,172-Speed 9344.99 samples/sec Loss 8.1788 LearningRate 0.0673 Epoch: 3 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 13:56:23,282-Speed 9231.34 samples/sec Loss 8.1157 LearningRate 0.0673 Epoch: 3 Global Step: 59900 Fp16 Grad Scale: 131072 Required: 
11 hours Training: 2022-04-11 13:56:24,338-Speed 9698.41 samples/sec Loss 8.2149 LearningRate 0.0673 Epoch: 3 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:25,397-Speed 9680.47 samples/sec Loss 8.1941 LearningRate 0.0673 Epoch: 3 Global Step: 59920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:26,473-Speed 9526.66 samples/sec Loss 8.2784 LearningRate 0.0673 Epoch: 3 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:27,606-Speed 9042.57 samples/sec Loss 8.2506 LearningRate 0.0673 Epoch: 3 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:28,682-Speed 9522.19 samples/sec Loss 8.2880 LearningRate 0.0673 Epoch: 3 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:29,818-Speed 9016.16 samples/sec Loss 8.2768 LearningRate 0.0673 Epoch: 3 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:30,869-Speed 9745.46 samples/sec Loss 8.3197 LearningRate 0.0673 Epoch: 3 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:31,922-Speed 9735.02 samples/sec Loss 8.2464 LearningRate 0.0673 Epoch: 3 Global Step: 59980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:33,051-Speed 9074.79 samples/sec Loss 8.2042 LearningRate 0.0673 Epoch: 3 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:56:34,121-Speed 9580.05 samples/sec Loss 8.0757 LearningRate 0.0673 Epoch: 3 Global Step: 60000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:56:55,829-[lfw][60000]XNorm: 12.664305 Training: 2022-04-11 13:56:55,830-[lfw][60000]Accuracy-Flip: 0.99500+-0.00269 Training: 2022-04-11 13:56:55,830-[lfw][60000]Accuracy-Highest: 0.99583 Training: 2022-04-11 13:57:20,917-[cfp_fp][60000]XNorm: 10.744781 Training: 2022-04-11 13:57:20,918-[cfp_fp][60000]Accuracy-Flip: 0.94186+-0.01319 
Training: 2022-04-11 13:57:20,918-[cfp_fp][60000]Accuracy-Highest: 0.95157 Training: 2022-04-11 13:57:42,536-[agedb_30][60000]XNorm: 12.319594 Training: 2022-04-11 13:57:42,537-[agedb_30][60000]Accuracy-Flip: 0.95683+-0.01026 Training: 2022-04-11 13:57:42,537-[agedb_30][60000]Accuracy-Highest: 0.95767 Training: 2022-04-11 13:57:43,602-Speed 147.38 samples/sec Loss 8.1429 LearningRate 0.0673 Epoch: 3 Global Step: 60010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:44,695-Speed 9373.62 samples/sec Loss 8.0796 LearningRate 0.0673 Epoch: 3 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:45,781-Speed 9436.70 samples/sec Loss 8.1655 LearningRate 0.0673 Epoch: 3 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:46,863-Speed 9466.61 samples/sec Loss 8.2024 LearningRate 0.0673 Epoch: 3 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:47,964-Speed 9309.11 samples/sec Loss 8.1506 LearningRate 0.0673 Epoch: 3 Global Step: 60050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:49,034-Speed 9575.62 samples/sec Loss 8.2054 LearningRate 0.0673 Epoch: 3 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:50,093-Speed 9681.49 samples/sec Loss 8.3502 LearningRate 0.0672 Epoch: 3 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:51,158-Speed 9616.51 samples/sec Loss 7.9952 LearningRate 0.0672 Epoch: 3 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:52,235-Speed 9512.61 samples/sec Loss 8.2210 LearningRate 0.0672 Epoch: 3 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:53,322-Speed 9426.74 samples/sec Loss 8.0356 LearningRate 0.0672 Epoch: 3 Global Step: 60100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:54,358-Speed 9887.01 samples/sec Loss 
8.2132 LearningRate 0.0672 Epoch: 3 Global Step: 60110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:55,385-Speed 9979.40 samples/sec Loss 8.0572 LearningRate 0.0672 Epoch: 3 Global Step: 60120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:56,444-Speed 9683.28 samples/sec Loss 8.0905 LearningRate 0.0672 Epoch: 3 Global Step: 60130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:57,513-Speed 9577.74 samples/sec Loss 8.1944 LearningRate 0.0672 Epoch: 3 Global Step: 60140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:58,620-Speed 9259.48 samples/sec Loss 8.0852 LearningRate 0.0672 Epoch: 3 Global Step: 60150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:57:59,707-Speed 9423.64 samples/sec Loss 8.2073 LearningRate 0.0672 Epoch: 3 Global Step: 60160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:00,818-Speed 9223.20 samples/sec Loss 8.2171 LearningRate 0.0672 Epoch: 3 Global Step: 60170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:01,900-Speed 9472.68 samples/sec Loss 8.2912 LearningRate 0.0672 Epoch: 3 Global Step: 60180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:02,985-Speed 9448.43 samples/sec Loss 8.1360 LearningRate 0.0672 Epoch: 3 Global Step: 60190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:04,026-Speed 9840.52 samples/sec Loss 8.1143 LearningRate 0.0672 Epoch: 3 Global Step: 60200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:05,131-Speed 9273.59 samples/sec Loss 8.1945 LearningRate 0.0672 Epoch: 3 Global Step: 60210 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:06,181-Speed 9763.27 samples/sec Loss 8.1031 LearningRate 0.0672 Epoch: 3 Global Step: 60220 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:07,257-Speed 9520.20 samples/sec Loss 8.1236 LearningRate 0.0672 Epoch: 3 Global 
Step: 60230 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:08,319-Speed 9649.64 samples/sec Loss 8.2286 LearningRate 0.0672 Epoch: 3 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:09,426-Speed 9249.67 samples/sec Loss 8.1865 LearningRate 0.0672 Epoch: 3 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:10,498-Speed 9564.82 samples/sec Loss 8.2261 LearningRate 0.0672 Epoch: 3 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:11,567-Speed 9581.86 samples/sec Loss 8.1787 LearningRate 0.0672 Epoch: 3 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:12,651-Speed 9452.14 samples/sec Loss 8.1810 LearningRate 0.0671 Epoch: 3 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:13,722-Speed 9560.71 samples/sec Loss 8.1051 LearningRate 0.0671 Epoch: 3 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:14,778-Speed 9703.97 samples/sec Loss 8.0494 LearningRate 0.0671 Epoch: 3 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:15,855-Speed 9514.33 samples/sec Loss 8.0704 LearningRate 0.0671 Epoch: 3 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:16,940-Speed 9446.94 samples/sec Loss 8.0352 LearningRate 0.0671 Epoch: 3 Global Step: 60320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:18,049-Speed 9237.70 samples/sec Loss 8.1444 LearningRate 0.0671 Epoch: 3 Global Step: 60330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:19,102-Speed 9732.58 samples/sec Loss 8.1090 LearningRate 0.0671 Epoch: 3 Global Step: 60340 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:20,152-Speed 9761.77 samples/sec Loss 8.3264 LearningRate 0.0671 Epoch: 3 Global Step: 60350 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-04-11 13:58:21,243-Speed 9385.91 samples/sec Loss 8.0773 LearningRate 0.0671 Epoch: 3 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:22,386-Speed 8970.33 samples/sec Loss 8.1654 LearningRate 0.0671 Epoch: 3 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:23,474-Speed 9411.94 samples/sec Loss 8.0406 LearningRate 0.0671 Epoch: 3 Global Step: 60380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:24,548-Speed 9546.22 samples/sec Loss 8.1600 LearningRate 0.0671 Epoch: 3 Global Step: 60390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:25,599-Speed 9744.27 samples/sec Loss 8.1617 LearningRate 0.0671 Epoch: 3 Global Step: 60400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:26,672-Speed 9556.51 samples/sec Loss 8.1701 LearningRate 0.0671 Epoch: 3 Global Step: 60410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:27,752-Speed 9482.38 samples/sec Loss 8.1743 LearningRate 0.0671 Epoch: 3 Global Step: 60420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:28,847-Speed 9361.46 samples/sec Loss 8.2349 LearningRate 0.0671 Epoch: 3 Global Step: 60430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:29,913-Speed 9611.09 samples/sec Loss 8.1078 LearningRate 0.0671 Epoch: 3 Global Step: 60440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:30,959-Speed 9796.95 samples/sec Loss 8.0715 LearningRate 0.0671 Epoch: 3 Global Step: 60450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:32,032-Speed 9541.18 samples/sec Loss 8.1914 LearningRate 0.0671 Epoch: 3 Global Step: 60460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:33,159-Speed 9096.49 samples/sec Loss 8.2338 LearningRate 0.0671 Epoch: 3 Global Step: 60470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:58:34,275-Speed 9188.26 samples/sec Loss 8.0229 LearningRate 0.0670 Epoch: 3 Global Step: 60480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:35,365-Speed 9398.05 samples/sec Loss 8.2337 LearningRate 0.0670 Epoch: 3 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:36,443-Speed 9505.19 samples/sec Loss 8.1606 LearningRate 0.0670 Epoch: 3 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:37,554-Speed 9219.85 samples/sec Loss 8.1952 LearningRate 0.0670 Epoch: 3 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:38,660-Speed 9265.29 samples/sec Loss 8.1749 LearningRate 0.0670 Epoch: 3 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:39,764-Speed 9281.21 samples/sec Loss 8.1341 LearningRate 0.0670 Epoch: 3 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:40,833-Speed 9579.66 samples/sec Loss 8.1027 LearningRate 0.0670 Epoch: 3 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:41,906-Speed 9547.52 samples/sec Loss 8.1940 LearningRate 0.0670 Epoch: 3 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:42,989-Speed 9463.70 samples/sec Loss 8.2067 LearningRate 0.0670 Epoch: 3 Global Step: 60560 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:44,058-Speed 9578.37 samples/sec Loss 8.0477 LearningRate 0.0670 Epoch: 3 Global Step: 60570 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:45,105-Speed 9796.70 samples/sec Loss 8.2121 LearningRate 0.0670 Epoch: 3 Global Step: 60580 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:46,219-Speed 9197.76 samples/sec Loss 8.2032 LearningRate 0.0670 Epoch: 3 Global Step: 60590 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:47,270-Speed 9743.56 samples/sec Loss 
8.2739 LearningRate 0.0670 Epoch: 3 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:48,327-Speed 9692.75 samples/sec Loss 8.1615 LearningRate 0.0670 Epoch: 3 Global Step: 60610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:49,425-Speed 9336.23 samples/sec Loss 8.0659 LearningRate 0.0670 Epoch: 3 Global Step: 60620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:50,515-Speed 9401.56 samples/sec Loss 8.1599 LearningRate 0.0670 Epoch: 3 Global Step: 60630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:51,597-Speed 9465.50 samples/sec Loss 8.2498 LearningRate 0.0670 Epoch: 3 Global Step: 60640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:52,662-Speed 9626.75 samples/sec Loss 8.0753 LearningRate 0.0670 Epoch: 3 Global Step: 60650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:53,731-Speed 9582.85 samples/sec Loss 8.1831 LearningRate 0.0670 Epoch: 3 Global Step: 60660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:54,833-Speed 9298.57 samples/sec Loss 8.1414 LearningRate 0.0670 Epoch: 3 Global Step: 60670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:55,899-Speed 9606.15 samples/sec Loss 8.1479 LearningRate 0.0669 Epoch: 3 Global Step: 60680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:56,949-Speed 9766.46 samples/sec Loss 8.0906 LearningRate 0.0669 Epoch: 3 Global Step: 60690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:58:58,040-Speed 9394.26 samples/sec Loss 8.1200 LearningRate 0.0669 Epoch: 3 Global Step: 60700 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:58:59,123-Speed 9456.40 samples/sec Loss 8.3083 LearningRate 0.0669 Epoch: 3 Global Step: 60710 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:00,183-Speed 9667.48 samples/sec Loss 8.1575 LearningRate 0.0669 Epoch: 3 Global 
Step: 60720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:01,274-Speed 9398.43 samples/sec Loss 8.0523 LearningRate 0.0669 Epoch: 3 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:02,388-Speed 9199.27 samples/sec Loss 8.1453 LearningRate 0.0669 Epoch: 3 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:03,468-Speed 9480.79 samples/sec Loss 8.0855 LearningRate 0.0669 Epoch: 3 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:04,575-Speed 9263.21 samples/sec Loss 8.1884 LearningRate 0.0669 Epoch: 3 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:05,660-Speed 9443.64 samples/sec Loss 8.1720 LearningRate 0.0669 Epoch: 3 Global Step: 60770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:06,729-Speed 9582.97 samples/sec Loss 7.9977 LearningRate 0.0669 Epoch: 3 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:07,789-Speed 9664.13 samples/sec Loss 8.1233 LearningRate 0.0669 Epoch: 3 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:08,839-Speed 9760.02 samples/sec Loss 8.2105 LearningRate 0.0669 Epoch: 3 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:09,911-Speed 9561.32 samples/sec Loss 8.1970 LearningRate 0.0669 Epoch: 3 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:10,957-Speed 9793.94 samples/sec Loss 8.0403 LearningRate 0.0669 Epoch: 3 Global Step: 60820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:12,031-Speed 9538.28 samples/sec Loss 8.0926 LearningRate 0.0669 Epoch: 3 Global Step: 60830 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:13,110-Speed 9500.43 samples/sec Loss 8.2031 LearningRate 0.0669 Epoch: 3 Global Step: 60840 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-04-11 13:59:14,160-Speed 9755.32 samples/sec Loss 8.2000 LearningRate 0.0669 Epoch: 3 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:15,197-Speed 9878.63 samples/sec Loss 8.1339 LearningRate 0.0669 Epoch: 3 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:16,266-Speed 9583.47 samples/sec Loss 8.1877 LearningRate 0.0669 Epoch: 3 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:17,327-Speed 9660.71 samples/sec Loss 8.2553 LearningRate 0.0669 Epoch: 3 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:18,420-Speed 9368.35 samples/sec Loss 8.1248 LearningRate 0.0668 Epoch: 3 Global Step: 60890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:19,525-Speed 9272.86 samples/sec Loss 8.0579 LearningRate 0.0668 Epoch: 3 Global Step: 60900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:20,611-Speed 9438.74 samples/sec Loss 8.1268 LearningRate 0.0668 Epoch: 3 Global Step: 60910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:21,679-Speed 9595.31 samples/sec Loss 8.1240 LearningRate 0.0668 Epoch: 3 Global Step: 60920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:22,724-Speed 9805.27 samples/sec Loss 8.1440 LearningRate 0.0668 Epoch: 3 Global Step: 60930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:23,801-Speed 9507.94 samples/sec Loss 8.0551 LearningRate 0.0668 Epoch: 3 Global Step: 60940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:24,876-Speed 9537.79 samples/sec Loss 8.2239 LearningRate 0.0668 Epoch: 3 Global Step: 60950 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:25,955-Speed 9490.41 samples/sec Loss 8.2043 LearningRate 0.0668 Epoch: 3 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
13:59:27,044-Speed 9410.70 samples/sec Loss 8.1374 LearningRate 0.0668 Epoch: 3 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:28,108-Speed 9631.59 samples/sec Loss 8.3644 LearningRate 0.0668 Epoch: 3 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:29,179-Speed 9569.85 samples/sec Loss 8.0790 LearningRate 0.0668 Epoch: 3 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:30,259-Speed 9481.24 samples/sec Loss 8.0779 LearningRate 0.0668 Epoch: 3 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:31,329-Speed 9580.55 samples/sec Loss 8.1224 LearningRate 0.0668 Epoch: 3 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:32,412-Speed 9458.14 samples/sec Loss 8.1707 LearningRate 0.0668 Epoch: 3 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:33,483-Speed 9570.14 samples/sec Loss 8.1162 LearningRate 0.0668 Epoch: 3 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:34,577-Speed 9360.04 samples/sec Loss 8.0674 LearningRate 0.0668 Epoch: 3 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:35,634-Speed 9692.94 samples/sec Loss 8.2991 LearningRate 0.0668 Epoch: 3 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:36,684-Speed 9764.43 samples/sec Loss 8.1536 LearningRate 0.0668 Epoch: 3 Global Step: 61060 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:37,757-Speed 9550.28 samples/sec Loss 8.1055 LearningRate 0.0668 Epoch: 3 Global Step: 61070 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:38,824-Speed 9605.32 samples/sec Loss 8.2215 LearningRate 0.0668 Epoch: 3 Global Step: 61080 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:39,918-Speed 9358.54 samples/sec Loss 
8.1713 LearningRate 0.0667 Epoch: 3 Global Step: 61090 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:40,983-Speed 9620.02 samples/sec Loss 8.2883 LearningRate 0.0667 Epoch: 3 Global Step: 61100 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:42,030-Speed 9794.31 samples/sec Loss 8.1534 LearningRate 0.0667 Epoch: 3 Global Step: 61110 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:43,110-Speed 9482.02 samples/sec Loss 8.0619 LearningRate 0.0667 Epoch: 3 Global Step: 61120 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:44,180-Speed 9580.10 samples/sec Loss 8.1360 LearningRate 0.0667 Epoch: 3 Global Step: 61130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:45,257-Speed 9513.09 samples/sec Loss 8.1360 LearningRate 0.0667 Epoch: 3 Global Step: 61140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:46,356-Speed 9322.65 samples/sec Loss 8.1594 LearningRate 0.0667 Epoch: 3 Global Step: 61150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:47,427-Speed 9571.88 samples/sec Loss 8.1403 LearningRate 0.0667 Epoch: 3 Global Step: 61160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:48,490-Speed 9640.42 samples/sec Loss 8.1740 LearningRate 0.0667 Epoch: 3 Global Step: 61170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:49,602-Speed 9207.34 samples/sec Loss 8.0768 LearningRate 0.0667 Epoch: 3 Global Step: 61180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:50,654-Speed 9746.80 samples/sec Loss 8.2546 LearningRate 0.0667 Epoch: 3 Global Step: 61190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:51,753-Speed 9318.58 samples/sec Loss 8.0764 LearningRate 0.0667 Epoch: 3 Global Step: 61200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:52,830-Speed 9521.69 samples/sec Loss 8.0808 LearningRate 0.0667 Epoch: 3 Global 
Step: 61210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:53,929-Speed 9322.99 samples/sec Loss 8.0491 LearningRate 0.0667 Epoch: 3 Global Step: 61220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:55,038-Speed 9239.28 samples/sec Loss 8.1429 LearningRate 0.0667 Epoch: 3 Global Step: 61230 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 13:59:56,093-Speed 9707.64 samples/sec Loss 8.2563 LearningRate 0.0667 Epoch: 3 Global Step: 61240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:57,245-Speed 8896.86 samples/sec Loss 8.0928 LearningRate 0.0667 Epoch: 3 Global Step: 61250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:58,321-Speed 9517.79 samples/sec Loss 8.0921 LearningRate 0.0667 Epoch: 3 Global Step: 61260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 13:59:59,375-Speed 9723.54 samples/sec Loss 8.1132 LearningRate 0.0667 Epoch: 3 Global Step: 61270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:00,477-Speed 9299.64 samples/sec Loss 8.0698 LearningRate 0.0667 Epoch: 3 Global Step: 61280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:01,554-Speed 9512.22 samples/sec Loss 8.0154 LearningRate 0.0667 Epoch: 3 Global Step: 61290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:02,623-Speed 9585.01 samples/sec Loss 8.0801 LearningRate 0.0666 Epoch: 3 Global Step: 61300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:03,739-Speed 9185.54 samples/sec Loss 8.1681 LearningRate 0.0666 Epoch: 3 Global Step: 61310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:04,822-Speed 9461.23 samples/sec Loss 8.3041 LearningRate 0.0666 Epoch: 3 Global Step: 61320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:05,891-Speed 9585.83 samples/sec Loss 8.2590 LearningRate 0.0666 Epoch: 3 Global Step: 61330 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:00:07,005-Speed 9196.84 samples/sec Loss 8.1364 LearningRate 0.0666 Epoch: 3 Global Step: 61340 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:00:08,093-Speed 9423.35 samples/sec Loss 8.0723 LearningRate 0.0666 Epoch: 3 Global Step: 61350 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:00:09,155-Speed 9648.11 samples/sec Loss 8.1451 LearningRate 0.0666 Epoch: 3 Global Step: 61360 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:00:10,221-Speed 9614.85 samples/sec Loss 8.2209 LearningRate 0.0666 Epoch: 3 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:11,309-Speed 9417.49 samples/sec Loss 8.0918 LearningRate 0.0666 Epoch: 3 Global Step: 61380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:12,411-Speed 9296.79 samples/sec Loss 8.0769 LearningRate 0.0666 Epoch: 3 Global Step: 61390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:13,496-Speed 9441.99 samples/sec Loss 8.1623 LearningRate 0.0666 Epoch: 3 Global Step: 61400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:14,563-Speed 9601.14 samples/sec Loss 8.0910 LearningRate 0.0666 Epoch: 3 Global Step: 61410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:15,632-Speed 9583.66 samples/sec Loss 8.2304 LearningRate 0.0666 Epoch: 3 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:16,705-Speed 9546.28 samples/sec Loss 8.1340 LearningRate 0.0666 Epoch: 3 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:17,759-Speed 9722.96 samples/sec Loss 8.2323 LearningRate 0.0666 Epoch: 3 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:18,845-Speed 9432.40 samples/sec Loss 8.0576 LearningRate 0.0666 Epoch: 3 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:00:19,895-Speed 9761.07 samples/sec Loss 8.0011 LearningRate 0.0666 Epoch: 3 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:20,971-Speed 9524.62 samples/sec Loss 8.1380 LearningRate 0.0666 Epoch: 3 Global Step: 61470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:00:22,053-Speed 9476.26 samples/sec Loss 8.0904 LearningRate 0.0666 Epoch: 3 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:23,104-Speed 9741.85 samples/sec Loss 8.1162 LearningRate 0.0666 Epoch: 3 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:24,158-Speed 9726.57 samples/sec Loss 8.0278 LearningRate 0.0665 Epoch: 3 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:25,235-Speed 9511.04 samples/sec Loss 8.1546 LearningRate 0.0665 Epoch: 3 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:26,330-Speed 9358.54 samples/sec Loss 8.1378 LearningRate 0.0665 Epoch: 3 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:27,369-Speed 9862.01 samples/sec Loss 8.0883 LearningRate 0.0665 Epoch: 3 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:28,426-Speed 9692.19 samples/sec Loss 8.1439 LearningRate 0.0665 Epoch: 3 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:29,514-Speed 9417.95 samples/sec Loss 8.1972 LearningRate 0.0665 Epoch: 3 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:30,591-Speed 9506.41 samples/sec Loss 8.1121 LearningRate 0.0665 Epoch: 3 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:31,666-Speed 9533.74 samples/sec Loss 8.0406 LearningRate 0.0665 Epoch: 3 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:32,755-Speed 9415.63 samples/sec Loss 8.2305 
LearningRate 0.0665 Epoch: 3 Global Step: 61580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:00:33,817-Speed 9643.73 samples/sec Loss 8.0419 LearningRate 0.0665 Epoch: 3 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:34,915-Speed 9331.11 samples/sec Loss 8.1358 LearningRate 0.0665 Epoch: 3 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:36,001-Speed 9436.17 samples/sec Loss 8.0533 LearningRate 0.0665 Epoch: 3 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:37,087-Speed 9432.80 samples/sec Loss 8.1674 LearningRate 0.0665 Epoch: 3 Global Step: 61620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:38,166-Speed 9505.81 samples/sec Loss 8.1983 LearningRate 0.0665 Epoch: 3 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:39,228-Speed 9647.58 samples/sec Loss 8.1667 LearningRate 0.0665 Epoch: 3 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:40,308-Speed 9485.01 samples/sec Loss 8.2306 LearningRate 0.0665 Epoch: 3 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:41,394-Speed 9434.96 samples/sec Loss 8.1200 LearningRate 0.0665 Epoch: 3 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:42,526-Speed 9053.08 samples/sec Loss 8.2916 LearningRate 0.0665 Epoch: 3 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:43,621-Speed 9356.05 samples/sec Loss 8.2578 LearningRate 0.0665 Epoch: 3 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:44,686-Speed 9619.30 samples/sec Loss 8.1695 LearningRate 0.0665 Epoch: 3 Global Step: 61690 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:00:45,791-Speed 9272.15 samples/sec Loss 8.0865 LearningRate 0.0665 Epoch: 3 Global Step: 
61700 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:00:46,866-Speed 9535.51 samples/sec Loss 8.1397 LearningRate 0.0664 Epoch: 3 Global Step: 61710 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:00:47,927-Speed 9649.64 samples/sec Loss 8.1878 LearningRate 0.0664 Epoch: 3 Global Step: 61720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:48,987-Speed 9673.98 samples/sec Loss 8.1552 LearningRate 0.0664 Epoch: 3 Global Step: 61730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:50,108-Speed 9138.15 samples/sec Loss 8.0100 LearningRate 0.0664 Epoch: 3 Global Step: 61740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:51,185-Speed 9515.12 samples/sec Loss 8.2545 LearningRate 0.0664 Epoch: 3 Global Step: 61750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:52,252-Speed 9604.06 samples/sec Loss 8.1881 LearningRate 0.0664 Epoch: 3 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:53,370-Speed 9157.60 samples/sec Loss 8.2490 LearningRate 0.0664 Epoch: 3 Global Step: 61770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:54,437-Speed 9610.18 samples/sec Loss 8.2192 LearningRate 0.0664 Epoch: 3 Global Step: 61780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:55,484-Speed 9785.32 samples/sec Loss 8.1950 LearningRate 0.0664 Epoch: 3 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:56,523-Speed 9862.82 samples/sec Loss 8.0447 LearningRate 0.0664 Epoch: 3 Global Step: 61800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:57,608-Speed 9442.45 samples/sec Loss 8.1328 LearningRate 0.0664 Epoch: 3 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:00:58,728-Speed 9143.67 samples/sec Loss 8.1037 LearningRate 0.0664 Epoch: 3 Global Step: 61820 Fp16 Grad Scale: 262144 Required: 11 
hours Training: 2022-04-11 14:00:59,830-Speed 9296.68 samples/sec Loss 8.2718 LearningRate 0.0664 Epoch: 3 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:01:00,892-Speed 9645.85 samples/sec Loss 8.0425 LearningRate 0.0664 Epoch: 3 Global Step: 61840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:01,967-Speed 9538.11 samples/sec Loss 8.0752 LearningRate 0.0664 Epoch: 3 Global Step: 61850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:03,075-Speed 9252.51 samples/sec Loss 8.1950 LearningRate 0.0664 Epoch: 3 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:04,151-Speed 9518.81 samples/sec Loss 8.1432 LearningRate 0.0664 Epoch: 3 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:05,226-Speed 9526.73 samples/sec Loss 8.1614 LearningRate 0.0664 Epoch: 3 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:06,280-Speed 9727.05 samples/sec Loss 8.1873 LearningRate 0.0664 Epoch: 3 Global Step: 61890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:07,316-Speed 9882.97 samples/sec Loss 8.1182 LearningRate 0.0664 Epoch: 3 Global Step: 61900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:08,412-Speed 9353.68 samples/sec Loss 8.1712 LearningRate 0.0663 Epoch: 3 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:09,506-Speed 9363.62 samples/sec Loss 8.0398 LearningRate 0.0663 Epoch: 3 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:10,591-Speed 9444.12 samples/sec Loss 8.1154 LearningRate 0.0663 Epoch: 3 Global Step: 61930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:01:11,688-Speed 9337.76 samples/sec Loss 8.1352 LearningRate 0.0663 Epoch: 3 Global Step: 61940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:01:12,751-Speed 9638.45 
samples/sec Loss 8.0922 LearningRate 0.0663 Epoch: 3 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:01:13,840-Speed 9415.64 samples/sec Loss 7.9529 LearningRate 0.0663 Epoch: 3 Global Step: 61960 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:01:14,914-Speed 9542.32 samples/sec Loss 8.0005 LearningRate 0.0663 Epoch: 3 Global Step: 61970 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:01:16,033-Speed 9148.58 samples/sec Loss 8.1050 LearningRate 0.0663 Epoch: 3 Global Step: 61980 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:01:17,101-Speed 9602.08 samples/sec Loss 8.0619 LearningRate 0.0663 Epoch: 3 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:01:18,180-Speed 9486.86 samples/sec Loss 8.2262 LearningRate 0.0663 Epoch: 3 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:01:39,848-[lfw][62000]XNorm: 12.535790
Training: 2022-04-11 14:01:39,849-[lfw][62000]Accuracy-Flip: 0.99467+-0.00267
Training: 2022-04-11 14:01:39,849-[lfw][62000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:02:04,907-[cfp_fp][62000]XNorm: 10.490901
Training: 2022-04-11 14:02:04,908-[cfp_fp][62000]Accuracy-Flip: 0.94614+-0.01275
Training: 2022-04-11 14:02:04,908-[cfp_fp][62000]Accuracy-Highest: 0.95157
Training: 2022-04-11 14:02:26,534-[agedb_30][62000]XNorm: 12.107767
Training: 2022-04-11 14:02:26,535-[agedb_30][62000]Accuracy-Flip: 0.95617+-0.01600
Training: 2022-04-11 14:02:26,535-[agedb_30][62000]Accuracy-Highest: 0.95767
Training: 2022-04-11 14:02:27,621-Speed 147.47 samples/sec Loss 8.0962 LearningRate 0.0663 Epoch: 3 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:02:28,663-Speed 9829.99 samples/sec Loss 8.0633 LearningRate 0.0663 Epoch: 3 Global Step: 62020 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-04-11 14:02:29,739-Speed 9521.79 samples/sec Loss 7.9930 LearningRate 
0.0663 Epoch: 3 Global Step: 62030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:30,803-Speed 9633.26 samples/sec Loss 8.1779 LearningRate 0.0663 Epoch: 3 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:31,940-Speed 9017.89 samples/sec Loss 8.0082 LearningRate 0.0663 Epoch: 3 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:33,032-Speed 9376.42 samples/sec Loss 8.1522 LearningRate 0.0663 Epoch: 3 Global Step: 62060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:34,139-Speed 9253.94 samples/sec Loss 8.1681 LearningRate 0.0663 Epoch: 3 Global Step: 62070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:35,270-Speed 9061.51 samples/sec Loss 8.0441 LearningRate 0.0663 Epoch: 3 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:36,334-Speed 9632.14 samples/sec Loss 8.0274 LearningRate 0.0663 Epoch: 3 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:37,410-Speed 9522.15 samples/sec Loss 8.2168 LearningRate 0.0663 Epoch: 3 Global Step: 62100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:38,482-Speed 9558.33 samples/sec Loss 8.0907 LearningRate 0.0663 Epoch: 3 Global Step: 62110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:39,585-Speed 9292.04 samples/sec Loss 8.1532 LearningRate 0.0662 Epoch: 3 Global Step: 62120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:40,655-Speed 9573.74 samples/sec Loss 8.1692 LearningRate 0.0662 Epoch: 3 Global Step: 62130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:41,773-Speed 9166.54 samples/sec Loss 8.0625 LearningRate 0.0662 Epoch: 3 Global Step: 62140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:42,848-Speed 9531.21 samples/sec Loss 8.2612 LearningRate 0.0662 Epoch: 3 Global Step: 62150 Fp16 Grad 
Scale: 262144 Required: 11 hours Training: 2022-04-11 14:02:43,931-Speed 9464.27 samples/sec Loss 8.1555 LearningRate 0.0662 Epoch: 3 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:44,976-Speed 9801.08 samples/sec Loss 8.0017 LearningRate 0.0662 Epoch: 3 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:46,052-Speed 9520.16 samples/sec Loss 8.1947 LearningRate 0.0662 Epoch: 3 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:47,134-Speed 9472.50 samples/sec Loss 8.0616 LearningRate 0.0662 Epoch: 3 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:48,226-Speed 9377.00 samples/sec Loss 8.1016 LearningRate 0.0662 Epoch: 3 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:49,298-Speed 9560.93 samples/sec Loss 8.1361 LearningRate 0.0662 Epoch: 3 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:50,348-Speed 9757.54 samples/sec Loss 8.1979 LearningRate 0.0662 Epoch: 3 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:51,435-Speed 9432.14 samples/sec Loss 8.0354 LearningRate 0.0662 Epoch: 3 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:52,544-Speed 9233.62 samples/sec Loss 8.1337 LearningRate 0.0662 Epoch: 3 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:53,623-Speed 9501.48 samples/sec Loss 8.2102 LearningRate 0.0662 Epoch: 3 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:54,725-Speed 9294.05 samples/sec Loss 8.2173 LearningRate 0.0662 Epoch: 3 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:02:55,811-Speed 9431.56 samples/sec Loss 8.2374 LearningRate 0.0662 Epoch: 3 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:02:56,915-Speed 9286.47 samples/sec Loss 8.1744 LearningRate 0.0662 Epoch: 3 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:57,958-Speed 9823.58 samples/sec Loss 8.0251 LearningRate 0.0662 Epoch: 3 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:02:59,033-Speed 9530.81 samples/sec Loss 8.1215 LearningRate 0.0662 Epoch: 3 Global Step: 62300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:00,102-Speed 9579.67 samples/sec Loss 8.2160 LearningRate 0.0662 Epoch: 3 Global Step: 62310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:01,156-Speed 9718.91 samples/sec Loss 8.0433 LearningRate 0.0661 Epoch: 3 Global Step: 62320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:02,265-Speed 9244.18 samples/sec Loss 8.0693 LearningRate 0.0661 Epoch: 3 Global Step: 62330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:03,366-Speed 9304.49 samples/sec Loss 8.2141 LearningRate 0.0661 Epoch: 3 Global Step: 62340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:04,460-Speed 9363.61 samples/sec Loss 8.1204 LearningRate 0.0661 Epoch: 3 Global Step: 62350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:05,581-Speed 9145.64 samples/sec Loss 8.1539 LearningRate 0.0661 Epoch: 3 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:06,668-Speed 9424.58 samples/sec Loss 8.1399 LearningRate 0.0661 Epoch: 3 Global Step: 62370 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:07,772-Speed 9283.06 samples/sec Loss 8.0674 LearningRate 0.0661 Epoch: 3 Global Step: 62380 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:08,832-Speed 9674.36 samples/sec Loss 8.1754 LearningRate 0.0661 Epoch: 3 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:09,877-Speed 9805.46 samples/sec Loss 
8.1413 LearningRate 0.0661 Epoch: 3 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:10,934-Speed 9686.84 samples/sec Loss 8.1704 LearningRate 0.0661 Epoch: 3 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:12,021-Speed 9427.64 samples/sec Loss 8.2325 LearningRate 0.0661 Epoch: 3 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:13,119-Speed 9333.72 samples/sec Loss 8.0003 LearningRate 0.0661 Epoch: 3 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:14,201-Speed 9467.95 samples/sec Loss 8.0435 LearningRate 0.0661 Epoch: 3 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:15,282-Speed 9472.11 samples/sec Loss 8.0719 LearningRate 0.0661 Epoch: 3 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:16,374-Speed 9386.27 samples/sec Loss 8.2025 LearningRate 0.0661 Epoch: 3 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:17,476-Speed 9294.47 samples/sec Loss 8.0890 LearningRate 0.0661 Epoch: 3 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:18,560-Speed 9454.18 samples/sec Loss 8.1051 LearningRate 0.0661 Epoch: 3 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:03:19,644-Speed 9450.80 samples/sec Loss 8.2192 LearningRate 0.0661 Epoch: 3 Global Step: 62490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:20,756-Speed 9214.81 samples/sec Loss 8.1217 LearningRate 0.0661 Epoch: 3 Global Step: 62500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:21,852-Speed 9346.55 samples/sec Loss 8.1370 LearningRate 0.0661 Epoch: 3 Global Step: 62510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:22,908-Speed 9705.02 samples/sec Loss 8.1773 LearningRate 0.0661 Epoch: 3 Global Step: 
62520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:24,000-Speed 9382.38 samples/sec Loss 8.0510 LearningRate 0.0660 Epoch: 3 Global Step: 62530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:25,066-Speed 9615.61 samples/sec Loss 8.1701 LearningRate 0.0660 Epoch: 3 Global Step: 62540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:26,136-Speed 9575.56 samples/sec Loss 8.1021 LearningRate 0.0660 Epoch: 3 Global Step: 62550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:27,202-Speed 9610.23 samples/sec Loss 8.1574 LearningRate 0.0660 Epoch: 3 Global Step: 62560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:28,275-Speed 9553.60 samples/sec Loss 8.0768 LearningRate 0.0660 Epoch: 3 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:29,356-Speed 9476.43 samples/sec Loss 7.9883 LearningRate 0.0660 Epoch: 3 Global Step: 62580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:30,418-Speed 9647.15 samples/sec Loss 8.0346 LearningRate 0.0660 Epoch: 3 Global Step: 62590 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:31,521-Speed 9290.11 samples/sec Loss 8.0134 LearningRate 0.0660 Epoch: 3 Global Step: 62600 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:32,586-Speed 9624.10 samples/sec Loss 8.1275 LearningRate 0.0660 Epoch: 3 Global Step: 62610 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:33,679-Speed 9373.93 samples/sec Loss 8.1455 LearningRate 0.0660 Epoch: 3 Global Step: 62620 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:34,723-Speed 9812.99 samples/sec Loss 8.0717 LearningRate 0.0660 Epoch: 3 Global Step: 62630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:35,810-Speed 9421.21 samples/sec Loss 8.0536 LearningRate 0.0660 Epoch: 3 Global Step: 62640 Fp16 Grad Scale: 262144 Required: 11 
hours Training: 2022-04-11 14:03:36,892-Speed 9472.48 samples/sec Loss 8.1023 LearningRate 0.0660 Epoch: 3 Global Step: 62650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:37,946-Speed 9719.43 samples/sec Loss 8.1461 LearningRate 0.0660 Epoch: 3 Global Step: 62660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:39,016-Speed 9582.64 samples/sec Loss 8.1025 LearningRate 0.0660 Epoch: 3 Global Step: 62670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:40,129-Speed 9200.53 samples/sec Loss 8.1334 LearningRate 0.0660 Epoch: 3 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:41,196-Speed 9605.31 samples/sec Loss 8.0829 LearningRate 0.0660 Epoch: 3 Global Step: 62690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:42,291-Speed 9359.41 samples/sec Loss 8.2012 LearningRate 0.0660 Epoch: 3 Global Step: 62700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:43,414-Speed 9119.06 samples/sec Loss 8.0792 LearningRate 0.0660 Epoch: 3 Global Step: 62710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:44,538-Speed 9113.78 samples/sec Loss 8.0695 LearningRate 0.0660 Epoch: 3 Global Step: 62720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:45,593-Speed 9712.87 samples/sec Loss 8.2932 LearningRate 0.0659 Epoch: 3 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:46,642-Speed 9773.52 samples/sec Loss 8.1341 LearningRate 0.0659 Epoch: 3 Global Step: 62740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:47,732-Speed 9404.37 samples/sec Loss 8.0218 LearningRate 0.0659 Epoch: 3 Global Step: 62750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:48,836-Speed 9275.45 samples/sec Loss 8.0970 LearningRate 0.0659 Epoch: 3 Global Step: 62760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:03:49,983-Speed 8933.83 samples/sec Loss 8.1289 LearningRate 0.0659 Epoch: 3 Global Step: 62770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:03:51,021-Speed 9870.42 samples/sec Loss 8.1156 LearningRate 0.0659 Epoch: 3 Global Step: 62780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:52,134-Speed 9206.63 samples/sec Loss 8.1907 LearningRate 0.0659 Epoch: 3 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:53,215-Speed 9476.17 samples/sec Loss 8.1228 LearningRate 0.0659 Epoch: 3 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:54,280-Speed 9627.28 samples/sec Loss 8.1857 LearningRate 0.0659 Epoch: 3 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:55,323-Speed 9818.71 samples/sec Loss 8.1692 LearningRate 0.0659 Epoch: 3 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:56,406-Speed 9461.05 samples/sec Loss 8.1778 LearningRate 0.0659 Epoch: 3 Global Step: 62830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:57,521-Speed 9185.38 samples/sec Loss 8.0686 LearningRate 0.0659 Epoch: 3 Global Step: 62840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:58,617-Speed 9354.51 samples/sec Loss 8.1596 LearningRate 0.0659 Epoch: 3 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:03:59,680-Speed 9640.87 samples/sec Loss 8.0860 LearningRate 0.0659 Epoch: 3 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:00,768-Speed 9412.49 samples/sec Loss 8.1127 LearningRate 0.0659 Epoch: 3 Global Step: 62870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:01,851-Speed 9462.53 samples/sec Loss 8.1398 LearningRate 0.0659 Epoch: 3 Global Step: 62880 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:04:02,935-Speed 9455.10 samples/sec Loss 
8.1203 LearningRate 0.0659 Epoch: 3 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:03,985-Speed 9760.67 samples/sec Loss 8.1116 LearningRate 0.0659 Epoch: 3 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:05,081-Speed 9345.75 samples/sec Loss 8.1645 LearningRate 0.0659 Epoch: 3 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:06,178-Speed 9343.38 samples/sec Loss 8.0598 LearningRate 0.0659 Epoch: 3 Global Step: 62920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:07,265-Speed 9429.66 samples/sec Loss 7.8965 LearningRate 0.0659 Epoch: 3 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:08,334-Speed 9589.84 samples/sec Loss 8.0228 LearningRate 0.0658 Epoch: 3 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:09,431-Speed 9336.19 samples/sec Loss 8.0011 LearningRate 0.0658 Epoch: 3 Global Step: 62950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:10,496-Speed 9618.97 samples/sec Loss 8.0410 LearningRate 0.0658 Epoch: 3 Global Step: 62960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:11,569-Speed 9551.09 samples/sec Loss 8.1970 LearningRate 0.0658 Epoch: 3 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:12,640-Speed 9562.72 samples/sec Loss 8.1439 LearningRate 0.0658 Epoch: 3 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:13,714-Speed 9544.02 samples/sec Loss 8.0004 LearningRate 0.0658 Epoch: 3 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:14,778-Speed 9630.78 samples/sec Loss 8.0898 LearningRate 0.0658 Epoch: 3 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:15,856-Speed 9505.09 samples/sec Loss 8.0806 LearningRate 0.0658 Epoch: 3 Global 
Step: 63010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:16,925-Speed 9581.00 samples/sec Loss 7.9764 LearningRate 0.0658 Epoch: 3 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:18,039-Speed 9196.11 samples/sec Loss 8.0529 LearningRate 0.0658 Epoch: 3 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:19,139-Speed 9315.14 samples/sec Loss 8.0894 LearningRate 0.0658 Epoch: 3 Global Step: 63040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:20,270-Speed 9061.56 samples/sec Loss 8.0737 LearningRate 0.0658 Epoch: 3 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:21,366-Speed 9345.24 samples/sec Loss 8.0601 LearningRate 0.0658 Epoch: 3 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:22,473-Speed 9256.54 samples/sec Loss 8.0484 LearningRate 0.0658 Epoch: 3 Global Step: 63070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:23,602-Speed 9079.18 samples/sec Loss 8.0660 LearningRate 0.0658 Epoch: 3 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:24,684-Speed 9466.43 samples/sec Loss 8.1680 LearningRate 0.0658 Epoch: 3 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:25,763-Speed 9498.87 samples/sec Loss 8.1682 LearningRate 0.0658 Epoch: 3 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:26,865-Speed 9305.37 samples/sec Loss 8.0447 LearningRate 0.0658 Epoch: 3 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:27,979-Speed 9197.01 samples/sec Loss 8.1414 LearningRate 0.0658 Epoch: 3 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:29,088-Speed 9235.29 samples/sec Loss 8.1192 LearningRate 0.0658 Epoch: 3 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-04-11 14:04:30,217-Speed 9071.56 samples/sec Loss 8.0559 LearningRate 0.0657 Epoch: 3 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:31,259-Speed 9834.21 samples/sec Loss 8.1445 LearningRate 0.0657 Epoch: 3 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:32,316-Speed 9693.84 samples/sec Loss 8.1010 LearningRate 0.0657 Epoch: 3 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:33,410-Speed 9373.77 samples/sec Loss 8.1407 LearningRate 0.0657 Epoch: 3 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:34,471-Speed 9648.31 samples/sec Loss 8.0964 LearningRate 0.0657 Epoch: 3 Global Step: 63180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:35,527-Speed 9709.39 samples/sec Loss 8.0271 LearningRate 0.0657 Epoch: 3 Global Step: 63190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:36,621-Speed 9359.41 samples/sec Loss 8.0819 LearningRate 0.0657 Epoch: 3 Global Step: 63200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:37,712-Speed 9392.80 samples/sec Loss 7.9839 LearningRate 0.0657 Epoch: 3 Global Step: 63210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:38,808-Speed 9354.81 samples/sec Loss 8.0921 LearningRate 0.0657 Epoch: 3 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:39,844-Speed 9897.38 samples/sec Loss 8.1054 LearningRate 0.0657 Epoch: 3 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:40,884-Speed 9851.90 samples/sec Loss 7.9855 LearningRate 0.0657 Epoch: 3 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:41,966-Speed 9468.39 samples/sec Loss 8.0502 LearningRate 0.0657 Epoch: 3 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:43,071-Speed 
9267.48 samples/sec Loss 8.1213 LearningRate 0.0657 Epoch: 3 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:44,109-Speed 9873.57 samples/sec Loss 7.9884 LearningRate 0.0657 Epoch: 3 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:45,197-Speed 9412.76 samples/sec Loss 8.0888 LearningRate 0.0657 Epoch: 3 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:46,236-Speed 9861.88 samples/sec Loss 8.0803 LearningRate 0.0657 Epoch: 3 Global Step: 63290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:47,280-Speed 9817.52 samples/sec Loss 8.0684 LearningRate 0.0657 Epoch: 3 Global Step: 63300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:48,328-Speed 9782.27 samples/sec Loss 8.1316 LearningRate 0.0657 Epoch: 3 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:49,385-Speed 9686.96 samples/sec Loss 8.1295 LearningRate 0.0657 Epoch: 3 Global Step: 63320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:50,509-Speed 9118.40 samples/sec Loss 8.1158 LearningRate 0.0657 Epoch: 3 Global Step: 63330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:51,570-Speed 9662.86 samples/sec Loss 8.0966 LearningRate 0.0657 Epoch: 3 Global Step: 63340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:52,636-Speed 9606.98 samples/sec Loss 8.1720 LearningRate 0.0656 Epoch: 3 Global Step: 63350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:53,676-Speed 9856.98 samples/sec Loss 8.0711 LearningRate 0.0656 Epoch: 3 Global Step: 63360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:54,736-Speed 9658.57 samples/sec Loss 7.9997 LearningRate 0.0656 Epoch: 3 Global Step: 63370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:55,767-Speed 9940.03 samples/sec Loss 8.1925 LearningRate 
0.0656 Epoch: 3 Global Step: 63380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:56,789-Speed 10025.60 samples/sec Loss 8.0205 LearningRate 0.0656 Epoch: 3 Global Step: 63390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:04:57,926-Speed 9008.32 samples/sec Loss 8.1010 LearningRate 0.0656 Epoch: 3 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:04:58,997-Speed 9572.97 samples/sec Loss 8.0714 LearningRate 0.0656 Epoch: 3 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:00,086-Speed 9408.25 samples/sec Loss 8.1929 LearningRate 0.0656 Epoch: 3 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:01,141-Speed 9708.23 samples/sec Loss 8.1075 LearningRate 0.0656 Epoch: 3 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:02,205-Speed 9636.95 samples/sec Loss 8.1317 LearningRate 0.0656 Epoch: 3 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:03,294-Speed 9406.05 samples/sec Loss 8.0468 LearningRate 0.0656 Epoch: 3 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:04,405-Speed 9221.96 samples/sec Loss 7.9587 LearningRate 0.0656 Epoch: 3 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:05,497-Speed 9385.28 samples/sec Loss 8.0678 LearningRate 0.0656 Epoch: 3 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:06,544-Speed 9788.51 samples/sec Loss 8.0582 LearningRate 0.0656 Epoch: 3 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:07,574-Speed 9942.34 samples/sec Loss 8.0661 LearningRate 0.0656 Epoch: 3 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:08,629-Speed 9718.95 samples/sec Loss 8.0227 LearningRate 0.0656 Epoch: 3 Global Step: 63500 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-04-11 14:05:09,692-Speed 9640.64 samples/sec Loss 8.1288 LearningRate 0.0656 Epoch: 3 Global Step: 63510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:10,785-Speed 9375.02 samples/sec Loss 8.0826 LearningRate 0.0656 Epoch: 3 Global Step: 63520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:11,851-Speed 9610.76 samples/sec Loss 8.1239 LearningRate 0.0656 Epoch: 3 Global Step: 63530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:12,937-Speed 9432.69 samples/sec Loss 8.1068 LearningRate 0.0656 Epoch: 3 Global Step: 63540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:13,992-Speed 9710.74 samples/sec Loss 8.0180 LearningRate 0.0655 Epoch: 3 Global Step: 63550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:15,043-Speed 9746.10 samples/sec Loss 8.1522 LearningRate 0.0655 Epoch: 3 Global Step: 63560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:16,103-Speed 9666.31 samples/sec Loss 8.2022 LearningRate 0.0655 Epoch: 3 Global Step: 63570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:17,201-Speed 9329.66 samples/sec Loss 8.1329 LearningRate 0.0655 Epoch: 3 Global Step: 63580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:18,316-Speed 9190.49 samples/sec Loss 8.1668 LearningRate 0.0655 Epoch: 3 Global Step: 63590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:19,399-Speed 9463.66 samples/sec Loss 8.0617 LearningRate 0.0655 Epoch: 3 Global Step: 63600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:20,539-Speed 8981.47 samples/sec Loss 8.2073 LearningRate 0.0655 Epoch: 3 Global Step: 63610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:21,627-Speed 9427.48 samples/sec Loss 8.1505 LearningRate 0.0655 Epoch: 3 Global Step: 63620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-04-11 14:05:22,663-Speed 9884.47 samples/sec Loss 8.1505 LearningRate 0.0655 Epoch: 3 Global Step: 63630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:23,768-Speed 9276.60 samples/sec Loss 8.1001 LearningRate 0.0655 Epoch: 3 Global Step: 63640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:24,874-Speed 9258.82 samples/sec Loss 8.0466 LearningRate 0.0655 Epoch: 3 Global Step: 63650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:25,998-Speed 9116.99 samples/sec Loss 8.0762 LearningRate 0.0655 Epoch: 3 Global Step: 63660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:27,040-Speed 9831.16 samples/sec Loss 8.0304 LearningRate 0.0655 Epoch: 3 Global Step: 63670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:28,096-Speed 9707.02 samples/sec Loss 8.1927 LearningRate 0.0655 Epoch: 3 Global Step: 63680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:29,174-Speed 9506.65 samples/sec Loss 7.9902 LearningRate 0.0655 Epoch: 3 Global Step: 63690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:30,242-Speed 9593.51 samples/sec Loss 8.0577 LearningRate 0.0655 Epoch: 3 Global Step: 63700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:31,322-Speed 9483.55 samples/sec Loss 8.0394 LearningRate 0.0655 Epoch: 3 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:32,395-Speed 9558.34 samples/sec Loss 8.1531 LearningRate 0.0655 Epoch: 3 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:33,462-Speed 9599.46 samples/sec Loss 8.0079 LearningRate 0.0655 Epoch: 3 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:34,542-Speed 9487.50 samples/sec Loss 8.0117 LearningRate 0.0655 Epoch: 3 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:35,643-Speed 9304.58 
samples/sec Loss 8.0727 LearningRate 0.0655 Epoch: 3 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:36,743-Speed 9309.60 samples/sec Loss 8.1463 LearningRate 0.0654 Epoch: 3 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:37,821-Speed 9503.14 samples/sec Loss 8.0227 LearningRate 0.0654 Epoch: 3 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:38,930-Speed 9245.11 samples/sec Loss 8.1757 LearningRate 0.0654 Epoch: 3 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:40,009-Speed 9496.19 samples/sec Loss 8.1271 LearningRate 0.0654 Epoch: 3 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:41,082-Speed 9547.47 samples/sec Loss 8.0107 LearningRate 0.0654 Epoch: 3 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:05:42,207-Speed 9106.20 samples/sec Loss 8.1254 LearningRate 0.0654 Epoch: 3 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:43,314-Speed 9260.74 samples/sec Loss 8.0651 LearningRate 0.0654 Epoch: 3 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:44,425-Speed 9215.25 samples/sec Loss 8.0203 LearningRate 0.0654 Epoch: 3 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:45,499-Speed 9548.07 samples/sec Loss 8.0333 LearningRate 0.0654 Epoch: 3 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:46,564-Speed 9618.18 samples/sec Loss 8.1330 LearningRate 0.0654 Epoch: 3 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:47,648-Speed 9449.94 samples/sec Loss 8.0104 LearningRate 0.0654 Epoch: 3 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:48,742-Speed 9374.55 samples/sec Loss 7.9858 LearningRate 0.0654 
Epoch: 3 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:49,811-Speed 9585.29 samples/sec Loss 8.1236 LearningRate 0.0654 Epoch: 3 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:50,917-Speed 9261.56 samples/sec Loss 8.0638 LearningRate 0.0654 Epoch: 3 Global Step: 63890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:51,997-Speed 9491.10 samples/sec Loss 8.0112 LearningRate 0.0654 Epoch: 3 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:53,063-Speed 9604.31 samples/sec Loss 8.0760 LearningRate 0.0654 Epoch: 3 Global Step: 63910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:54,178-Speed 9192.74 samples/sec Loss 7.9444 LearningRate 0.0654 Epoch: 3 Global Step: 63920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:55,256-Speed 9500.14 samples/sec Loss 8.1093 LearningRate 0.0654 Epoch: 3 Global Step: 63930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:56,319-Speed 9640.94 samples/sec Loss 7.9435 LearningRate 0.0654 Epoch: 3 Global Step: 63940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:57,400-Speed 9474.21 samples/sec Loss 8.0957 LearningRate 0.0654 Epoch: 3 Global Step: 63950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:58,490-Speed 9399.40 samples/sec Loss 8.0984 LearningRate 0.0654 Epoch: 3 Global Step: 63960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:05:59,557-Speed 9609.32 samples/sec Loss 8.0063 LearningRate 0.0653 Epoch: 3 Global Step: 63970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:06:00,612-Speed 9712.89 samples/sec Loss 8.0726 LearningRate 0.0653 Epoch: 3 Global Step: 63980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:06:01,670-Speed 9684.04 samples/sec Loss 8.0298 LearningRate 0.0653 Epoch: 3 Global Step: 63990 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 14:06:02,735-Speed 9617.46 samples/sec Loss 7.9546 LearningRate 0.0653 Epoch: 3 Global Step: 64000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:06:24,651-[lfw][64000]XNorm: 12.652423 Training: 2022-04-11 14:06:24,652-[lfw][64000]Accuracy-Flip: 0.99567+-0.00200 Training: 2022-04-11 14:06:24,652-[lfw][64000]Accuracy-Highest: 0.99583 Training: 2022-04-11 14:06:50,007-[cfp_fp][64000]XNorm: 10.509308 Training: 2022-04-11 14:06:50,008-[cfp_fp][64000]Accuracy-Flip: 0.94657+-0.01031 Training: 2022-04-11 14:06:50,008-[cfp_fp][64000]Accuracy-Highest: 0.95157 Training: 2022-04-11 14:07:11,890-[agedb_30][64000]XNorm: 12.163206 Training: 2022-04-11 14:07:11,891-[agedb_30][64000]Accuracy-Flip: 0.95833+-0.01121 Training: 2022-04-11 14:07:11,891-[agedb_30][64000]Accuracy-Highest: 0.95833 Training: 2022-04-11 14:07:12,945-Speed 145.85 samples/sec Loss 8.1283 LearningRate 0.0653 Epoch: 3 Global Step: 64010 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:07:14,001-Speed 9701.68 samples/sec Loss 8.0709 LearningRate 0.0653 Epoch: 3 Global Step: 64020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:07:15,087-Speed 9435.80 samples/sec Loss 8.0700 LearningRate 0.0653 Epoch: 3 Global Step: 64030 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:07:16,151-Speed 9634.82 samples/sec Loss 8.0137 LearningRate 0.0653 Epoch: 3 Global Step: 64040 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:07:17,229-Speed 9503.94 samples/sec Loss 8.0842 LearningRate 0.0653 Epoch: 3 Global Step: 64050 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:07:18,309-Speed 9488.26 samples/sec Loss 7.9888 LearningRate 0.0653 Epoch: 3 Global Step: 64060 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:07:19,362-Speed 9733.24 samples/sec Loss 8.0175 LearningRate 0.0653 Epoch: 3 Global Step: 64070 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 14:07:20,413-Speed 9746.16 samples/sec Loss 8.0968 LearningRate 0.0653 Epoch: 3 Global Step: 64080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:21,472-Speed 9678.33 samples/sec Loss 7.9254 LearningRate 0.0653 Epoch: 3 Global Step: 64090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:22,558-Speed 9435.85 samples/sec Loss 8.0795 LearningRate 0.0653 Epoch: 3 Global Step: 64100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:23,658-Speed 9315.37 samples/sec Loss 8.0666 LearningRate 0.0653 Epoch: 3 Global Step: 64110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:24,758-Speed 9311.48 samples/sec Loss 7.9737 LearningRate 0.0653 Epoch: 3 Global Step: 64120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:25,819-Speed 9652.08 samples/sec Loss 8.0760 LearningRate 0.0653 Epoch: 3 Global Step: 64130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:26,927-Speed 9244.47 samples/sec Loss 8.0761 LearningRate 0.0653 Epoch: 3 Global Step: 64140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:28,027-Speed 9320.52 samples/sec Loss 8.1723 LearningRate 0.0653 Epoch: 3 Global Step: 64150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:29,150-Speed 9123.80 samples/sec Loss 8.0231 LearningRate 0.0653 Epoch: 3 Global Step: 64160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:30,246-Speed 9341.62 samples/sec Loss 8.0031 LearningRate 0.0652 Epoch: 3 Global Step: 64170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:31,355-Speed 9244.40 samples/sec Loss 7.9814 LearningRate 0.0652 Epoch: 3 Global Step: 64180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:32,437-Speed 9465.14 samples/sec Loss 8.0609 LearningRate 0.0652 Epoch: 3 Global Step: 64190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:33,552-Speed 
9194.62 samples/sec Loss 7.9305 LearningRate 0.0652 Epoch: 3 Global Step: 64200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:34,694-Speed 8970.68 samples/sec Loss 7.9421 LearningRate 0.0652 Epoch: 3 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:35,791-Speed 9334.39 samples/sec Loss 8.1242 LearningRate 0.0652 Epoch: 3 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:36,853-Speed 9650.34 samples/sec Loss 7.8784 LearningRate 0.0652 Epoch: 3 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:37,896-Speed 9831.23 samples/sec Loss 8.0598 LearningRate 0.0652 Epoch: 3 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:38,995-Speed 9327.77 samples/sec Loss 7.9615 LearningRate 0.0652 Epoch: 3 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:40,069-Speed 9538.23 samples/sec Loss 8.1566 LearningRate 0.0652 Epoch: 3 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:41,157-Speed 9419.78 samples/sec Loss 8.0671 LearningRate 0.0652 Epoch: 3 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:42,224-Speed 9603.29 samples/sec Loss 8.0551 LearningRate 0.0652 Epoch: 3 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:43,286-Speed 9642.16 samples/sec Loss 8.1008 LearningRate 0.0652 Epoch: 3 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:44,358-Speed 9562.02 samples/sec Loss 8.0470 LearningRate 0.0652 Epoch: 3 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:07:45,428-Speed 9574.53 samples/sec Loss 8.0667 LearningRate 0.0652 Epoch: 3 Global Step: 64310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:46,483-Speed 9711.07 samples/sec Loss 7.8928 LearningRate 0.0652 
Epoch: 3 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:47,574-Speed 9388.13 samples/sec Loss 8.0775 LearningRate 0.0652 Epoch: 3 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:48,638-Speed 9633.24 samples/sec Loss 8.0966 LearningRate 0.0652 Epoch: 3 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:49,709-Speed 9564.79 samples/sec Loss 7.9076 LearningRate 0.0652 Epoch: 3 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:50,776-Speed 9599.96 samples/sec Loss 7.9930 LearningRate 0.0652 Epoch: 3 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:51,863-Speed 9433.07 samples/sec Loss 7.9832 LearningRate 0.0652 Epoch: 3 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:52,920-Speed 9693.72 samples/sec Loss 8.0771 LearningRate 0.0651 Epoch: 3 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:54,012-Speed 9376.04 samples/sec Loss 8.0375 LearningRate 0.0651 Epoch: 3 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:55,114-Speed 9296.34 samples/sec Loss 8.2211 LearningRate 0.0651 Epoch: 3 Global Step: 64400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:56,164-Speed 9764.24 samples/sec Loss 8.0199 LearningRate 0.0651 Epoch: 3 Global Step: 64410 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:07:57,265-Speed 9308.06 samples/sec Loss 8.1384 LearningRate 0.0651 Epoch: 3 Global Step: 64420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:58,349-Speed 9450.72 samples/sec Loss 8.0184 LearningRate 0.0651 Epoch: 3 Global Step: 64430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:07:59,444-Speed 9358.11 samples/sec Loss 8.0406 LearningRate 0.0651 Epoch: 3 Global Step: 64440 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:00,507-Speed 9636.07 samples/sec Loss 8.0393 LearningRate 0.0651 Epoch: 3 Global Step: 64450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:01,574-Speed 9600.38 samples/sec Loss 8.0763 LearningRate 0.0651 Epoch: 3 Global Step: 64460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:02,668-Speed 9373.60 samples/sec Loss 8.1346 LearningRate 0.0651 Epoch: 3 Global Step: 64470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:03,761-Speed 9373.90 samples/sec Loss 8.1400 LearningRate 0.0651 Epoch: 3 Global Step: 64480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:04,847-Speed 9433.28 samples/sec Loss 8.0697 LearningRate 0.0651 Epoch: 3 Global Step: 64490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:05,914-Speed 9605.19 samples/sec Loss 8.0080 LearningRate 0.0651 Epoch: 3 Global Step: 64500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:06,967-Speed 9730.65 samples/sec Loss 8.0913 LearningRate 0.0651 Epoch: 3 Global Step: 64510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:08,031-Speed 9633.26 samples/sec Loss 8.0722 LearningRate 0.0651 Epoch: 3 Global Step: 64520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:09,075-Speed 9811.46 samples/sec Loss 8.0900 LearningRate 0.0651 Epoch: 3 Global Step: 64530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:10,131-Speed 9696.20 samples/sec Loss 8.0887 LearningRate 0.0651 Epoch: 3 Global Step: 64540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:11,185-Speed 9728.43 samples/sec Loss 7.9207 LearningRate 0.0651 Epoch: 3 Global Step: 64550 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:12,283-Speed 9325.90 samples/sec Loss 8.0005 LearningRate 0.0651 Epoch: 3 Global Step: 64560 Fp16 Grad Scale: 262144 Required: 11 hours Training: 
2022-04-11 14:08:13,392-Speed 9241.53 samples/sec Loss 8.0316 LearningRate 0.0651 Epoch: 3 Global Step: 64570 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:14,482-Speed 9400.37 samples/sec Loss 8.1117 LearningRate 0.0651 Epoch: 3 Global Step: 64580 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:15,572-Speed 9402.74 samples/sec Loss 8.0027 LearningRate 0.0650 Epoch: 3 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:16,683-Speed 9222.06 samples/sec Loss 8.1698 LearningRate 0.0650 Epoch: 3 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:17,745-Speed 9651.64 samples/sec Loss 8.0964 LearningRate 0.0650 Epoch: 3 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:18,830-Speed 9439.01 samples/sec Loss 8.0425 LearningRate 0.0650 Epoch: 3 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:19,883-Speed 9728.81 samples/sec Loss 8.0114 LearningRate 0.0650 Epoch: 3 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:20,935-Speed 9737.58 samples/sec Loss 8.1107 LearningRate 0.0650 Epoch: 3 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:22,037-Speed 9302.57 samples/sec Loss 7.9504 LearningRate 0.0650 Epoch: 3 Global Step: 64650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:23,118-Speed 9472.39 samples/sec Loss 8.0223 LearningRate 0.0650 Epoch: 3 Global Step: 64660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:24,215-Speed 9341.76 samples/sec Loss 8.0320 LearningRate 0.0650 Epoch: 3 Global Step: 64670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:25,309-Speed 9366.30 samples/sec Loss 8.0781 LearningRate 0.0650 Epoch: 3 Global Step: 64680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:26,390-Speed 9484.10 
samples/sec Loss 8.1532 LearningRate 0.0650 Epoch: 3 Global Step: 64690 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:27,444-Speed 9718.98 samples/sec Loss 8.0808 LearningRate 0.0650 Epoch: 3 Global Step: 64700 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:28,480-Speed 9900.03 samples/sec Loss 8.0937 LearningRate 0.0650 Epoch: 3 Global Step: 64710 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:29,563-Speed 9459.24 samples/sec Loss 7.9867 LearningRate 0.0650 Epoch: 3 Global Step: 64720 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:30,633-Speed 9581.74 samples/sec Loss 7.9476 LearningRate 0.0650 Epoch: 3 Global Step: 64730 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:31,689-Speed 9701.51 samples/sec Loss 8.1415 LearningRate 0.0650 Epoch: 3 Global Step: 64740 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:32,792-Speed 9282.96 samples/sec Loss 8.1479 LearningRate 0.0650 Epoch: 3 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:33,863-Speed 9570.70 samples/sec Loss 8.0216 LearningRate 0.0650 Epoch: 3 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:34,939-Speed 9528.11 samples/sec Loss 8.0579 LearningRate 0.0650 Epoch: 3 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:36,000-Speed 9650.69 samples/sec Loss 8.0303 LearningRate 0.0650 Epoch: 3 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:37,074-Speed 9546.65 samples/sec Loss 8.1199 LearningRate 0.0649 Epoch: 3 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:38,150-Speed 9526.43 samples/sec Loss 8.0903 LearningRate 0.0649 Epoch: 3 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:39,238-Speed 9408.19 samples/sec Loss 8.0006 LearningRate 0.0649 
Epoch: 3 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:40,324-Speed 9435.37 samples/sec Loss 8.0429 LearningRate 0.0649 Epoch: 3 Global Step: 64820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:41,439-Speed 9194.17 samples/sec Loss 8.0814 LearningRate 0.0649 Epoch: 3 Global Step: 64830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:42,540-Speed 9302.86 samples/sec Loss 8.0427 LearningRate 0.0649 Epoch: 3 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:43,594-Speed 9720.83 samples/sec Loss 8.1470 LearningRate 0.0649 Epoch: 3 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:44,651-Speed 9693.05 samples/sec Loss 8.1010 LearningRate 0.0649 Epoch: 3 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:45,693-Speed 9832.75 samples/sec Loss 8.0117 LearningRate 0.0649 Epoch: 3 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:46,750-Speed 9696.88 samples/sec Loss 8.0178 LearningRate 0.0649 Epoch: 3 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:47,824-Speed 9542.43 samples/sec Loss 8.0564 LearningRate 0.0649 Epoch: 3 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:48,891-Speed 9596.57 samples/sec Loss 7.9556 LearningRate 0.0649 Epoch: 3 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:49,943-Speed 9740.15 samples/sec Loss 8.0510 LearningRate 0.0649 Epoch: 3 Global Step: 64910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:51,028-Speed 9446.59 samples/sec Loss 8.0956 LearningRate 0.0649 Epoch: 3 Global Step: 64920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:52,137-Speed 9234.02 samples/sec Loss 7.9155 LearningRate 0.0649 Epoch: 3 Global Step: 64930 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:53,247-Speed 9240.72 samples/sec Loss 8.0429 LearningRate 0.0649 Epoch: 3 Global Step: 64940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:08:54,306-Speed 9670.38 samples/sec Loss 7.9893 LearningRate 0.0649 Epoch: 3 Global Step: 64950 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:55,384-Speed 9506.99 samples/sec Loss 8.1080 LearningRate 0.0649 Epoch: 3 Global Step: 64960 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:56,449-Speed 9621.87 samples/sec Loss 8.0911 LearningRate 0.0649 Epoch: 3 Global Step: 64970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:08:57,527-Speed 9502.42 samples/sec Loss 8.0179 LearningRate 0.0649 Epoch: 3 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:08:58,610-Speed 9465.49 samples/sec Loss 7.9756 LearningRate 0.0649 Epoch: 3 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:08:59,652-Speed 9836.82 samples/sec Loss 8.0422 LearningRate 0.0648 Epoch: 3 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:00,725-Speed 9553.77 samples/sec Loss 7.9897 LearningRate 0.0648 Epoch: 3 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:01,797-Speed 9561.71 samples/sec Loss 8.0830 LearningRate 0.0648 Epoch: 3 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:02,898-Speed 9303.56 samples/sec Loss 8.0507 LearningRate 0.0648 Epoch: 3 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:03,981-Speed 9458.66 samples/sec Loss 8.0073 LearningRate 0.0648 Epoch: 3 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:05,071-Speed 9399.74 samples/sec Loss 8.1053 LearningRate 0.0648 Epoch: 3 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-04-11 14:09:06,205-Speed 9035.36 samples/sec Loss 7.9310 LearningRate 0.0648 Epoch: 3 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:07,319-Speed 9197.31 samples/sec Loss 7.9385 LearningRate 0.0648 Epoch: 3 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:08,381-Speed 9653.11 samples/sec Loss 7.8984 LearningRate 0.0648 Epoch: 3 Global Step: 65080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:09,460-Speed 9492.20 samples/sec Loss 8.0434 LearningRate 0.0648 Epoch: 3 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:10,556-Speed 9346.63 samples/sec Loss 7.9382 LearningRate 0.0648 Epoch: 3 Global Step: 65100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:11,677-Speed 9140.91 samples/sec Loss 7.9864 LearningRate 0.0648 Epoch: 3 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:12,784-Speed 9254.87 samples/sec Loss 7.9311 LearningRate 0.0648 Epoch: 3 Global Step: 65120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:13,852-Speed 9602.00 samples/sec Loss 8.0205 LearningRate 0.0648 Epoch: 3 Global Step: 65130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:14,892-Speed 9851.29 samples/sec Loss 7.9877 LearningRate 0.0648 Epoch: 3 Global Step: 65140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:16,015-Speed 9118.64 samples/sec Loss 8.1050 LearningRate 0.0648 Epoch: 3 Global Step: 65150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:17,105-Speed 9402.86 samples/sec Loss 7.9897 LearningRate 0.0648 Epoch: 3 Global Step: 65160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:18,155-Speed 9759.28 samples/sec Loss 8.0740 LearningRate 0.0648 Epoch: 3 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:19,258-Speed 9289.49 
samples/sec Loss 7.9487 LearningRate 0.0648 Epoch: 3 Global Step: 65180 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:09:20,327-Speed 9581.58 samples/sec Loss 7.9779 LearningRate 0.0648 Epoch: 3 Global Step: 65190 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:09:21,412-Speed 9447.90 samples/sec Loss 8.0666 LearningRate 0.0648 Epoch: 3 Global Step: 65200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:09:22,527-Speed 9187.18 samples/sec Loss 8.1139 LearningRate 0.0647 Epoch: 3 Global Step: 65210 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:09:23,601-Speed 9533.05 samples/sec Loss 7.9022 LearningRate 0.0647 Epoch: 3 Global Step: 65220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:24,644-Speed 9829.48 samples/sec Loss 7.9829 LearningRate 0.0647 Epoch: 3 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:25,758-Speed 9192.27 samples/sec Loss 8.1018 LearningRate 0.0647 Epoch: 3 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:26,840-Speed 9478.16 samples/sec Loss 8.0775 LearningRate 0.0647 Epoch: 3 Global Step: 65250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:27,914-Speed 9534.98 samples/sec Loss 8.0466 LearningRate 0.0647 Epoch: 3 Global Step: 65260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:28,954-Speed 9851.22 samples/sec Loss 8.0221 LearningRate 0.0647 Epoch: 3 Global Step: 65270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:30,012-Speed 9687.09 samples/sec Loss 7.9893 LearningRate 0.0647 Epoch: 3 Global Step: 65280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:31,049-Speed 9884.51 samples/sec Loss 8.0942 LearningRate 0.0647 Epoch: 3 Global Step: 65290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:32,114-Speed 9614.82 samples/sec Loss 8.0797 LearningRate 0.0647 Epoch: 
3 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:33,231-Speed 9176.64 samples/sec Loss 8.0160 LearningRate 0.0647 Epoch: 3 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:34,297-Speed 9617.60 samples/sec Loss 8.0231 LearningRate 0.0647 Epoch: 3 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:35,371-Speed 9537.04 samples/sec Loss 7.9912 LearningRate 0.0647 Epoch: 3 Global Step: 65330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:36,414-Speed 9820.12 samples/sec Loss 8.1156 LearningRate 0.0647 Epoch: 3 Global Step: 65340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:37,453-Speed 9867.31 samples/sec Loss 8.1842 LearningRate 0.0647 Epoch: 3 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:38,543-Speed 9405.22 samples/sec Loss 7.8719 LearningRate 0.0647 Epoch: 3 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:39,638-Speed 9349.82 samples/sec Loss 7.9718 LearningRate 0.0647 Epoch: 3 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:40,736-Speed 9331.38 samples/sec Loss 7.9527 LearningRate 0.0647 Epoch: 3 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:41,827-Speed 9396.74 samples/sec Loss 8.1064 LearningRate 0.0647 Epoch: 3 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:42,916-Speed 9405.48 samples/sec Loss 7.9215 LearningRate 0.0647 Epoch: 3 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:44,003-Speed 9431.29 samples/sec Loss 7.9619 LearningRate 0.0647 Epoch: 3 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:45,055-Speed 9739.87 samples/sec Loss 7.9937 LearningRate 0.0646 Epoch: 3 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 
11 hours Training: 2022-04-11 14:09:46,149-Speed 9362.02 samples/sec Loss 8.0421 LearningRate 0.0646 Epoch: 3 Global Step: 65430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:47,243-Speed 9366.90 samples/sec Loss 8.0044 LearningRate 0.0646 Epoch: 3 Global Step: 65440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:09:48,322-Speed 9490.83 samples/sec Loss 7.9631 LearningRate 0.0646 Epoch: 3 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:49,443-Speed 9141.65 samples/sec Loss 8.0425 LearningRate 0.0646 Epoch: 3 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:50,565-Speed 9131.83 samples/sec Loss 8.1015 LearningRate 0.0646 Epoch: 3 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:51,615-Speed 9759.67 samples/sec Loss 8.1062 LearningRate 0.0646 Epoch: 3 Global Step: 65480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:52,683-Speed 9592.92 samples/sec Loss 8.0420 LearningRate 0.0646 Epoch: 3 Global Step: 65490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:53,772-Speed 9409.26 samples/sec Loss 7.8611 LearningRate 0.0646 Epoch: 3 Global Step: 65500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:54,867-Speed 9364.57 samples/sec Loss 8.0058 LearningRate 0.0646 Epoch: 3 Global Step: 65510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:55,912-Speed 9796.43 samples/sec Loss 7.9199 LearningRate 0.0646 Epoch: 3 Global Step: 65520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:56,982-Speed 9582.82 samples/sec Loss 8.0225 LearningRate 0.0646 Epoch: 3 Global Step: 65530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:09:58,049-Speed 9601.51 samples/sec Loss 7.9812 LearningRate 0.0646 Epoch: 3 Global Step: 65540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:09:59,098-Speed 9770.65 samples/sec Loss 8.0134 LearningRate 0.0646 Epoch: 3 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:00,190-Speed 9377.83 samples/sec Loss 7.9252 LearningRate 0.0646 Epoch: 3 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:01,259-Speed 9588.04 samples/sec Loss 7.9848 LearningRate 0.0646 Epoch: 3 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:02,364-Speed 9275.06 samples/sec Loss 8.0447 LearningRate 0.0646 Epoch: 3 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:03,428-Speed 9622.75 samples/sec Loss 7.9897 LearningRate 0.0646 Epoch: 3 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:04,472-Speed 9812.79 samples/sec Loss 8.0237 LearningRate 0.0646 Epoch: 3 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:05,553-Speed 9482.08 samples/sec Loss 8.0374 LearningRate 0.0646 Epoch: 3 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:06,655-Speed 9296.64 samples/sec Loss 8.0417 LearningRate 0.0645 Epoch: 3 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:07,703-Speed 9779.86 samples/sec Loss 8.0344 LearningRate 0.0645 Epoch: 3 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:08,751-Speed 9778.16 samples/sec Loss 7.9577 LearningRate 0.0645 Epoch: 3 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:09,856-Speed 9273.79 samples/sec Loss 8.0463 LearningRate 0.0645 Epoch: 3 Global Step: 65650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:10,928-Speed 9567.11 samples/sec Loss 8.0432 LearningRate 0.0645 Epoch: 3 Global Step: 65660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:11,998-Speed 9573.60 samples/sec Loss 8.0637 
LearningRate 0.0645 Epoch: 3 Global Step: 65670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:13,066-Speed 9589.17 samples/sec Loss 7.9667 LearningRate 0.0645 Epoch: 3 Global Step: 65680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:14,129-Speed 9648.00 samples/sec Loss 8.1192 LearningRate 0.0645 Epoch: 3 Global Step: 65690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:15,241-Speed 9217.76 samples/sec Loss 8.0885 LearningRate 0.0645 Epoch: 3 Global Step: 65700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:16,297-Speed 9701.51 samples/sec Loss 8.0942 LearningRate 0.0645 Epoch: 3 Global Step: 65710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:17,343-Speed 9791.93 samples/sec Loss 8.0800 LearningRate 0.0645 Epoch: 3 Global Step: 65720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:18,417-Speed 9539.86 samples/sec Loss 8.0560 LearningRate 0.0645 Epoch: 3 Global Step: 65730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:19,528-Speed 9218.76 samples/sec Loss 8.1357 LearningRate 0.0645 Epoch: 3 Global Step: 65740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:20,573-Speed 9808.78 samples/sec Loss 7.9669 LearningRate 0.0645 Epoch: 3 Global Step: 65750 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:10:21,660-Speed 9422.97 samples/sec Loss 7.9796 LearningRate 0.0645 Epoch: 3 Global Step: 65760 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:10:22,744-Speed 9449.96 samples/sec Loss 7.9333 LearningRate 0.0645 Epoch: 3 Global Step: 65770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:23,870-Speed 9104.26 samples/sec Loss 8.0033 LearningRate 0.0645 Epoch: 3 Global Step: 65780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:24,950-Speed 9480.14 samples/sec Loss 7.9725 LearningRate 0.0645 Epoch: 3 Global Step: 
65790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:25,979-Speed 9956.96 samples/sec Loss 8.0022 LearningRate 0.0645 Epoch: 3 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:27,038-Speed 9682.47 samples/sec Loss 7.9188 LearningRate 0.0645 Epoch: 3 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:28,136-Speed 9326.94 samples/sec Loss 8.0085 LearningRate 0.0645 Epoch: 3 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:29,226-Speed 9404.14 samples/sec Loss 7.9952 LearningRate 0.0644 Epoch: 3 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:30,319-Speed 9375.85 samples/sec Loss 8.1596 LearningRate 0.0644 Epoch: 3 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:31,394-Speed 9532.70 samples/sec Loss 8.1137 LearningRate 0.0644 Epoch: 3 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:32,479-Speed 9444.87 samples/sec Loss 8.0761 LearningRate 0.0644 Epoch: 3 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:33,550-Speed 9568.61 samples/sec Loss 7.9996 LearningRate 0.0644 Epoch: 3 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:34,658-Speed 9245.20 samples/sec Loss 8.0205 LearningRate 0.0644 Epoch: 3 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:35,763-Speed 9270.01 samples/sec Loss 8.0969 LearningRate 0.0644 Epoch: 3 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:10:36,840-Speed 9512.48 samples/sec Loss 7.9807 LearningRate 0.0644 Epoch: 3 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:37,914-Speed 9545.11 samples/sec Loss 7.9965 LearningRate 0.0644 Epoch: 3 Global Step: 65910 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 14:10:38,978-Speed 9622.91 samples/sec Loss 7.9219 LearningRate 0.0644 Epoch: 3 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:40,040-Speed 9654.84 samples/sec Loss 7.9535 LearningRate 0.0644 Epoch: 3 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:41,101-Speed 9652.48 samples/sec Loss 7.9754 LearningRate 0.0644 Epoch: 3 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:42,175-Speed 9540.18 samples/sec Loss 8.0615 LearningRate 0.0644 Epoch: 3 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:43,266-Speed 9394.28 samples/sec Loss 7.8430 LearningRate 0.0644 Epoch: 3 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:44,354-Speed 9419.56 samples/sec Loss 8.0604 LearningRate 0.0644 Epoch: 3 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:45,421-Speed 9603.78 samples/sec Loss 8.0138 LearningRate 0.0644 Epoch: 3 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:46,546-Speed 9101.94 samples/sec Loss 8.0194 LearningRate 0.0644 Epoch: 3 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:10:47,642-Speed 9357.12 samples/sec Loss 7.9265 LearningRate 0.0644 Epoch: 3 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:11:09,384-[lfw][66000]XNorm: 12.267007 Training: 2022-04-11 14:11:09,385-[lfw][66000]Accuracy-Flip: 0.99517+-0.00302 Training: 2022-04-11 14:11:09,385-[lfw][66000]Accuracy-Highest: 0.99583 Training: 2022-04-11 14:11:34,523-[cfp_fp][66000]XNorm: 10.425723 Training: 2022-04-11 14:11:34,523-[cfp_fp][66000]Accuracy-Flip: 0.95171+-0.01142 Training: 2022-04-11 14:11:34,524-[cfp_fp][66000]Accuracy-Highest: 0.95171 Training: 2022-04-11 14:11:56,238-[agedb_30][66000]XNorm: 11.943447 Training: 2022-04-11 
14:11:56,239-[agedb_30][66000]Accuracy-Flip: 0.96033+-0.01130 Training: 2022-04-11 14:11:56,239-[agedb_30][66000]Accuracy-Highest: 0.96033 Training: 2022-04-11 14:11:57,329-Speed 146.94 samples/sec Loss 8.0687 LearningRate 0.0644 Epoch: 3 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:11:58,406-Speed 9510.36 samples/sec Loss 8.0460 LearningRate 0.0644 Epoch: 3 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:11:59,522-Speed 9184.75 samples/sec Loss 8.0151 LearningRate 0.0644 Epoch: 3 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:00,624-Speed 9292.57 samples/sec Loss 7.9699 LearningRate 0.0643 Epoch: 3 Global Step: 66040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:01,717-Speed 9377.69 samples/sec Loss 7.9313 LearningRate 0.0643 Epoch: 3 Global Step: 66050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:02,779-Speed 9647.95 samples/sec Loss 8.0798 LearningRate 0.0643 Epoch: 3 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:03,818-Speed 9865.92 samples/sec Loss 8.0049 LearningRate 0.0643 Epoch: 3 Global Step: 66070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:04,901-Speed 9461.22 samples/sec Loss 8.1133 LearningRate 0.0643 Epoch: 3 Global Step: 66080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:05,970-Speed 9581.56 samples/sec Loss 7.9738 LearningRate 0.0643 Epoch: 3 Global Step: 66090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:07,070-Speed 9314.56 samples/sec Loss 8.1078 LearningRate 0.0643 Epoch: 3 Global Step: 66100 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:08,120-Speed 9766.04 samples/sec Loss 7.9608 LearningRate 0.0643 Epoch: 3 Global Step: 66110 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:09,221-Speed 9299.68 samples/sec Loss 
7.9501 LearningRate 0.0643 Epoch: 3 Global Step: 66120 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:10,286-Speed 9626.65 samples/sec Loss 7.9491 LearningRate 0.0643 Epoch: 3 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:11,335-Speed 9767.52 samples/sec Loss 8.0248 LearningRate 0.0643 Epoch: 3 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:12,424-Speed 9406.48 samples/sec Loss 8.0469 LearningRate 0.0643 Epoch: 3 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:13,524-Speed 9312.57 samples/sec Loss 7.8567 LearningRate 0.0643 Epoch: 3 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:14,571-Speed 9787.90 samples/sec Loss 7.9600 LearningRate 0.0643 Epoch: 3 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:15,656-Speed 9448.47 samples/sec Loss 8.0852 LearningRate 0.0643 Epoch: 3 Global Step: 66180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:16,739-Speed 9457.41 samples/sec Loss 7.9802 LearningRate 0.0643 Epoch: 3 Global Step: 66190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:17,796-Speed 9691.78 samples/sec Loss 7.8552 LearningRate 0.0643 Epoch: 3 Global Step: 66200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:18,857-Speed 9664.42 samples/sec Loss 7.9142 LearningRate 0.0643 Epoch: 3 Global Step: 66210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:19,942-Speed 9437.62 samples/sec Loss 8.0451 LearningRate 0.0643 Epoch: 3 Global Step: 66220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:20,995-Speed 9735.25 samples/sec Loss 7.9973 LearningRate 0.0643 Epoch: 3 Global Step: 66230 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:22,103-Speed 9246.55 samples/sec Loss 8.0846 LearningRate 0.0643 Epoch: 3 Global 
Step: 66240 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:23,191-Speed 9412.03 samples/sec Loss 7.9105 LearningRate 0.0642 Epoch: 3 Global Step: 66250 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:24,258-Speed 9610.54 samples/sec Loss 7.8883 LearningRate 0.0642 Epoch: 3 Global Step: 66260 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:25,349-Speed 9384.62 samples/sec Loss 7.8899 LearningRate 0.0642 Epoch: 3 Global Step: 66270 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:26,387-Speed 9875.32 samples/sec Loss 8.0198 LearningRate 0.0642 Epoch: 3 Global Step: 66280 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:27,464-Speed 9515.41 samples/sec Loss 8.0110 LearningRate 0.0642 Epoch: 3 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:28,554-Speed 9398.50 samples/sec Loss 7.9022 LearningRate 0.0642 Epoch: 3 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:29,701-Speed 8935.45 samples/sec Loss 8.0031 LearningRate 0.0642 Epoch: 3 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:30,802-Speed 9303.36 samples/sec Loss 7.9696 LearningRate 0.0642 Epoch: 3 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:31,951-Speed 8914.56 samples/sec Loss 8.0097 LearningRate 0.0642 Epoch: 3 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:33,029-Speed 9505.55 samples/sec Loss 7.9180 LearningRate 0.0642 Epoch: 3 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:34,123-Speed 9369.78 samples/sec Loss 7.9581 LearningRate 0.0642 Epoch: 3 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:35,175-Speed 9740.50 samples/sec Loss 8.0000 LearningRate 0.0642 Epoch: 3 Global Step: 66360 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:12:36,270-Speed 9355.14 samples/sec Loss 7.9176 LearningRate 0.0642 Epoch: 3 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:37,372-Speed 9298.01 samples/sec Loss 8.0473 LearningRate 0.0642 Epoch: 3 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:38,437-Speed 9622.50 samples/sec Loss 8.0029 LearningRate 0.0642 Epoch: 3 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:39,488-Speed 9750.47 samples/sec Loss 7.8504 LearningRate 0.0642 Epoch: 3 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:40,584-Speed 9345.98 samples/sec Loss 8.0468 LearningRate 0.0642 Epoch: 3 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:41,694-Speed 9235.33 samples/sec Loss 7.9567 LearningRate 0.0642 Epoch: 3 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:42,771-Speed 9506.37 samples/sec Loss 8.0576 LearningRate 0.0642 Epoch: 3 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:43,884-Speed 9211.56 samples/sec Loss 7.9668 LearningRate 0.0642 Epoch: 3 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:44,989-Speed 9273.10 samples/sec Loss 7.9933 LearningRate 0.0642 Epoch: 3 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:46,089-Speed 9312.29 samples/sec Loss 8.0357 LearningRate 0.0641 Epoch: 3 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:47,170-Speed 9474.76 samples/sec Loss 7.9581 LearningRate 0.0641 Epoch: 3 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:48,203-Speed 9923.65 samples/sec Loss 8.0215 LearningRate 0.0641 Epoch: 3 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:12:49,262-Speed 9676.35 samples/sec Loss 7.9720 LearningRate 0.0641 Epoch: 3 Global Step: 66490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:12:50,314-Speed 9733.24 samples/sec Loss 7.9653 LearningRate 0.0641 Epoch: 3 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:51,365-Speed 9750.17 samples/sec Loss 7.9464 LearningRate 0.0641 Epoch: 3 Global Step: 66510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:52,431-Speed 9615.49 samples/sec Loss 8.0044 LearningRate 0.0641 Epoch: 3 Global Step: 66520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:53,502-Speed 9569.03 samples/sec Loss 7.9836 LearningRate 0.0641 Epoch: 3 Global Step: 66530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:12:54,577-Speed 9532.06 samples/sec Loss 7.9938 LearningRate 0.0641 Epoch: 3 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:12:55,662-Speed 9438.34 samples/sec Loss 7.9005 LearningRate 0.0641 Epoch: 3 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:12:56,721-Speed 9676.88 samples/sec Loss 7.9611 LearningRate 0.0641 Epoch: 3 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:12:57,824-Speed 9296.24 samples/sec Loss 7.9384 LearningRate 0.0641 Epoch: 3 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:12:58,891-Speed 9602.14 samples/sec Loss 7.9913 LearningRate 0.0641 Epoch: 3 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:12:59,926-Speed 9894.57 samples/sec Loss 7.9702 LearningRate 0.0641 Epoch: 3 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:13:01,003-Speed 9513.01 samples/sec Loss 8.1107 LearningRate 0.0641 Epoch: 3 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:13:02,094-Speed 9393.28 samples/sec Loss 7.9806 
LearningRate 0.0641 Epoch: 3 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:13:03,155-Speed 9657.22 samples/sec Loss 8.0496 LearningRate 0.0641 Epoch: 3 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:13:04,239-Speed 9451.47 samples/sec Loss 7.9840 LearningRate 0.0641 Epoch: 3 Global Step: 66630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:13:05,363-Speed 9109.99 samples/sec Loss 8.1048 LearningRate 0.0641 Epoch: 3 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:06,431-Speed 9599.31 samples/sec Loss 7.9175 LearningRate 0.0641 Epoch: 3 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:07,543-Speed 9215.41 samples/sec Loss 7.9790 LearningRate 0.0640 Epoch: 3 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:08,644-Speed 9305.37 samples/sec Loss 7.7761 LearningRate 0.0640 Epoch: 3 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:09,758-Speed 9196.82 samples/sec Loss 7.9774 LearningRate 0.0640 Epoch: 3 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:10,858-Speed 9314.47 samples/sec Loss 7.9859 LearningRate 0.0640 Epoch: 3 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:11,921-Speed 9639.94 samples/sec Loss 8.0377 LearningRate 0.0640 Epoch: 3 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:13,031-Speed 9235.51 samples/sec Loss 8.0559 LearningRate 0.0640 Epoch: 3 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:14,098-Speed 9598.49 samples/sec Loss 8.0074 LearningRate 0.0640 Epoch: 3 Global Step: 66720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:15,185-Speed 9428.92 samples/sec Loss 8.0217 LearningRate 0.0640 Epoch: 3 Global Step: 
66730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:16,251-Speed 9609.80 samples/sec Loss 8.0229 LearningRate 0.0640 Epoch: 3 Global Step: 66740 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:13:17,295-Speed 9811.11 samples/sec Loss 7.8852 LearningRate 0.0640 Epoch: 3 Global Step: 66750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:18,562-Speed 8085.91 samples/sec Loss 8.1504 LearningRate 0.0640 Epoch: 3 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:53,232-Speed 295.37 samples/sec Loss 7.4094 LearningRate 0.0640 Epoch: 4 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:54,852-Speed 6328.04 samples/sec Loss 7.0900 LearningRate 0.0640 Epoch: 4 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:56,269-Speed 7231.13 samples/sec Loss 7.2590 LearningRate 0.0640 Epoch: 4 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:57,391-Speed 9135.87 samples/sec Loss 7.1894 LearningRate 0.0640 Epoch: 4 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:13:58,829-Speed 7121.11 samples/sec Loss 7.2561 LearningRate 0.0640 Epoch: 4 Global Step: 66810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:00,278-Speed 7073.57 samples/sec Loss 7.1453 LearningRate 0.0640 Epoch: 4 Global Step: 66820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:01,599-Speed 7758.31 samples/sec Loss 7.1766 LearningRate 0.0640 Epoch: 4 Global Step: 66830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:02,713-Speed 9194.97 samples/sec Loss 7.2115 LearningRate 0.0640 Epoch: 4 Global Step: 66840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:03,799-Speed 9438.58 samples/sec Loss 7.1589 LearningRate 0.0640 Epoch: 4 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-11 14:14:05,362-Speed 6553.61 samples/sec Loss 7.2286 LearningRate 0.0640 Epoch: 4 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:06,735-Speed 7461.28 samples/sec Loss 7.2590 LearningRate 0.0639 Epoch: 4 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:07,833-Speed 9327.17 samples/sec Loss 7.2228 LearningRate 0.0639 Epoch: 4 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:08,939-Speed 9269.98 samples/sec Loss 7.2232 LearningRate 0.0639 Epoch: 4 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:10,017-Speed 9497.22 samples/sec Loss 7.1971 LearningRate 0.0639 Epoch: 4 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:11,101-Speed 9452.43 samples/sec Loss 7.2997 LearningRate 0.0639 Epoch: 4 Global Step: 66910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:12,212-Speed 9225.10 samples/sec Loss 7.1636 LearningRate 0.0639 Epoch: 4 Global Step: 66920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:13,303-Speed 9394.22 samples/sec Loss 7.0216 LearningRate 0.0639 Epoch: 4 Global Step: 66930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:14,391-Speed 9414.75 samples/sec Loss 7.2075 LearningRate 0.0639 Epoch: 4 Global Step: 66940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:15,523-Speed 9057.62 samples/sec Loss 7.2688 LearningRate 0.0639 Epoch: 4 Global Step: 66950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:16,593-Speed 9578.22 samples/sec Loss 7.1929 LearningRate 0.0639 Epoch: 4 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:17,654-Speed 9652.21 samples/sec Loss 7.1791 LearningRate 0.0639 Epoch: 4 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:18,735-Speed 9480.91 
samples/sec Loss 7.2726 LearningRate 0.0639 Epoch: 4 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:19,853-Speed 9160.87 samples/sec Loss 7.1900 LearningRate 0.0639 Epoch: 4 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:20,945-Speed 9384.00 samples/sec Loss 7.2463 LearningRate 0.0639 Epoch: 4 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:22,023-Speed 9508.18 samples/sec Loss 7.1563 LearningRate 0.0639 Epoch: 4 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:23,302-Speed 8006.81 samples/sec Loss 7.2220 LearningRate 0.0639 Epoch: 4 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:24,504-Speed 8523.52 samples/sec Loss 7.2982 LearningRate 0.0639 Epoch: 4 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:25,590-Speed 9433.81 samples/sec Loss 7.1182 LearningRate 0.0639 Epoch: 4 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:26,719-Speed 9075.32 samples/sec Loss 7.2586 LearningRate 0.0639 Epoch: 4 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:27,812-Speed 9382.68 samples/sec Loss 7.1760 LearningRate 0.0639 Epoch: 4 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:28,901-Speed 9411.62 samples/sec Loss 7.1952 LearningRate 0.0639 Epoch: 4 Global Step: 67070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:29,968-Speed 9601.28 samples/sec Loss 7.2969 LearningRate 0.0638 Epoch: 4 Global Step: 67080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:31,057-Speed 9404.77 samples/sec Loss 7.3231 LearningRate 0.0638 Epoch: 4 Global Step: 67090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:32,173-Speed 9184.47 samples/sec Loss 7.2055 LearningRate 0.0638 Epoch: 4 
Global Step: 67100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:33,228-Speed 9708.40 samples/sec Loss 7.3040 LearningRate 0.0638 Epoch: 4 Global Step: 67110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:34,322-Speed 9370.14 samples/sec Loss 7.2275 LearningRate 0.0638 Epoch: 4 Global Step: 67120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:35,436-Speed 9198.08 samples/sec Loss 7.1912 LearningRate 0.0638 Epoch: 4 Global Step: 67130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:36,525-Speed 9410.05 samples/sec Loss 7.2021 LearningRate 0.0638 Epoch: 4 Global Step: 67140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:37,602-Speed 9517.88 samples/sec Loss 7.2906 LearningRate 0.0638 Epoch: 4 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:38,732-Speed 9062.69 samples/sec Loss 7.3565 LearningRate 0.0638 Epoch: 4 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:39,801-Speed 9585.10 samples/sec Loss 7.3511 LearningRate 0.0638 Epoch: 4 Global Step: 67170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:14:40,868-Speed 9608.86 samples/sec Loss 7.2775 LearningRate 0.0638 Epoch: 4 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:41,961-Speed 9369.07 samples/sec Loss 7.2377 LearningRate 0.0638 Epoch: 4 Global Step: 67190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:43,104-Speed 8966.75 samples/sec Loss 7.1536 LearningRate 0.0638 Epoch: 4 Global Step: 67200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:44,187-Speed 9460.90 samples/sec Loss 7.2879 LearningRate 0.0638 Epoch: 4 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:45,270-Speed 9456.52 samples/sec Loss 7.2592 LearningRate 0.0638 Epoch: 4 Global Step: 67220 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-11 14:14:46,342-Speed 9561.65 samples/sec Loss 7.4487 LearningRate 0.0638 Epoch: 4 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:47,429-Speed 9430.11 samples/sec Loss 7.3110 LearningRate 0.0638 Epoch: 4 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:48,500-Speed 9569.88 samples/sec Loss 7.3158 LearningRate 0.0638 Epoch: 4 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:49,581-Speed 9479.81 samples/sec Loss 7.2538 LearningRate 0.0638 Epoch: 4 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:50,657-Speed 9519.14 samples/sec Loss 7.2195 LearningRate 0.0638 Epoch: 4 Global Step: 67270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:51,751-Speed 9368.17 samples/sec Loss 7.3283 LearningRate 0.0638 Epoch: 4 Global Step: 67280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:52,850-Speed 9321.19 samples/sec Loss 7.2468 LearningRate 0.0637 Epoch: 4 Global Step: 67290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:53,963-Speed 9203.19 samples/sec Loss 7.1727 LearningRate 0.0637 Epoch: 4 Global Step: 67300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:14:55,054-Speed 9395.49 samples/sec Loss 7.3411 LearningRate 0.0637 Epoch: 4 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:56,511-Speed 7034.17 samples/sec Loss 7.3170 LearningRate 0.0637 Epoch: 4 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:58,368-Speed 5514.76 samples/sec Loss 7.2364 LearningRate 0.0637 Epoch: 4 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:14:59,693-Speed 7735.77 samples/sec Loss 7.3485 LearningRate 0.0637 Epoch: 4 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:15:00,777-Speed 9450.85 samples/sec Loss 7.2279 LearningRate 0.0637 Epoch: 4 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:01,881-Speed 9283.60 samples/sec Loss 7.4170 LearningRate 0.0637 Epoch: 4 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:03,000-Speed 9150.17 samples/sec Loss 7.2235 LearningRate 0.0637 Epoch: 4 Global Step: 67370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:04,150-Speed 8914.70 samples/sec Loss 7.2001 LearningRate 0.0637 Epoch: 4 Global Step: 67380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:05,277-Speed 9089.91 samples/sec Loss 7.3621 LearningRate 0.0637 Epoch: 4 Global Step: 67390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:06,356-Speed 9504.02 samples/sec Loss 7.2358 LearningRate 0.0637 Epoch: 4 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:07,428-Speed 9554.60 samples/sec Loss 7.3059 LearningRate 0.0637 Epoch: 4 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:08,491-Speed 9657.99 samples/sec Loss 7.3097 LearningRate 0.0637 Epoch: 4 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:09,584-Speed 9367.92 samples/sec Loss 7.3080 LearningRate 0.0637 Epoch: 4 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:10,657-Speed 9548.91 samples/sec Loss 7.3150 LearningRate 0.0637 Epoch: 4 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:11,739-Speed 9474.18 samples/sec Loss 7.4047 LearningRate 0.0637 Epoch: 4 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:12,861-Speed 9129.83 samples/sec Loss 7.3297 LearningRate 0.0637 Epoch: 4 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:13,972-Speed 9222.96 samples/sec Loss 
7.2538 LearningRate 0.0637 Epoch: 4 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:15,057-Speed 9445.44 samples/sec Loss 7.3564 LearningRate 0.0637 Epoch: 4 Global Step: 67480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:16,113-Speed 9703.85 samples/sec Loss 7.3896 LearningRate 0.0637 Epoch: 4 Global Step: 67490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:17,194-Speed 9480.15 samples/sec Loss 7.3555 LearningRate 0.0636 Epoch: 4 Global Step: 67500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:18,274-Speed 9481.97 samples/sec Loss 7.4023 LearningRate 0.0636 Epoch: 4 Global Step: 67510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:15:19,338-Speed 9630.59 samples/sec Loss 7.3544 LearningRate 0.0636 Epoch: 4 Global Step: 67520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:15:20,399-Speed 9656.04 samples/sec Loss 7.4549 LearningRate 0.0636 Epoch: 4 Global Step: 67530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:15:21,487-Speed 9418.97 samples/sec Loss 7.2568 LearningRate 0.0636 Epoch: 4 Global Step: 67540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:22,608-Speed 9144.62 samples/sec Loss 7.2584 LearningRate 0.0636 Epoch: 4 Global Step: 67550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:23,665-Speed 9687.99 samples/sec Loss 7.3908 LearningRate 0.0636 Epoch: 4 Global Step: 67560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:24,701-Speed 9896.76 samples/sec Loss 7.3087 LearningRate 0.0636 Epoch: 4 Global Step: 67570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:25,774-Speed 9544.33 samples/sec Loss 7.3013 LearningRate 0.0636 Epoch: 4 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:26,868-Speed 9367.94 samples/sec Loss 7.3206 LearningRate 0.0636 Epoch: 4 Global 
Step: 67590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:27,940-Speed 9556.80 samples/sec Loss 7.3031 LearningRate 0.0636 Epoch: 4 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:28,975-Speed 9902.30 samples/sec Loss 7.3324 LearningRate 0.0636 Epoch: 4 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:30,063-Speed 9424.55 samples/sec Loss 7.3607 LearningRate 0.0636 Epoch: 4 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:31,135-Speed 9554.06 samples/sec Loss 7.3618 LearningRate 0.0636 Epoch: 4 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:32,219-Speed 9450.14 samples/sec Loss 7.4759 LearningRate 0.0636 Epoch: 4 Global Step: 67640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:15:33,269-Speed 9767.43 samples/sec Loss 7.3442 LearningRate 0.0636 Epoch: 4 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:34,360-Speed 9395.95 samples/sec Loss 7.3519 LearningRate 0.0636 Epoch: 4 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:35,443-Speed 9461.64 samples/sec Loss 7.2099 LearningRate 0.0636 Epoch: 4 Global Step: 67670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:36,520-Speed 9515.61 samples/sec Loss 7.2891 LearningRate 0.0636 Epoch: 4 Global Step: 67680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:37,598-Speed 9503.83 samples/sec Loss 7.4198 LearningRate 0.0636 Epoch: 4 Global Step: 67690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:38,683-Speed 9441.15 samples/sec Loss 7.3173 LearningRate 0.0636 Epoch: 4 Global Step: 67700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:39,732-Speed 9762.12 samples/sec Loss 7.2977 LearningRate 0.0635 Epoch: 4 Global Step: 67710 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:15:40,793-Speed 9655.01 samples/sec Loss 7.3151 LearningRate 0.0635 Epoch: 4 Global Step: 67720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:41,911-Speed 9167.46 samples/sec Loss 7.3396 LearningRate 0.0635 Epoch: 4 Global Step: 67730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:43,016-Speed 9274.71 samples/sec Loss 7.3969 LearningRate 0.0635 Epoch: 4 Global Step: 67740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:44,089-Speed 9546.36 samples/sec Loss 7.4908 LearningRate 0.0635 Epoch: 4 Global Step: 67750 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:15:45,159-Speed 9574.97 samples/sec Loss 7.4151 LearningRate 0.0635 Epoch: 4 Global Step: 67760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:46,231-Speed 9556.12 samples/sec Loss 7.3862 LearningRate 0.0635 Epoch: 4 Global Step: 67770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:47,330-Speed 9327.83 samples/sec Loss 7.2714 LearningRate 0.0635 Epoch: 4 Global Step: 67780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:48,427-Speed 9342.06 samples/sec Loss 7.4462 LearningRate 0.0635 Epoch: 4 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:49,525-Speed 9327.68 samples/sec Loss 7.4281 LearningRate 0.0635 Epoch: 4 Global Step: 67800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:50,605-Speed 9489.51 samples/sec Loss 7.3758 LearningRate 0.0635 Epoch: 4 Global Step: 67810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:51,739-Speed 9036.96 samples/sec Loss 7.3580 LearningRate 0.0635 Epoch: 4 Global Step: 67820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:52,808-Speed 9579.30 samples/sec Loss 7.4268 LearningRate 0.0635 Epoch: 4 Global Step: 67830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:15:53,894-Speed 9433.78 samples/sec Loss 7.3728 LearningRate 0.0635 Epoch: 4 Global Step: 67840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:54,970-Speed 9528.74 samples/sec Loss 7.4795 LearningRate 0.0635 Epoch: 4 Global Step: 67850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:56,036-Speed 9610.75 samples/sec Loss 7.5150 LearningRate 0.0635 Epoch: 4 Global Step: 67860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:57,214-Speed 8692.01 samples/sec Loss 7.4593 LearningRate 0.0635 Epoch: 4 Global Step: 67870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:58,287-Speed 9557.54 samples/sec Loss 7.3248 LearningRate 0.0635 Epoch: 4 Global Step: 67880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:15:59,373-Speed 9433.75 samples/sec Loss 7.4443 LearningRate 0.0635 Epoch: 4 Global Step: 67890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:00,453-Speed 9482.22 samples/sec Loss 7.3617 LearningRate 0.0635 Epoch: 4 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:01,511-Speed 9688.54 samples/sec Loss 7.4159 LearningRate 0.0635 Epoch: 4 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:02,590-Speed 9492.86 samples/sec Loss 7.3818 LearningRate 0.0634 Epoch: 4 Global Step: 67920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:03,667-Speed 9511.11 samples/sec Loss 7.3483 LearningRate 0.0634 Epoch: 4 Global Step: 67930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:04,747-Speed 9496.14 samples/sec Loss 7.3625 LearningRate 0.0634 Epoch: 4 Global Step: 67940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:05,848-Speed 9304.17 samples/sec Loss 7.3534 LearningRate 0.0634 Epoch: 4 Global Step: 67950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:06,937-Speed 9409.17 samples/sec Loss 
7.4258 LearningRate 0.0634 Epoch: 4 Global Step: 67960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:08,003-Speed 9615.80 samples/sec Loss 7.4294 LearningRate 0.0634 Epoch: 4 Global Step: 67970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:16:09,106-Speed 9286.53 samples/sec Loss 7.3742 LearningRate 0.0634 Epoch: 4 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:16:10,170-Speed 9626.25 samples/sec Loss 7.4652 LearningRate 0.0634 Epoch: 4 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:16:11,237-Speed 9606.36 samples/sec Loss 7.4492 LearningRate 0.0634 Epoch: 4 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:16:33,360-[lfw][68000]XNorm: 12.272347 Training: 2022-04-11 14:16:33,361-[lfw][68000]Accuracy-Flip: 0.99533+-0.00267 Training: 2022-04-11 14:16:33,361-[lfw][68000]Accuracy-Highest: 0.99583 Training: 2022-04-11 14:16:58,876-[cfp_fp][68000]XNorm: 10.460671 Training: 2022-04-11 14:16:58,877-[cfp_fp][68000]Accuracy-Flip: 0.95014+-0.01296 Training: 2022-04-11 14:16:58,877-[cfp_fp][68000]Accuracy-Highest: 0.95171 Training: 2022-04-11 14:17:20,897-[agedb_30][68000]XNorm: 11.886236 Training: 2022-04-11 14:17:20,898-[agedb_30][68000]Accuracy-Flip: 0.95700+-0.00710 Training: 2022-04-11 14:17:20,898-[agedb_30][68000]Accuracy-Highest: 0.96033 Training: 2022-04-11 14:17:21,965-Speed 144.78 samples/sec Loss 7.4270 LearningRate 0.0634 Epoch: 4 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:17:23,063-Speed 9327.98 samples/sec Loss 7.4832 LearningRate 0.0634 Epoch: 4 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:17:24,165-Speed 9296.68 samples/sec Loss 7.5107 LearningRate 0.0634 Epoch: 4 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:17:25,259-Speed 9366.83 samples/sec Loss 7.4646 LearningRate 0.0634 Epoch: 4 
Global Step: 68040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:17:26,395-Speed 9026.60 samples/sec Loss 7.5099 LearningRate 0.0634 Epoch: 4 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:17:27,454-Speed 9672.35 samples/sec Loss 7.5626 LearningRate 0.0634 Epoch: 4 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:17:28,550-Speed 9353.29 samples/sec Loss 7.4349 LearningRate 0.0634 Epoch: 4 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:17:29,671-Speed 9140.16 samples/sec Loss 7.4486 LearningRate 0.0634 Epoch: 4 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:30,773-Speed 9295.43 samples/sec Loss 7.4644 LearningRate 0.0634 Epoch: 4 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:31,901-Speed 9087.28 samples/sec Loss 7.5522 LearningRate 0.0634 Epoch: 4 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:32,982-Speed 9471.84 samples/sec Loss 7.5020 LearningRate 0.0634 Epoch: 4 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:34,041-Speed 9678.88 samples/sec Loss 7.4806 LearningRate 0.0634 Epoch: 4 Global Step: 68120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:35,143-Speed 9304.13 samples/sec Loss 7.4374 LearningRate 0.0633 Epoch: 4 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:36,225-Speed 9464.99 samples/sec Loss 7.5626 LearningRate 0.0633 Epoch: 4 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:37,279-Speed 9722.99 samples/sec Loss 7.4898 LearningRate 0.0633 Epoch: 4 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:38,418-Speed 8995.48 samples/sec Loss 7.3994 LearningRate 0.0633 Epoch: 4 Global Step: 68160 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:17:39,542-Speed 9115.97 samples/sec Loss 7.4380 LearningRate 0.0633 Epoch: 4 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:40,642-Speed 9313.10 samples/sec Loss 7.5164 LearningRate 0.0633 Epoch: 4 Global Step: 68180 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:17:41,766-Speed 9118.28 samples/sec Loss 7.4364 LearningRate 0.0633 Epoch: 4 Global Step: 68190 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:17:42,844-Speed 9499.53 samples/sec Loss 7.4790 LearningRate 0.0633 Epoch: 4 Global Step: 68200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:17:43,944-Speed 9315.47 samples/sec Loss 7.4580 LearningRate 0.0633 Epoch: 4 Global Step: 68210 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:17:45,056-Speed 9214.68 samples/sec Loss 7.5037 LearningRate 0.0633 Epoch: 4 Global Step: 68220 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:17:46,136-Speed 9487.27 samples/sec Loss 7.4229 LearningRate 0.0633 Epoch: 4 Global Step: 68230 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:17:47,217-Speed 9480.68 samples/sec Loss 7.5954 LearningRate 0.0633 Epoch: 4 Global Step: 68240 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:17:48,294-Speed 9513.98 samples/sec Loss 7.3525 LearningRate 0.0633 Epoch: 4 Global Step: 68250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:49,365-Speed 9559.87 samples/sec Loss 7.4272 LearningRate 0.0633 Epoch: 4 Global Step: 68260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:50,403-Speed 9874.76 samples/sec Loss 7.4124 LearningRate 0.0633 Epoch: 4 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:51,514-Speed 9226.01 samples/sec Loss 7.5711 LearningRate 0.0633 Epoch: 4 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:17:52,641-Speed 9084.48 samples/sec Loss 7.4383 LearningRate 0.0633 Epoch: 4 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:53,745-Speed 9288.61 samples/sec Loss 7.5615 LearningRate 0.0633 Epoch: 4 Global Step: 68300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:54,838-Speed 9375.61 samples/sec Loss 7.3592 LearningRate 0.0633 Epoch: 4 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:55,951-Speed 9199.91 samples/sec Loss 7.5768 LearningRate 0.0633 Epoch: 4 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:57,055-Speed 9282.38 samples/sec Loss 7.4562 LearningRate 0.0633 Epoch: 4 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:58,127-Speed 9565.81 samples/sec Loss 7.5039 LearningRate 0.0632 Epoch: 4 Global Step: 68340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:17:59,202-Speed 9526.50 samples/sec Loss 7.4781 LearningRate 0.0632 Epoch: 4 Global Step: 68350 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:18:00,282-Speed 9486.09 samples/sec Loss 7.4768 LearningRate 0.0632 Epoch: 4 Global Step: 68360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:01,378-Speed 9350.69 samples/sec Loss 7.5977 LearningRate 0.0632 Epoch: 4 Global Step: 68370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:02,450-Speed 9558.17 samples/sec Loss 7.4301 LearningRate 0.0632 Epoch: 4 Global Step: 68380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:03,558-Speed 9248.45 samples/sec Loss 7.5740 LearningRate 0.0632 Epoch: 4 Global Step: 68390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:04,654-Speed 9348.25 samples/sec Loss 7.5275 LearningRate 0.0632 Epoch: 4 Global Step: 68400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:05,775-Speed 9135.31 samples/sec Loss 
7.5092 LearningRate 0.0632 Epoch: 4 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:06,907-Speed 9054.48 samples/sec Loss 7.4854 LearningRate 0.0632 Epoch: 4 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:08,013-Speed 9260.76 samples/sec Loss 7.5379 LearningRate 0.0632 Epoch: 4 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:09,093-Speed 9494.68 samples/sec Loss 7.4739 LearningRate 0.0632 Epoch: 4 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:10,176-Speed 9453.50 samples/sec Loss 7.4928 LearningRate 0.0632 Epoch: 4 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:11,281-Speed 9276.99 samples/sec Loss 7.4593 LearningRate 0.0632 Epoch: 4 Global Step: 68460 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:18:12,346-Speed 9621.96 samples/sec Loss 7.5984 LearningRate 0.0632 Epoch: 4 Global Step: 68470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:18:13,433-Speed 9422.01 samples/sec Loss 7.5428 LearningRate 0.0632 Epoch: 4 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:14,487-Speed 9728.78 samples/sec Loss 7.5258 LearningRate 0.0632 Epoch: 4 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:15,571-Speed 9450.87 samples/sec Loss 7.5249 LearningRate 0.0632 Epoch: 4 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:16,644-Speed 9549.05 samples/sec Loss 7.5697 LearningRate 0.0632 Epoch: 4 Global Step: 68510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:17,742-Speed 9329.66 samples/sec Loss 7.4553 LearningRate 0.0632 Epoch: 4 Global Step: 68520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:18,823-Speed 9480.09 samples/sec Loss 7.5288 LearningRate 0.0632 Epoch: 4 Global 
Step: 68530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:19,928-Speed 9266.49 samples/sec Loss 7.4767 LearningRate 0.0632 Epoch: 4 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:20,971-Speed 9830.42 samples/sec Loss 7.4960 LearningRate 0.0631 Epoch: 4 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:22,045-Speed 9537.92 samples/sec Loss 7.4474 LearningRate 0.0631 Epoch: 4 Global Step: 68560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:23,130-Speed 9440.01 samples/sec Loss 7.3279 LearningRate 0.0631 Epoch: 4 Global Step: 68570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:24,217-Speed 9426.07 samples/sec Loss 7.4676 LearningRate 0.0631 Epoch: 4 Global Step: 68580 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:18:25,290-Speed 9548.68 samples/sec Loss 7.5069 LearningRate 0.0631 Epoch: 4 Global Step: 68590 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:18:26,359-Speed 9587.07 samples/sec Loss 7.6188 LearningRate 0.0631 Epoch: 4 Global Step: 68600 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:18:27,487-Speed 9082.00 samples/sec Loss 7.5660 LearningRate 0.0631 Epoch: 4 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:28,627-Speed 8991.15 samples/sec Loss 7.4796 LearningRate 0.0631 Epoch: 4 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:29,735-Speed 9248.02 samples/sec Loss 7.5151 LearningRate 0.0631 Epoch: 4 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:30,804-Speed 9586.75 samples/sec Loss 7.5127 LearningRate 0.0631 Epoch: 4 Global Step: 68640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:31,871-Speed 9601.36 samples/sec Loss 7.5097 LearningRate 0.0631 Epoch: 4 Global Step: 68650 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:18:32,931-Speed 9669.19 samples/sec Loss 7.5846 LearningRate 0.0631 Epoch: 4 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:34,013-Speed 9468.30 samples/sec Loss 7.4948 LearningRate 0.0631 Epoch: 4 Global Step: 68670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:35,094-Speed 9480.88 samples/sec Loss 7.5854 LearningRate 0.0631 Epoch: 4 Global Step: 68680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:36,174-Speed 9482.38 samples/sec Loss 7.6166 LearningRate 0.0631 Epoch: 4 Global Step: 68690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:37,263-Speed 9412.94 samples/sec Loss 7.5778 LearningRate 0.0631 Epoch: 4 Global Step: 68700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:38,382-Speed 9157.21 samples/sec Loss 7.4836 LearningRate 0.0631 Epoch: 4 Global Step: 68710 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:18:39,451-Speed 9583.58 samples/sec Loss 7.4658 LearningRate 0.0631 Epoch: 4 Global Step: 68720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:40,499-Speed 9775.97 samples/sec Loss 7.4602 LearningRate 0.0631 Epoch: 4 Global Step: 68730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:41,616-Speed 9169.53 samples/sec Loss 7.6168 LearningRate 0.0631 Epoch: 4 Global Step: 68740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:42,716-Speed 9316.03 samples/sec Loss 7.5211 LearningRate 0.0631 Epoch: 4 Global Step: 68750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:43,850-Speed 9032.83 samples/sec Loss 7.6591 LearningRate 0.0630 Epoch: 4 Global Step: 68760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:44,953-Speed 9286.77 samples/sec Loss 7.6212 LearningRate 0.0630 Epoch: 4 Global Step: 68770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:18:46,021-Speed 9601.84 samples/sec Loss 7.4394 LearningRate 0.0630 Epoch: 4 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:47,076-Speed 9708.86 samples/sec Loss 7.5392 LearningRate 0.0630 Epoch: 4 Global Step: 68790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:48,187-Speed 9222.07 samples/sec Loss 7.4433 LearningRate 0.0630 Epoch: 4 Global Step: 68800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:49,267-Speed 9490.02 samples/sec Loss 7.4628 LearningRate 0.0630 Epoch: 4 Global Step: 68810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:50,336-Speed 9591.46 samples/sec Loss 7.5026 LearningRate 0.0630 Epoch: 4 Global Step: 68820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:51,442-Speed 9258.45 samples/sec Loss 7.6104 LearningRate 0.0630 Epoch: 4 Global Step: 68830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:52,561-Speed 9155.21 samples/sec Loss 7.5644 LearningRate 0.0630 Epoch: 4 Global Step: 68840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:53,675-Speed 9199.21 samples/sec Loss 7.5680 LearningRate 0.0630 Epoch: 4 Global Step: 68850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:54,774-Speed 9323.16 samples/sec Loss 7.4379 LearningRate 0.0630 Epoch: 4 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:55,854-Speed 9482.84 samples/sec Loss 7.5547 LearningRate 0.0630 Epoch: 4 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:56,941-Speed 9431.39 samples/sec Loss 7.5673 LearningRate 0.0630 Epoch: 4 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:58,054-Speed 9203.26 samples/sec Loss 7.5145 LearningRate 0.0630 Epoch: 4 Global Step: 68890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:18:59,165-Speed 9226.42 samples/sec Loss 
7.4619 LearningRate 0.0630 Epoch: 4 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:00,234-Speed 9579.68 samples/sec Loss 7.6014 LearningRate 0.0630 Epoch: 4 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:01,326-Speed 9388.75 samples/sec Loss 7.6029 LearningRate 0.0630 Epoch: 4 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:02,425-Speed 9315.11 samples/sec Loss 7.5063 LearningRate 0.0630 Epoch: 4 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:03,515-Speed 9403.12 samples/sec Loss 7.4975 LearningRate 0.0630 Epoch: 4 Global Step: 68940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:04,619-Speed 9282.63 samples/sec Loss 7.6351 LearningRate 0.0630 Epoch: 4 Global Step: 68950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:05,699-Speed 9486.38 samples/sec Loss 7.5386 LearningRate 0.0630 Epoch: 4 Global Step: 68960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:06,854-Speed 8873.05 samples/sec Loss 7.6090 LearningRate 0.0629 Epoch: 4 Global Step: 68970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:07,961-Speed 9256.14 samples/sec Loss 7.6085 LearningRate 0.0629 Epoch: 4 Global Step: 68980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:09,087-Speed 9100.69 samples/sec Loss 7.6638 LearningRate 0.0629 Epoch: 4 Global Step: 68990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:10,192-Speed 9275.54 samples/sec Loss 7.6065 LearningRate 0.0629 Epoch: 4 Global Step: 69000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:11,252-Speed 9658.65 samples/sec Loss 7.6496 LearningRate 0.0629 Epoch: 4 Global Step: 69010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:12,312-Speed 9669.05 samples/sec Loss 7.5110 LearningRate 0.0629 Epoch: 4 Global 
Step: 69020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:19:13,359-Speed 9786.86 samples/sec Loss 7.5995 LearningRate 0.0629 Epoch: 4 Global Step: 69030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:14,456-Speed 9341.84 samples/sec Loss 7.5597 LearningRate 0.0629 Epoch: 4 Global Step: 69040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:15,528-Speed 9555.91 samples/sec Loss 7.5414 LearningRate 0.0629 Epoch: 4 Global Step: 69050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:16,594-Speed 9611.99 samples/sec Loss 7.6662 LearningRate 0.0629 Epoch: 4 Global Step: 69060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:17,673-Speed 9491.27 samples/sec Loss 7.6274 LearningRate 0.0629 Epoch: 4 Global Step: 69070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:18,817-Speed 8959.97 samples/sec Loss 7.4905 LearningRate 0.0629 Epoch: 4 Global Step: 69080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:19,868-Speed 9751.27 samples/sec Loss 7.5885 LearningRate 0.0629 Epoch: 4 Global Step: 69090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:20,926-Speed 9684.83 samples/sec Loss 7.5626 LearningRate 0.0629 Epoch: 4 Global Step: 69100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:22,006-Speed 9490.33 samples/sec Loss 7.5536 LearningRate 0.0629 Epoch: 4 Global Step: 69110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:23,083-Speed 9510.67 samples/sec Loss 7.5777 LearningRate 0.0629 Epoch: 4 Global Step: 69120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:24,173-Speed 9395.48 samples/sec Loss 7.4710 LearningRate 0.0629 Epoch: 4 Global Step: 69130 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:19:25,229-Speed 9705.51 samples/sec Loss 7.5705 LearningRate 0.0629 Epoch: 4 Global Step: 69140 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:19:26,306-Speed 9518.16 samples/sec Loss 7.6924 LearningRate 0.0629 Epoch: 4 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:27,376-Speed 9573.10 samples/sec Loss 7.5629 LearningRate 0.0629 Epoch: 4 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:28,425-Speed 9767.55 samples/sec Loss 7.6028 LearningRate 0.0629 Epoch: 4 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:29,499-Speed 9542.06 samples/sec Loss 7.5713 LearningRate 0.0628 Epoch: 4 Global Step: 69180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:30,582-Speed 9458.77 samples/sec Loss 7.6258 LearningRate 0.0628 Epoch: 4 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:31,681-Speed 9327.77 samples/sec Loss 7.5294 LearningRate 0.0628 Epoch: 4 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:32,761-Speed 9485.40 samples/sec Loss 7.5397 LearningRate 0.0628 Epoch: 4 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:33,832-Speed 9561.61 samples/sec Loss 7.5343 LearningRate 0.0628 Epoch: 4 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:34,867-Speed 9911.26 samples/sec Loss 7.5995 LearningRate 0.0628 Epoch: 4 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:35,935-Speed 9588.68 samples/sec Loss 7.6603 LearningRate 0.0628 Epoch: 4 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:37,037-Speed 9295.30 samples/sec Loss 7.5417 LearningRate 0.0628 Epoch: 4 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:38,128-Speed 9392.66 samples/sec Loss 7.6068 LearningRate 0.0628 Epoch: 4 Global Step: 69260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
14:19:39,219-Speed 9388.75 samples/sec Loss 7.6875 LearningRate 0.0628 Epoch: 4 Global Step: 69270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:40,305-Speed 9434.73 samples/sec Loss 7.6174 LearningRate 0.0628 Epoch: 4 Global Step: 69280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:41,404-Speed 9327.79 samples/sec Loss 7.6269 LearningRate 0.0628 Epoch: 4 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:42,509-Speed 9270.25 samples/sec Loss 7.5751 LearningRate 0.0628 Epoch: 4 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:43,627-Speed 9165.64 samples/sec Loss 7.6579 LearningRate 0.0628 Epoch: 4 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:44,713-Speed 9426.99 samples/sec Loss 7.7556 LearningRate 0.0628 Epoch: 4 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:45,789-Speed 9526.01 samples/sec Loss 7.6199 LearningRate 0.0628 Epoch: 4 Global Step: 69330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:46,885-Speed 9351.83 samples/sec Loss 7.5693 LearningRate 0.0628 Epoch: 4 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:47,996-Speed 9228.24 samples/sec Loss 7.5823 LearningRate 0.0628 Epoch: 4 Global Step: 69350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:19:49,071-Speed 9530.42 samples/sec Loss 7.7412 LearningRate 0.0628 Epoch: 4 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:50,175-Speed 9273.51 samples/sec Loss 7.5735 LearningRate 0.0628 Epoch: 4 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:51,288-Speed 9207.61 samples/sec Loss 7.6462 LearningRate 0.0628 Epoch: 4 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:52,413-Speed 9107.57 samples/sec Loss 
7.5508 LearningRate 0.0627 Epoch: 4 Global Step: 69390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:53,511-Speed 9329.06 samples/sec Loss 7.5786 LearningRate 0.0627 Epoch: 4 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:54,563-Speed 9745.58 samples/sec Loss 7.5313 LearningRate 0.0627 Epoch: 4 Global Step: 69410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:55,640-Speed 9507.16 samples/sec Loss 7.5835 LearningRate 0.0627 Epoch: 4 Global Step: 69420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:56,675-Speed 9902.29 samples/sec Loss 7.6263 LearningRate 0.0627 Epoch: 4 Global Step: 69430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:57,715-Speed 9852.25 samples/sec Loss 7.6617 LearningRate 0.0627 Epoch: 4 Global Step: 69440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:58,785-Speed 9578.63 samples/sec Loss 7.6312 LearningRate 0.0627 Epoch: 4 Global Step: 69450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:19:59,834-Speed 9768.83 samples/sec Loss 7.6030 LearningRate 0.0627 Epoch: 4 Global Step: 69460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:00,935-Speed 9306.55 samples/sec Loss 7.5645 LearningRate 0.0627 Epoch: 4 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:02,029-Speed 9364.78 samples/sec Loss 7.6210 LearningRate 0.0627 Epoch: 4 Global Step: 69480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:03,112-Speed 9455.34 samples/sec Loss 7.5404 LearningRate 0.0627 Epoch: 4 Global Step: 69490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:04,178-Speed 9619.94 samples/sec Loss 7.6116 LearningRate 0.0627 Epoch: 4 Global Step: 69500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:05,304-Speed 9104.07 samples/sec Loss 7.5665 LearningRate 0.0627 Epoch: 4 Global Step: 
69510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:06,413-Speed 9234.37 samples/sec Loss 7.6820 LearningRate 0.0627 Epoch: 4 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:07,503-Speed 9399.48 samples/sec Loss 7.6664 LearningRate 0.0627 Epoch: 4 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:08,615-Speed 9216.52 samples/sec Loss 7.5358 LearningRate 0.0627 Epoch: 4 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:09,676-Speed 9652.03 samples/sec Loss 7.6243 LearningRate 0.0627 Epoch: 4 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:10,740-Speed 9639.97 samples/sec Loss 7.6886 LearningRate 0.0627 Epoch: 4 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:11,817-Speed 9513.54 samples/sec Loss 7.5907 LearningRate 0.0627 Epoch: 4 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:12,931-Speed 9192.66 samples/sec Loss 7.6224 LearningRate 0.0627 Epoch: 4 Global Step: 69580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:13,991-Speed 9663.39 samples/sec Loss 7.5346 LearningRate 0.0627 Epoch: 4 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:15,051-Speed 9671.69 samples/sec Loss 7.6025 LearningRate 0.0626 Epoch: 4 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:16,159-Speed 9242.28 samples/sec Loss 7.7378 LearningRate 0.0626 Epoch: 4 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:17,246-Speed 9422.82 samples/sec Loss 7.6973 LearningRate 0.0626 Epoch: 4 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:18,285-Speed 9862.82 samples/sec Loss 7.5819 LearningRate 0.0626 Epoch: 4 Global Step: 69630 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-04-11 14:20:19,377-Speed 9388.79 samples/sec Loss 7.5340 LearningRate 0.0626 Epoch: 4 Global Step: 69640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:20,499-Speed 9123.56 samples/sec Loss 7.6254 LearningRate 0.0626 Epoch: 4 Global Step: 69650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:21,629-Speed 9068.63 samples/sec Loss 7.6272 LearningRate 0.0626 Epoch: 4 Global Step: 69660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:22,687-Speed 9684.27 samples/sec Loss 7.5885 LearningRate 0.0626 Epoch: 4 Global Step: 69670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:23,774-Speed 9435.00 samples/sec Loss 7.6477 LearningRate 0.0626 Epoch: 4 Global Step: 69680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:20:24,835-Speed 9661.88 samples/sec Loss 7.4916 LearningRate 0.0626 Epoch: 4 Global Step: 69690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:25,919-Speed 9448.67 samples/sec Loss 7.6743 LearningRate 0.0626 Epoch: 4 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:27,002-Speed 9454.32 samples/sec Loss 7.6678 LearningRate 0.0626 Epoch: 4 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:28,123-Speed 9146.07 samples/sec Loss 7.6243 LearningRate 0.0626 Epoch: 4 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:29,228-Speed 9269.08 samples/sec Loss 7.6914 LearningRate 0.0626 Epoch: 4 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:30,330-Speed 9299.71 samples/sec Loss 7.6608 LearningRate 0.0626 Epoch: 4 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:31,372-Speed 9825.55 samples/sec Loss 7.6537 LearningRate 0.0626 Epoch: 4 Global Step: 69750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:32,480-Speed 
9249.51 samples/sec Loss 7.6006 LearningRate 0.0626 Epoch: 4 Global Step: 69760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:33,558-Speed 9511.09 samples/sec Loss 7.7208 LearningRate 0.0626 Epoch: 4 Global Step: 69770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:34,616-Speed 9684.35 samples/sec Loss 7.7663 LearningRate 0.0626 Epoch: 4 Global Step: 69780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:35,688-Speed 9554.46 samples/sec Loss 7.7717 LearningRate 0.0626 Epoch: 4 Global Step: 69790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:20:36,758-Speed 9580.34 samples/sec Loss 7.6585 LearningRate 0.0626 Epoch: 4 Global Step: 69800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:20:37,834-Speed 9514.34 samples/sec Loss 7.6605 LearningRate 0.0625 Epoch: 4 Global Step: 69810 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:20:38,869-Speed 9903.12 samples/sec Loss 7.6171 LearningRate 0.0625 Epoch: 4 Global Step: 69820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:20:39,932-Speed 9644.71 samples/sec Loss 7.6100 LearningRate 0.0625 Epoch: 4 Global Step: 69830 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:20:40,978-Speed 9794.85 samples/sec Loss 7.6373 LearningRate 0.0625 Epoch: 4 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:42,040-Speed 9645.12 samples/sec Loss 7.6550 LearningRate 0.0625 Epoch: 4 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:43,119-Speed 9507.25 samples/sec Loss 7.5974 LearningRate 0.0625 Epoch: 4 Global Step: 69860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:44,184-Speed 9624.06 samples/sec Loss 7.6766 LearningRate 0.0625 Epoch: 4 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:45,274-Speed 9393.64 samples/sec Loss 7.6902 LearningRate 
0.0625 Epoch: 4 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:46,313-Speed 9868.24 samples/sec Loss 7.6025 LearningRate 0.0625 Epoch: 4 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:47,414-Speed 9297.64 samples/sec Loss 7.6302 LearningRate 0.0625 Epoch: 4 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:48,493-Speed 9496.79 samples/sec Loss 7.6005 LearningRate 0.0625 Epoch: 4 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:49,584-Speed 9397.00 samples/sec Loss 7.6347 LearningRate 0.0625 Epoch: 4 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:50,691-Speed 9251.18 samples/sec Loss 7.6276 LearningRate 0.0625 Epoch: 4 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:20:51,763-Speed 9558.34 samples/sec Loss 7.5177 LearningRate 0.0625 Epoch: 4 Global Step: 69940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:52,853-Speed 9402.98 samples/sec Loss 7.6405 LearningRate 0.0625 Epoch: 4 Global Step: 69950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:53,947-Speed 9367.26 samples/sec Loss 7.6385 LearningRate 0.0625 Epoch: 4 Global Step: 69960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:55,000-Speed 9724.75 samples/sec Loss 7.5679 LearningRate 0.0625 Epoch: 4 Global Step: 69970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:56,114-Speed 9198.19 samples/sec Loss 7.5227 LearningRate 0.0625 Epoch: 4 Global Step: 69980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:57,175-Speed 9657.25 samples/sec Loss 7.6414 LearningRate 0.0625 Epoch: 4 Global Step: 69990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:20:58,278-Speed 9292.13 samples/sec Loss 7.6339 LearningRate 0.0625 Epoch: 4 Global Step: 70000 Fp16 Grad 
Scale: 131072 Required: 11 hours
Training: 2022-04-11 14:21:20,236-[lfw][70000]XNorm: 12.313847
Training: 2022-04-11 14:21:20,237-[lfw][70000]Accuracy-Flip: 0.99550+-0.00236
Training: 2022-04-11 14:21:20,237-[lfw][70000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:21:45,411-[cfp_fp][70000]XNorm: 10.456402
Training: 2022-04-11 14:21:45,412-[cfp_fp][70000]Accuracy-Flip: 0.95143+-0.00996
Training: 2022-04-11 14:21:45,412-[cfp_fp][70000]Accuracy-Highest: 0.95171
Training: 2022-04-11 14:22:07,101-[agedb_30][70000]XNorm: 11.907823
Training: 2022-04-11 14:22:07,102-[agedb_30][70000]Accuracy-Flip: 0.95767+-0.00904
Training: 2022-04-11 14:22:07,102-[agedb_30][70000]Accuracy-Highest: 0.96033
Training: 2022-04-11 14:22:08,216-Speed 146.42 samples/sec Loss 7.6580 LearningRate 0.0625 Epoch: 4 Global Step: 70010 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 14:22:09,327-Speed 9225.05 samples/sec Loss 7.6722 LearningRate 0.0624 Epoch: 4 Global Step: 70020 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 14:22:10,378-Speed 9747.67 samples/sec Loss 7.6835 LearningRate 0.0624 Epoch: 4 Global Step: 70030 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 14:22:11,462-Speed 9450.62 samples/sec Loss 7.6394 LearningRate 0.0624 Epoch: 4 Global Step: 70040 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 14:22:12,541-Speed 9499.71 samples/sec Loss 7.6854 LearningRate 0.0624 Epoch: 4 Global Step: 70050 Fp16 Grad Scale: 262144 Required: 11 hours
Training: 2022-04-11 14:22:13,639-Speed 9325.69 samples/sec Loss 7.5419 LearningRate 0.0624 Epoch: 4 Global Step: 70060 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 14:22:14,721-Speed 9477.25 samples/sec Loss 7.7276 LearningRate 0.0624 Epoch: 4 Global Step: 70070 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 14:22:15,768-Speed 9781.42 samples/sec Loss 7.7028 LearningRate 0.0624 Epoch: 4 Global Step: 70080 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-04-11 14:22:16,816-Speed 9779.31 samples/sec Loss 7.6910 LearningRate 0.0624 Epoch: 4 Global Step: 70090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:17,919-Speed 9290.79 samples/sec Loss 7.7285 LearningRate 0.0624 Epoch: 4 Global Step: 70100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:19,007-Speed 9412.94 samples/sec Loss 7.5877 LearningRate 0.0624 Epoch: 4 Global Step: 70110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:20,130-Speed 9131.59 samples/sec Loss 7.5968 LearningRate 0.0624 Epoch: 4 Global Step: 70120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:21,231-Speed 9303.45 samples/sec Loss 7.7328 LearningRate 0.0624 Epoch: 4 Global Step: 70130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:22,284-Speed 9726.18 samples/sec Loss 7.6851 LearningRate 0.0624 Epoch: 4 Global Step: 70140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:23,363-Speed 9495.58 samples/sec Loss 7.6286 LearningRate 0.0624 Epoch: 4 Global Step: 70150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:24,476-Speed 9207.06 samples/sec Loss 7.6216 LearningRate 0.0624 Epoch: 4 Global Step: 70160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:22:25,620-Speed 8960.55 samples/sec Loss 7.8567 LearningRate 0.0624 Epoch: 4 Global Step: 70170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:22:26,694-Speed 9540.66 samples/sec Loss 7.5610 LearningRate 0.0624 Epoch: 4 Global Step: 70180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:27,784-Speed 9426.87 samples/sec Loss 7.5531 LearningRate 0.0624 Epoch: 4 Global Step: 70190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:28,902-Speed 9166.91 samples/sec Loss 7.6241 LearningRate 0.0624 Epoch: 4 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:30,012-Speed 
9225.57 samples/sec Loss 7.7356 LearningRate 0.0624 Epoch: 4 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:31,120-Speed 9245.42 samples/sec Loss 7.7479 LearningRate 0.0624 Epoch: 4 Global Step: 70220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:32,227-Speed 9256.54 samples/sec Loss 7.6894 LearningRate 0.0623 Epoch: 4 Global Step: 70230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:33,330-Speed 9290.87 samples/sec Loss 7.6973 LearningRate 0.0623 Epoch: 4 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:34,429-Speed 9319.84 samples/sec Loss 7.6879 LearningRate 0.0623 Epoch: 4 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:35,567-Speed 9009.69 samples/sec Loss 7.5463 LearningRate 0.0623 Epoch: 4 Global Step: 70260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:36,639-Speed 9550.01 samples/sec Loss 7.5227 LearningRate 0.0623 Epoch: 4 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:37,727-Speed 9422.74 samples/sec Loss 7.7614 LearningRate 0.0623 Epoch: 4 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:38,803-Speed 9516.31 samples/sec Loss 7.6475 LearningRate 0.0623 Epoch: 4 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:39,878-Speed 9531.74 samples/sec Loss 7.6472 LearningRate 0.0623 Epoch: 4 Global Step: 70300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:40,979-Speed 9315.24 samples/sec Loss 7.5824 LearningRate 0.0623 Epoch: 4 Global Step: 70310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:42,080-Speed 9300.34 samples/sec Loss 7.7547 LearningRate 0.0623 Epoch: 4 Global Step: 70320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:43,214-Speed 9040.00 samples/sec Loss 7.6512 LearningRate 0.0623 
Epoch: 4 Global Step: 70330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:44,258-Speed 9809.16 samples/sec Loss 7.7607 LearningRate 0.0623 Epoch: 4 Global Step: 70340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:45,330-Speed 9553.51 samples/sec Loss 7.7148 LearningRate 0.0623 Epoch: 4 Global Step: 70350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:46,396-Speed 9621.31 samples/sec Loss 7.6979 LearningRate 0.0623 Epoch: 4 Global Step: 70360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:47,491-Speed 9354.71 samples/sec Loss 7.7005 LearningRate 0.0623 Epoch: 4 Global Step: 70370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:48,567-Speed 9521.96 samples/sec Loss 7.5061 LearningRate 0.0623 Epoch: 4 Global Step: 70380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:49,651-Speed 9454.18 samples/sec Loss 7.6248 LearningRate 0.0623 Epoch: 4 Global Step: 70390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:50,764-Speed 9200.68 samples/sec Loss 7.7108 LearningRate 0.0623 Epoch: 4 Global Step: 70400 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:22:51,848-Speed 9455.47 samples/sec Loss 7.5916 LearningRate 0.0623 Epoch: 4 Global Step: 70410 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:22:52,984-Speed 9017.45 samples/sec Loss 7.6787 LearningRate 0.0623 Epoch: 4 Global Step: 70420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:22:54,067-Speed 9466.49 samples/sec Loss 7.5438 LearningRate 0.0623 Epoch: 4 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:55,175-Speed 9244.80 samples/sec Loss 7.6729 LearningRate 0.0623 Epoch: 4 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:56,260-Speed 9442.97 samples/sec Loss 7.6586 LearningRate 0.0622 Epoch: 4 Global Step: 70450 Fp16 Grad Scale: 
65536 Required: 11 hours Training: 2022-04-11 14:22:57,390-Speed 9069.37 samples/sec Loss 7.6598 LearningRate 0.0622 Epoch: 4 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:58,441-Speed 9747.22 samples/sec Loss 7.6151 LearningRate 0.0622 Epoch: 4 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:22:59,548-Speed 9249.69 samples/sec Loss 7.6811 LearningRate 0.0622 Epoch: 4 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:00,655-Speed 9261.55 samples/sec Loss 7.5633 LearningRate 0.0622 Epoch: 4 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:01,772-Speed 9171.52 samples/sec Loss 7.6966 LearningRate 0.0622 Epoch: 4 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:02,852-Speed 9483.50 samples/sec Loss 7.5721 LearningRate 0.0622 Epoch: 4 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:03,927-Speed 9535.83 samples/sec Loss 7.7301 LearningRate 0.0622 Epoch: 4 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:05,031-Speed 9284.79 samples/sec Loss 7.5533 LearningRate 0.0622 Epoch: 4 Global Step: 70530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:06,130-Speed 9323.52 samples/sec Loss 7.6423 LearningRate 0.0622 Epoch: 4 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:07,189-Speed 9669.69 samples/sec Loss 7.6150 LearningRate 0.0622 Epoch: 4 Global Step: 70550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:08,312-Speed 9122.10 samples/sec Loss 7.6652 LearningRate 0.0622 Epoch: 4 Global Step: 70560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:09,415-Speed 9290.68 samples/sec Loss 7.6579 LearningRate 0.0622 Epoch: 4 Global Step: 70570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:23:10,501-Speed 9435.72 samples/sec Loss 7.5890 LearningRate 0.0622 Epoch: 4 Global Step: 70580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:11,555-Speed 9724.53 samples/sec Loss 7.6908 LearningRate 0.0622 Epoch: 4 Global Step: 70590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:12,631-Speed 9522.01 samples/sec Loss 7.6703 LearningRate 0.0622 Epoch: 4 Global Step: 70600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:13,694-Speed 9640.63 samples/sec Loss 7.7092 LearningRate 0.0622 Epoch: 4 Global Step: 70610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:14,777-Speed 9454.94 samples/sec Loss 7.5271 LearningRate 0.0622 Epoch: 4 Global Step: 70620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:15,875-Speed 9337.11 samples/sec Loss 7.6695 LearningRate 0.0622 Epoch: 4 Global Step: 70630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:16,948-Speed 9544.63 samples/sec Loss 7.6374 LearningRate 0.0622 Epoch: 4 Global Step: 70640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:18,027-Speed 9505.42 samples/sec Loss 7.7834 LearningRate 0.0622 Epoch: 4 Global Step: 70650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:19,130-Speed 9289.27 samples/sec Loss 7.7223 LearningRate 0.0621 Epoch: 4 Global Step: 70660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:20,177-Speed 9784.28 samples/sec Loss 7.7290 LearningRate 0.0621 Epoch: 4 Global Step: 70670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:21,268-Speed 9387.78 samples/sec Loss 7.5355 LearningRate 0.0621 Epoch: 4 Global Step: 70680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:22,331-Speed 9641.68 samples/sec Loss 7.7627 LearningRate 0.0621 Epoch: 4 Global Step: 70690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:23,384-Speed 9735.55 samples/sec Loss 
7.6525 LearningRate 0.0621 Epoch: 4 Global Step: 70700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:24,454-Speed 9577.44 samples/sec Loss 7.6717 LearningRate 0.0621 Epoch: 4 Global Step: 70710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:25,546-Speed 9381.22 samples/sec Loss 7.5951 LearningRate 0.0621 Epoch: 4 Global Step: 70720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:26,602-Speed 9694.96 samples/sec Loss 7.7286 LearningRate 0.0621 Epoch: 4 Global Step: 70730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:27,668-Speed 9616.03 samples/sec Loss 7.7204 LearningRate 0.0621 Epoch: 4 Global Step: 70740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:28,736-Speed 9595.65 samples/sec Loss 7.7186 LearningRate 0.0621 Epoch: 4 Global Step: 70750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:29,812-Speed 9520.11 samples/sec Loss 7.7183 LearningRate 0.0621 Epoch: 4 Global Step: 70760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:30,866-Speed 9716.77 samples/sec Loss 7.7084 LearningRate 0.0621 Epoch: 4 Global Step: 70770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:31,976-Speed 9232.05 samples/sec Loss 7.6413 LearningRate 0.0621 Epoch: 4 Global Step: 70780 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:33,062-Speed 9429.82 samples/sec Loss 7.7946 LearningRate 0.0621 Epoch: 4 Global Step: 70790 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:34,166-Speed 9291.34 samples/sec Loss 7.6521 LearningRate 0.0621 Epoch: 4 Global Step: 70800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:35,274-Speed 9244.98 samples/sec Loss 7.5669 LearningRate 0.0621 Epoch: 4 Global Step: 70810 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:36,364-Speed 9403.02 samples/sec Loss 7.6633 LearningRate 0.0621 Epoch: 4 Global 
Step: 70820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:37,503-Speed 8995.99 samples/sec Loss 7.7215 LearningRate 0.0621 Epoch: 4 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:38,566-Speed 9639.29 samples/sec Loss 7.6435 LearningRate 0.0621 Epoch: 4 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:39,662-Speed 9344.18 samples/sec Loss 7.7792 LearningRate 0.0621 Epoch: 4 Global Step: 70850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:40,724-Speed 9651.33 samples/sec Loss 7.7359 LearningRate 0.0621 Epoch: 4 Global Step: 70860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:41,793-Speed 9588.89 samples/sec Loss 7.7621 LearningRate 0.0620 Epoch: 4 Global Step: 70870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:42,891-Speed 9329.18 samples/sec Loss 7.7150 LearningRate 0.0620 Epoch: 4 Global Step: 70880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:43,975-Speed 9456.67 samples/sec Loss 7.6387 LearningRate 0.0620 Epoch: 4 Global Step: 70890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:45,043-Speed 9589.23 samples/sec Loss 7.6145 LearningRate 0.0620 Epoch: 4 Global Step: 70900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:46,103-Speed 9666.93 samples/sec Loss 7.6691 LearningRate 0.0620 Epoch: 4 Global Step: 70910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:47,172-Speed 9586.43 samples/sec Loss 7.6776 LearningRate 0.0620 Epoch: 4 Global Step: 70920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:48,255-Speed 9458.84 samples/sec Loss 7.7032 LearningRate 0.0620 Epoch: 4 Global Step: 70930 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:49,324-Speed 9579.31 samples/sec Loss 7.6760 LearningRate 0.0620 Epoch: 4 Global Step: 70940 Fp16 Grad Scale: 262144 
Required: 11 hours Training: 2022-04-11 14:23:50,412-Speed 9423.28 samples/sec Loss 7.7294 LearningRate 0.0620 Epoch: 4 Global Step: 70950 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:51,491-Speed 9490.19 samples/sec Loss 7.6667 LearningRate 0.0620 Epoch: 4 Global Step: 70960 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:52,588-Speed 9348.24 samples/sec Loss 7.7154 LearningRate 0.0620 Epoch: 4 Global Step: 70970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:53,688-Speed 9314.30 samples/sec Loss 7.6681 LearningRate 0.0620 Epoch: 4 Global Step: 70980 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:23:54,762-Speed 9543.20 samples/sec Loss 7.6511 LearningRate 0.0620 Epoch: 4 Global Step: 70990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:23:55,819-Speed 9690.51 samples/sec Loss 7.6863 LearningRate 0.0620 Epoch: 4 Global Step: 71000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:56,867-Speed 9782.68 samples/sec Loss 7.6567 LearningRate 0.0620 Epoch: 4 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:58,004-Speed 9009.70 samples/sec Loss 7.6948 LearningRate 0.0620 Epoch: 4 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:23:59,087-Speed 9460.76 samples/sec Loss 7.7254 LearningRate 0.0620 Epoch: 4 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:24:00,145-Speed 9677.98 samples/sec Loss 7.6366 LearningRate 0.0620 Epoch: 4 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:24:01,237-Speed 9384.10 samples/sec Loss 7.5290 LearningRate 0.0620 Epoch: 4 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:24:02,343-Speed 9268.29 samples/sec Loss 7.6968 LearningRate 0.0620 Epoch: 4 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
14:24:03,408-Speed 9614.98 samples/sec Loss 7.6482 LearningRate 0.0620 Epoch: 4 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:24:04,492-Speed 9456.76 samples/sec Loss 7.6522 LearningRate 0.0619 Epoch: 4 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:24:05,619-Speed 9089.55 samples/sec Loss 7.6873 LearningRate 0.0619 Epoch: 4 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:24:06,696-Speed 9514.62 samples/sec Loss 7.6300 LearningRate 0.0619 Epoch: 4 Global Step: 71100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:07,792-Speed 9349.96 samples/sec Loss 7.6604 LearningRate 0.0619 Epoch: 4 Global Step: 71110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:08,900-Speed 9239.64 samples/sec Loss 7.6503 LearningRate 0.0619 Epoch: 4 Global Step: 71120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:09,999-Speed 9323.70 samples/sec Loss 7.6550 LearningRate 0.0619 Epoch: 4 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:11,052-Speed 9735.56 samples/sec Loss 7.6706 LearningRate 0.0619 Epoch: 4 Global Step: 71140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:12,139-Speed 9433.23 samples/sec Loss 7.6529 LearningRate 0.0619 Epoch: 4 Global Step: 71150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:13,233-Speed 9361.83 samples/sec Loss 7.6979 LearningRate 0.0619 Epoch: 4 Global Step: 71160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:14,317-Speed 9453.03 samples/sec Loss 7.7015 LearningRate 0.0619 Epoch: 4 Global Step: 71170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:15,390-Speed 9550.48 samples/sec Loss 7.5794 LearningRate 0.0619 Epoch: 4 Global Step: 71180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:16,501-Speed 9217.63 samples/sec Loss 
7.6182 LearningRate 0.0619 Epoch: 4 Global Step: 71190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:17,566-Speed 9626.43 samples/sec Loss 7.7532 LearningRate 0.0619 Epoch: 4 Global Step: 71200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:18,618-Speed 9732.10 samples/sec Loss 7.7310 LearningRate 0.0619 Epoch: 4 Global Step: 71210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:19,692-Speed 9546.69 samples/sec Loss 7.6483 LearningRate 0.0619 Epoch: 4 Global Step: 71220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:20,800-Speed 9238.61 samples/sec Loss 7.6513 LearningRate 0.0619 Epoch: 4 Global Step: 71230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:21,889-Speed 9415.88 samples/sec Loss 7.6608 LearningRate 0.0619 Epoch: 4 Global Step: 71240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:23,008-Speed 9149.68 samples/sec Loss 7.6373 LearningRate 0.0619 Epoch: 4 Global Step: 71250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:24,134-Speed 9110.00 samples/sec Loss 7.5746 LearningRate 0.0619 Epoch: 4 Global Step: 71260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:25,215-Speed 9472.36 samples/sec Loss 7.6750 LearningRate 0.0619 Epoch: 4 Global Step: 71270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:26,298-Speed 9462.68 samples/sec Loss 7.6608 LearningRate 0.0619 Epoch: 4 Global Step: 71280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:27,415-Speed 9175.50 samples/sec Loss 7.6530 LearningRate 0.0618 Epoch: 4 Global Step: 71290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:28,543-Speed 9090.68 samples/sec Loss 7.7215 LearningRate 0.0618 Epoch: 4 Global Step: 71300 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:24:29,655-Speed 9218.95 samples/sec Loss 7.6042 LearningRate 0.0618 Epoch: 4 Global 
Step: 71310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:30,759-Speed 9279.54 samples/sec Loss 7.5946 LearningRate 0.0618 Epoch: 4 Global Step: 71320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:31,853-Speed 9369.01 samples/sec Loss 7.6070 LearningRate 0.0618 Epoch: 4 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:32,936-Speed 9453.91 samples/sec Loss 7.6135 LearningRate 0.0618 Epoch: 4 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:34,085-Speed 8921.76 samples/sec Loss 7.7572 LearningRate 0.0618 Epoch: 4 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:35,198-Speed 9205.51 samples/sec Loss 7.6638 LearningRate 0.0618 Epoch: 4 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:36,297-Speed 9326.80 samples/sec Loss 7.6283 LearningRate 0.0618 Epoch: 4 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:37,361-Speed 9626.05 samples/sec Loss 7.7290 LearningRate 0.0618 Epoch: 4 Global Step: 71380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:38,406-Speed 9800.19 samples/sec Loss 7.6450 LearningRate 0.0618 Epoch: 4 Global Step: 71390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:39,483-Speed 9516.40 samples/sec Loss 7.7314 LearningRate 0.0618 Epoch: 4 Global Step: 71400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:40,606-Speed 9125.46 samples/sec Loss 7.5976 LearningRate 0.0618 Epoch: 4 Global Step: 71410 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:24:41,690-Speed 9449.80 samples/sec Loss 7.6678 LearningRate 0.0618 Epoch: 4 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:42,786-Speed 9363.07 samples/sec Loss 7.7023 LearningRate 0.0618 Epoch: 4 Global Step: 71430 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:24:43,860-Speed 9539.15 samples/sec Loss 7.7109 LearningRate 0.0618 Epoch: 4 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:44,889-Speed 9950.90 samples/sec Loss 7.7611 LearningRate 0.0618 Epoch: 4 Global Step: 71450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:45,923-Speed 9912.42 samples/sec Loss 7.6449 LearningRate 0.0618 Epoch: 4 Global Step: 71460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:46,982-Speed 9671.54 samples/sec Loss 7.5559 LearningRate 0.0618 Epoch: 4 Global Step: 71470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:48,035-Speed 9731.63 samples/sec Loss 7.6977 LearningRate 0.0618 Epoch: 4 Global Step: 71480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:49,072-Speed 9886.86 samples/sec Loss 7.7235 LearningRate 0.0618 Epoch: 4 Global Step: 71490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:50,139-Speed 9598.19 samples/sec Loss 7.7156 LearningRate 0.0618 Epoch: 4 Global Step: 71500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:51,227-Speed 9419.44 samples/sec Loss 7.6688 LearningRate 0.0617 Epoch: 4 Global Step: 71510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:52,283-Speed 9698.22 samples/sec Loss 7.6616 LearningRate 0.0617 Epoch: 4 Global Step: 71520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:24:53,383-Speed 9321.94 samples/sec Loss 7.6553 LearningRate 0.0617 Epoch: 4 Global Step: 71530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:24:54,449-Speed 9611.82 samples/sec Loss 7.7888 LearningRate 0.0617 Epoch: 4 Global Step: 71540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:24:55,548-Speed 9317.02 samples/sec Loss 7.6183 LearningRate 0.0617 Epoch: 4 Global Step: 71550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:24:56,629-Speed 9485.01 samples/sec Loss 7.7191 LearningRate 0.0617 Epoch: 4 Global Step: 71560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:57,724-Speed 9350.25 samples/sec Loss 7.6636 LearningRate 0.0617 Epoch: 4 Global Step: 71570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:58,777-Speed 9735.97 samples/sec Loss 7.6802 LearningRate 0.0617 Epoch: 4 Global Step: 71580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:24:59,858-Speed 9477.84 samples/sec Loss 7.6208 LearningRate 0.0617 Epoch: 4 Global Step: 71590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:00,902-Speed 9814.99 samples/sec Loss 7.6694 LearningRate 0.0617 Epoch: 4 Global Step: 71600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:01,988-Speed 9430.26 samples/sec Loss 7.7439 LearningRate 0.0617 Epoch: 4 Global Step: 71610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:03,063-Speed 9529.17 samples/sec Loss 7.7816 LearningRate 0.0617 Epoch: 4 Global Step: 71620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:04,136-Speed 9556.50 samples/sec Loss 7.6320 LearningRate 0.0617 Epoch: 4 Global Step: 71630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:05,196-Speed 9663.04 samples/sec Loss 7.7095 LearningRate 0.0617 Epoch: 4 Global Step: 71640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:06,259-Speed 9638.55 samples/sec Loss 7.7807 LearningRate 0.0617 Epoch: 4 Global Step: 71650 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:25:07,395-Speed 9022.31 samples/sec Loss 7.7595 LearningRate 0.0617 Epoch: 4 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:08,489-Speed 9373.13 samples/sec Loss 7.7536 LearningRate 0.0617 Epoch: 4 Global Step: 71670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:25:09,555-Speed 9613.53 samples/sec Loss 
7.7601 LearningRate 0.0617 Epoch: 4 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:25:10,640-Speed 9439.77 samples/sec Loss 7.7891 LearningRate 0.0617 Epoch: 4 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:25:11,735-Speed 9357.32 samples/sec Loss 7.6829 LearningRate 0.0617 Epoch: 4 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:25:12,833-Speed 9329.73 samples/sec Loss 7.6498 LearningRate 0.0617 Epoch: 4 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:25:13,953-Speed 9151.89 samples/sec Loss 7.7418 LearningRate 0.0616 Epoch: 4 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 14:25:15,040-Speed 9427.67 samples/sec Loss 7.6047 LearningRate 0.0616 Epoch: 4 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:16,122-Speed 9465.18 samples/sec Loss 7.6466 LearningRate 0.0616 Epoch: 4 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:17,204-Speed 9470.55 samples/sec Loss 7.7062 LearningRate 0.0616 Epoch: 4 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:18,297-Speed 9375.37 samples/sec Loss 7.7431 LearningRate 0.0616 Epoch: 4 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:19,358-Speed 9660.09 samples/sec Loss 7.6771 LearningRate 0.0616 Epoch: 4 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:20,431-Speed 9543.38 samples/sec Loss 7.6095 LearningRate 0.0616 Epoch: 4 Global Step: 71780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:21,530-Speed 9321.44 samples/sec Loss 7.7624 LearningRate 0.0616 Epoch: 4 Global Step: 71790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:22,623-Speed 9377.42 samples/sec Loss 7.6001 LearningRate 0.0616 Epoch: 4 Global Step: 
71800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:23,716-Speed 9373.82 samples/sec Loss 7.7355 LearningRate 0.0616 Epoch: 4 Global Step: 71810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:24,802-Speed 9429.46 samples/sec Loss 7.6223 LearningRate 0.0616 Epoch: 4 Global Step: 71820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:25,938-Speed 9022.52 samples/sec Loss 7.7032 LearningRate 0.0616 Epoch: 4 Global Step: 71830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:27,009-Speed 9568.86 samples/sec Loss 7.7544 LearningRate 0.0616 Epoch: 4 Global Step: 71840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:28,042-Speed 9921.56 samples/sec Loss 7.6108 LearningRate 0.0616 Epoch: 4 Global Step: 71850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:29,100-Speed 9681.70 samples/sec Loss 7.7560 LearningRate 0.0616 Epoch: 4 Global Step: 71860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:30,145-Speed 9808.45 samples/sec Loss 7.7065 LearningRate 0.0616 Epoch: 4 Global Step: 71870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:31,266-Speed 9147.20 samples/sec Loss 7.6847 LearningRate 0.0616 Epoch: 4 Global Step: 71880 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:25:32,360-Speed 9360.15 samples/sec Loss 7.7373 LearningRate 0.0616 Epoch: 4 Global Step: 71890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:25:33,409-Speed 9768.88 samples/sec Loss 7.6493 LearningRate 0.0616 Epoch: 4 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:34,486-Speed 9516.33 samples/sec Loss 7.8857 LearningRate 0.0616 Epoch: 4 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:35,584-Speed 9328.39 samples/sec Loss 7.7622 LearningRate 0.0616 Epoch: 4 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-11 14:25:36,648-Speed 9632.04 samples/sec Loss 7.7293 LearningRate 0.0615 Epoch: 4 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:37,722-Speed 9543.57 samples/sec Loss 7.6645 LearningRate 0.0615 Epoch: 4 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:38,770-Speed 9777.08 samples/sec Loss 7.7445 LearningRate 0.0615 Epoch: 4 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:39,812-Speed 9831.37 samples/sec Loss 7.7587 LearningRate 0.0615 Epoch: 4 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:40,881-Speed 9581.78 samples/sec Loss 7.7119 LearningRate 0.0615 Epoch: 4 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:41,961-Speed 9489.02 samples/sec Loss 7.6899 LearningRate 0.0615 Epoch: 4 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:43,037-Speed 9522.09 samples/sec Loss 7.7383 LearningRate 0.0615 Epoch: 4 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:25:44,132-Speed 9360.48 samples/sec Loss 7.7833 LearningRate 0.0615 Epoch: 4 Global Step: 72000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:26:06,353-[lfw][72000]XNorm: 12.254861 Training: 2022-04-11 14:26:06,353-[lfw][72000]Accuracy-Flip: 0.99550+-0.00279 Training: 2022-04-11 14:26:06,354-[lfw][72000]Accuracy-Highest: 0.99583 Training: 2022-04-11 14:26:31,903-[cfp_fp][72000]XNorm: 10.391574 Training: 2022-04-11 14:26:31,904-[cfp_fp][72000]Accuracy-Flip: 0.94943+-0.01196 Training: 2022-04-11 14:26:31,904-[cfp_fp][72000]Accuracy-Highest: 0.95171 Training: 2022-04-11 14:26:53,978-[agedb_30][72000]XNorm: 11.810187 Training: 2022-04-11 14:26:53,978-[agedb_30][72000]Accuracy-Flip: 0.95900+-0.00867 Training: 2022-04-11 14:26:53,979-[agedb_30][72000]Accuracy-Highest: 0.96033 Training: 2022-04-11 
14:26:55,068-Speed 144.36 samples/sec Loss 7.6925 LearningRate 0.0615 Epoch: 4 Global Step: 72010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:26:56,125-Speed 9687.95 samples/sec Loss 7.7176 LearningRate 0.0615 Epoch: 4 Global Step: 72020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:26:57,179-Speed 9720.98 samples/sec Loss 7.6447 LearningRate 0.0615 Epoch: 4 Global Step: 72030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:26:58,264-Speed 9442.91 samples/sec Loss 7.7842 LearningRate 0.0615 Epoch: 4 Global Step: 72040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:26:59,354-Speed 9403.19 samples/sec Loss 7.7173 LearningRate 0.0615 Epoch: 4 Global Step: 72050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:00,455-Speed 9304.56 samples/sec Loss 7.8604 LearningRate 0.0615 Epoch: 4 Global Step: 72060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:01,534-Speed 9505.46 samples/sec Loss 7.6283 LearningRate 0.0615 Epoch: 4 Global Step: 72070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:02,575-Speed 9845.13 samples/sec Loss 7.6656 LearningRate 0.0615 Epoch: 4 Global Step: 72080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:03,653-Speed 9501.66 samples/sec Loss 7.7763 LearningRate 0.0615 Epoch: 4 Global Step: 72090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:04,724-Speed 9570.97 samples/sec Loss 7.7590 LearningRate 0.0615 Epoch: 4 Global Step: 72100 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:27:05,790-Speed 9610.04 samples/sec Loss 7.7062 LearningRate 0.0615 Epoch: 4 Global Step: 72110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:06,858-Speed 9590.82 samples/sec Loss 7.7312 LearningRate 0.0615 Epoch: 4 Global Step: 72120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:07,919-Speed 9659.47 samples/sec Loss 
7.7742 LearningRate 0.0615 Epoch: 4 Global Step: 72130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:08,988-Speed 9578.06 samples/sec Loss 7.7106 LearningRate 0.0614 Epoch: 4 Global Step: 72140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:10,056-Speed 9596.50 samples/sec Loss 7.7237 LearningRate 0.0614 Epoch: 4 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:11,155-Speed 9322.80 samples/sec Loss 7.7375 LearningRate 0.0614 Epoch: 4 Global Step: 72160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:12,268-Speed 9209.48 samples/sec Loss 7.6695 LearningRate 0.0614 Epoch: 4 Global Step: 72170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:13,339-Speed 9568.02 samples/sec Loss 7.8288 LearningRate 0.0614 Epoch: 4 Global Step: 72180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:14,383-Speed 9811.12 samples/sec Loss 7.7334 LearningRate 0.0614 Epoch: 4 Global Step: 72190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:15,439-Speed 9702.23 samples/sec Loss 7.7380 LearningRate 0.0614 Epoch: 4 Global Step: 72200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:16,535-Speed 9350.75 samples/sec Loss 7.6980 LearningRate 0.0614 Epoch: 4 Global Step: 72210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:17,627-Speed 9383.36 samples/sec Loss 7.6565 LearningRate 0.0614 Epoch: 4 Global Step: 72220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:18,697-Speed 9576.29 samples/sec Loss 7.6845 LearningRate 0.0614 Epoch: 4 Global Step: 72230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:19,796-Speed 9323.63 samples/sec Loss 7.7047 LearningRate 0.0614 Epoch: 4 Global Step: 72240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:20,890-Speed 9371.79 samples/sec Loss 7.7286 LearningRate 0.0614 Epoch: 4 Global 
Step: 72250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:22,020-Speed 9063.21 samples/sec Loss 7.6936 LearningRate 0.0614 Epoch: 4 Global Step: 72260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:23,124-Speed 9283.80 samples/sec Loss 7.7752 LearningRate 0.0614 Epoch: 4 Global Step: 72270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:24,183-Speed 9673.55 samples/sec Loss 7.7234 LearningRate 0.0614 Epoch: 4 Global Step: 72280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:25,258-Speed 9524.50 samples/sec Loss 7.7919 LearningRate 0.0614 Epoch: 4 Global Step: 72290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:26,353-Speed 9361.48 samples/sec Loss 7.5490 LearningRate 0.0614 Epoch: 4 Global Step: 72300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:27,454-Speed 9305.25 samples/sec Loss 7.5429 LearningRate 0.0614 Epoch: 4 Global Step: 72310 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:27:28,529-Speed 9533.45 samples/sec Loss 7.6734 LearningRate 0.0614 Epoch: 4 Global Step: 72320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:27:29,624-Speed 9362.00 samples/sec Loss 7.6521 LearningRate 0.0614 Epoch: 4 Global Step: 72330 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:27:30,739-Speed 9189.24 samples/sec Loss 7.7338 LearningRate 0.0614 Epoch: 4 Global Step: 72340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:31,812-Speed 9544.94 samples/sec Loss 7.7163 LearningRate 0.0614 Epoch: 4 Global Step: 72350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:32,869-Speed 9693.97 samples/sec Loss 7.5407 LearningRate 0.0613 Epoch: 4 Global Step: 72360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:33,922-Speed 9735.87 samples/sec Loss 7.8638 LearningRate 0.0613 Epoch: 4 Global Step: 72370 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 14:27:35,041-Speed 9150.41 samples/sec Loss 7.6920 LearningRate 0.0613 Epoch: 4 Global Step: 72380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:36,126-Speed 9444.22 samples/sec Loss 7.7430 LearningRate 0.0613 Epoch: 4 Global Step: 72390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:37,234-Speed 9249.63 samples/sec Loss 7.7236 LearningRate 0.0613 Epoch: 4 Global Step: 72400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:38,315-Speed 9481.14 samples/sec Loss 7.5874 LearningRate 0.0613 Epoch: 4 Global Step: 72410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:39,425-Speed 9229.67 samples/sec Loss 7.6475 LearningRate 0.0613 Epoch: 4 Global Step: 72420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:40,566-Speed 8975.04 samples/sec Loss 7.8037 LearningRate 0.0613 Epoch: 4 Global Step: 72430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:41,641-Speed 9528.74 samples/sec Loss 7.6763 LearningRate 0.0613 Epoch: 4 Global Step: 72440 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:27:42,706-Speed 9626.85 samples/sec Loss 7.7216 LearningRate 0.0613 Epoch: 4 Global Step: 72450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 14:27:43,776-Speed 9569.47 samples/sec Loss 7.6346 LearningRate 0.0613 Epoch: 4 Global Step: 72460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:44,887-Speed 9228.20 samples/sec Loss 7.7020 LearningRate 0.0613 Epoch: 4 Global Step: 72470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:45,963-Speed 9522.40 samples/sec Loss 7.7713 LearningRate 0.0613 Epoch: 4 Global Step: 72480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:47,048-Speed 9446.74 samples/sec Loss 7.7897 LearningRate 0.0613 Epoch: 4 Global Step: 72490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 
14:27:48,167-Speed 9149.76 samples/sec Loss 7.7014 LearningRate 0.0613 Epoch: 4 Global Step: 72500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:49,252-Speed 9450.96 samples/sec Loss 7.7439 LearningRate 0.0613 Epoch: 4 Global Step: 72510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:50,377-Speed 9108.00 samples/sec Loss 7.6534 LearningRate 0.0613 Epoch: 4 Global Step: 72520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:51,444-Speed 9609.21 samples/sec Loss 7.6724 LearningRate 0.0613 Epoch: 4 Global Step: 72530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:52,535-Speed 9386.53 samples/sec Loss 7.7020 LearningRate 0.0613 Epoch: 4 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:53,573-Speed 9878.26 samples/sec Loss 7.5872 LearningRate 0.0613 Epoch: 4 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:54,635-Speed 9640.28 samples/sec Loss 7.7825 LearningRate 0.0613 Epoch: 4 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:55,727-Speed 9390.58 samples/sec Loss 7.7078 LearningRate 0.0612 Epoch: 4 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:56,788-Speed 9657.13 samples/sec Loss 7.6699 LearningRate 0.0612 Epoch: 4 Global Step: 72580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:57,874-Speed 9433.60 samples/sec Loss 7.6908 LearningRate 0.0612 Epoch: 4 Global Step: 72590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:27:59,014-Speed 8984.19 samples/sec Loss 7.7327 LearningRate 0.0612 Epoch: 4 Global Step: 72600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:28:00,090-Speed 9523.20 samples/sec Loss 7.6681 LearningRate 0.0612 Epoch: 4 Global Step: 72610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:28:01,180-Speed 9397.36 samples/sec Loss 
7.7447 LearningRate 0.0612 Epoch: 4 Global Step: 72620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:28:02,274-Speed 9364.85 samples/sec Loss 7.7184 LearningRate 0.0612 Epoch: 4 Global Step: 72630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:28:03,388-Speed 9199.73 samples/sec Loss 7.7549 LearningRate 0.0612 Epoch: 4 Global Step: 72640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 14:28:04,488-Speed 9313.13 samples/sec Loss 7.6965 LearningRate 0.0612 Epoch: 4 Global Step: 72650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:05,622-Speed 9040.80 samples/sec Loss 7.7118 LearningRate 0.0612 Epoch: 4 Global Step: 72660 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:28:06,692-Speed 9572.51 samples/sec Loss 7.7246 LearningRate 0.0612 Epoch: 4 Global Step: 72670 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:28:07,806-Speed 9203.41 samples/sec Loss 7.6653 LearningRate 0.0612 Epoch: 4 Global Step: 72680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:28:08,867-Speed 9653.74 samples/sec Loss 7.7023 LearningRate 0.0612 Epoch: 4 Global Step: 72690 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:28:09,946-Speed 9502.38 samples/sec Loss 7.7217 LearningRate 0.0612 Epoch: 4 Global Step: 72700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:11,039-Speed 9371.04 samples/sec Loss 7.7522 LearningRate 0.0612 Epoch: 4 Global Step: 72710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:12,120-Speed 9476.53 samples/sec Loss 7.7470 LearningRate 0.0612 Epoch: 4 Global Step: 72720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:13,202-Speed 9469.70 samples/sec Loss 7.6959 LearningRate 0.0612 Epoch: 4 Global Step: 72730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:14,310-Speed 9249.06 samples/sec Loss 7.8107 LearningRate 0.0612 Epoch: 4 Global 
Step: 72740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:15,383-Speed 9549.44 samples/sec Loss 7.7913 LearningRate 0.0612 Epoch: 4 Global Step: 72750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:16,486-Speed 9285.42 samples/sec Loss 7.7274 LearningRate 0.0612 Epoch: 4 Global Step: 72760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:17,576-Speed 9405.48 samples/sec Loss 7.6315 LearningRate 0.0612 Epoch: 4 Global Step: 72770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:18,648-Speed 9549.16 samples/sec Loss 7.6407 LearningRate 0.0611 Epoch: 4 Global Step: 72780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:19,732-Speed 9463.87 samples/sec Loss 7.6696 LearningRate 0.0611 Epoch: 4 Global Step: 72790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:20,788-Speed 9694.34 samples/sec Loss 7.5945 LearningRate 0.0611 Epoch: 4 Global Step: 72800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:28:21,877-Speed 9409.85 samples/sec Loss 7.7845 LearningRate 0.0611 Epoch: 4 Global Step: 72810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:22,997-Speed 9146.35 samples/sec Loss 7.7850 LearningRate 0.0611 Epoch: 4 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:24,141-Speed 8958.24 samples/sec Loss 7.6653 LearningRate 0.0611 Epoch: 4 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:25,241-Speed 9315.29 samples/sec Loss 7.5994 LearningRate 0.0611 Epoch: 4 Global Step: 72840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:26,338-Speed 9338.51 samples/sec Loss 7.7591 LearningRate 0.0611 Epoch: 4 Global Step: 72850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:27,421-Speed 9463.23 samples/sec Loss 7.7713 LearningRate 0.0611 Epoch: 4 Global Step: 72860 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 14:28:28,451-Speed 9956.18 samples/sec Loss 7.7262 LearningRate 0.0611 Epoch: 4 Global Step: 72870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:29,480-Speed 9953.33 samples/sec Loss 7.6434 LearningRate 0.0611 Epoch: 4 Global Step: 72880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:30,560-Speed 9489.80 samples/sec Loss 7.7082 LearningRate 0.0611 Epoch: 4 Global Step: 72890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:31,623-Speed 9637.61 samples/sec Loss 7.8148 LearningRate 0.0611 Epoch: 4 Global Step: 72900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:32,709-Speed 9433.78 samples/sec Loss 7.7894 LearningRate 0.0611 Epoch: 4 Global Step: 72910 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:28:33,817-Speed 9244.47 samples/sec Loss 7.7318 LearningRate 0.0611 Epoch: 4 Global Step: 72920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:34,954-Speed 9012.89 samples/sec Loss 7.7790 LearningRate 0.0611 Epoch: 4 Global Step: 72930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:36,005-Speed 9745.23 samples/sec Loss 7.6325 LearningRate 0.0611 Epoch: 4 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:37,103-Speed 9331.47 samples/sec Loss 7.6064 LearningRate 0.0611 Epoch: 4 Global Step: 72950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:38,238-Speed 9028.67 samples/sec Loss 7.7517 LearningRate 0.0611 Epoch: 4 Global Step: 72960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:39,289-Speed 9751.20 samples/sec Loss 7.6306 LearningRate 0.0611 Epoch: 4 Global Step: 72970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:40,410-Speed 9144.45 samples/sec Loss 7.5700 LearningRate 0.0611 Epoch: 4 Global Step: 72980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:28:41,538-Speed 9075.55 samples/sec Loss 7.9058 LearningRate 0.0611 Epoch: 4 Global Step: 72990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:42,620-Speed 9473.67 samples/sec Loss 7.7986 LearningRate 0.0610 Epoch: 4 Global Step: 73000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:43,700-Speed 9486.68 samples/sec Loss 7.7305 LearningRate 0.0610 Epoch: 4 Global Step: 73010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:44,770-Speed 9574.31 samples/sec Loss 7.6833 LearningRate 0.0610 Epoch: 4 Global Step: 73020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:45,862-Speed 9391.32 samples/sec Loss 7.5371 LearningRate 0.0610 Epoch: 4 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:46,943-Speed 9473.38 samples/sec Loss 7.5939 LearningRate 0.0610 Epoch: 4 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:47,992-Speed 9772.81 samples/sec Loss 7.7248 LearningRate 0.0610 Epoch: 4 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:49,068-Speed 9526.85 samples/sec Loss 7.7862 LearningRate 0.0610 Epoch: 4 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:50,153-Speed 9441.98 samples/sec Loss 7.7330 LearningRate 0.0610 Epoch: 4 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:51,194-Speed 9837.88 samples/sec Loss 7.5012 LearningRate 0.0610 Epoch: 4 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:52,253-Speed 9673.75 samples/sec Loss 7.6964 LearningRate 0.0610 Epoch: 4 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:53,352-Speed 9327.29 samples/sec Loss 7.6438 LearningRate 0.0610 Epoch: 4 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:54,397-Speed 9804.18 samples/sec Loss 7.6869 
LearningRate 0.0610 Epoch: 4 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:55,527-Speed 9071.89 samples/sec Loss 7.8086 LearningRate 0.0610 Epoch: 4 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:28:56,620-Speed 9373.01 samples/sec Loss 7.7271 LearningRate 0.0610 Epoch: 4 Global Step: 73130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:57,688-Speed 9593.50 samples/sec Loss 7.6008 LearningRate 0.0610 Epoch: 4 Global Step: 73140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:58,760-Speed 9557.45 samples/sec Loss 7.6304 LearningRate 0.0610 Epoch: 4 Global Step: 73150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:28:59,864-Speed 9278.91 samples/sec Loss 7.6955 LearningRate 0.0610 Epoch: 4 Global Step: 73160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:00,966-Speed 9299.68 samples/sec Loss 7.7380 LearningRate 0.0610 Epoch: 4 Global Step: 73170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:02,074-Speed 9245.28 samples/sec Loss 7.8284 LearningRate 0.0610 Epoch: 4 Global Step: 73180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:03,190-Speed 9175.17 samples/sec Loss 7.7570 LearningRate 0.0610 Epoch: 4 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:04,276-Speed 9438.27 samples/sec Loss 7.7772 LearningRate 0.0610 Epoch: 4 Global Step: 73200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:05,387-Speed 9223.49 samples/sec Loss 7.6842 LearningRate 0.0609 Epoch: 4 Global Step: 73210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:06,463-Speed 9525.68 samples/sec Loss 7.6894 LearningRate 0.0609 Epoch: 4 Global Step: 73220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:07,598-Speed 9030.23 samples/sec Loss 7.7351 LearningRate 0.0609 Epoch: 4 Global Step: 
73230 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:29:08,676-Speed 9501.26 samples/sec Loss 7.7049 LearningRate 0.0609 Epoch: 4 Global Step: 73240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:09,781-Speed 9269.87 samples/sec Loss 7.8200 LearningRate 0.0609 Epoch: 4 Global Step: 73250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:10,874-Speed 9374.82 samples/sec Loss 7.6273 LearningRate 0.0609 Epoch: 4 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:11,956-Speed 9469.71 samples/sec Loss 7.7312 LearningRate 0.0609 Epoch: 4 Global Step: 73270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:13,120-Speed 8800.06 samples/sec Loss 7.5612 LearningRate 0.0609 Epoch: 4 Global Step: 73280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:14,228-Speed 9246.47 samples/sec Loss 7.5990 LearningRate 0.0609 Epoch: 4 Global Step: 73290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:15,332-Speed 9287.17 samples/sec Loss 7.8705 LearningRate 0.0609 Epoch: 4 Global Step: 73300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:16,426-Speed 9360.11 samples/sec Loss 7.6854 LearningRate 0.0609 Epoch: 4 Global Step: 73310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:17,580-Speed 8877.18 samples/sec Loss 7.8262 LearningRate 0.0609 Epoch: 4 Global Step: 73320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:18,675-Speed 9361.88 samples/sec Loss 7.7480 LearningRate 0.0609 Epoch: 4 Global Step: 73330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:19,756-Speed 9480.82 samples/sec Loss 7.8457 LearningRate 0.0609 Epoch: 4 Global Step: 73340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:20,877-Speed 9138.64 samples/sec Loss 7.6272 LearningRate 0.0609 Epoch: 4 Global Step: 73350 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 14:29:21,967-Speed 9397.32 samples/sec Loss 7.8175 LearningRate 0.0609 Epoch: 4 Global Step: 73360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:23,079-Speed 9218.44 samples/sec Loss 7.7287 LearningRate 0.0609 Epoch: 4 Global Step: 73370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:24,164-Speed 9439.96 samples/sec Loss 7.6978 LearningRate 0.0609 Epoch: 4 Global Step: 73380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:25,235-Speed 9568.86 samples/sec Loss 7.6075 LearningRate 0.0609 Epoch: 4 Global Step: 73390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:26,319-Speed 9451.29 samples/sec Loss 7.6173 LearningRate 0.0609 Epoch: 4 Global Step: 73400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:27,375-Speed 9705.05 samples/sec Loss 7.6824 LearningRate 0.0609 Epoch: 4 Global Step: 73410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:28,464-Speed 9405.40 samples/sec Loss 7.7179 LearningRate 0.0608 Epoch: 4 Global Step: 73420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:29,546-Speed 9467.81 samples/sec Loss 7.6892 LearningRate 0.0608 Epoch: 4 Global Step: 73430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:30,616-Speed 9578.56 samples/sec Loss 7.6587 LearningRate 0.0608 Epoch: 4 Global Step: 73440 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:29:31,694-Speed 9503.63 samples/sec Loss 7.6725 LearningRate 0.0608 Epoch: 4 Global Step: 73450 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:29:32,786-Speed 9382.82 samples/sec Loss 7.7089 LearningRate 0.0608 Epoch: 4 Global Step: 73460 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:29:33,890-Speed 9275.96 samples/sec Loss 7.7386 LearningRate 0.0608 Epoch: 4 Global Step: 73470 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 
14:29:34,990-Speed 9321.20 samples/sec Loss 7.7688 LearningRate 0.0608 Epoch: 4 Global Step: 73480 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:29:36,072-Speed 9464.32 samples/sec Loss 7.6988 LearningRate 0.0608 Epoch: 4 Global Step: 73490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:37,174-Speed 9302.10 samples/sec Loss 7.7810 LearningRate 0.0608 Epoch: 4 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:38,266-Speed 9376.24 samples/sec Loss 7.7324 LearningRate 0.0608 Epoch: 4 Global Step: 73510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:39,375-Speed 9241.99 samples/sec Loss 7.6737 LearningRate 0.0608 Epoch: 4 Global Step: 73520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:40,506-Speed 9062.37 samples/sec Loss 7.7418 LearningRate 0.0608 Epoch: 4 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:41,604-Speed 9332.86 samples/sec Loss 7.6988 LearningRate 0.0608 Epoch: 4 Global Step: 73540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:42,720-Speed 9178.73 samples/sec Loss 7.6309 LearningRate 0.0608 Epoch: 4 Global Step: 73550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:43,784-Speed 9631.46 samples/sec Loss 7.7244 LearningRate 0.0608 Epoch: 4 Global Step: 73560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:44,872-Speed 9418.07 samples/sec Loss 7.7401 LearningRate 0.0608 Epoch: 4 Global Step: 73570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:45,930-Speed 9680.37 samples/sec Loss 7.7200 LearningRate 0.0608 Epoch: 4 Global Step: 73580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:47,009-Speed 9497.03 samples/sec Loss 7.7319 LearningRate 0.0608 Epoch: 4 Global Step: 73590 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:29:48,075-Speed 9609.83 samples/sec Loss 
7.6727 LearningRate 0.0608 Epoch: 4 Global Step: 73600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:49,151-Speed 9531.01 samples/sec Loss 7.7424 LearningRate 0.0608 Epoch: 4 Global Step: 73610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:50,204-Speed 9724.37 samples/sec Loss 7.7768 LearningRate 0.0608 Epoch: 4 Global Step: 73620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:51,264-Speed 9664.30 samples/sec Loss 7.7218 LearningRate 0.0608 Epoch: 4 Global Step: 73630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:52,336-Speed 9564.65 samples/sec Loss 7.8009 LearningRate 0.0607 Epoch: 4 Global Step: 73640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:53,436-Speed 9309.77 samples/sec Loss 7.6936 LearningRate 0.0607 Epoch: 4 Global Step: 73650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:54,566-Speed 9070.80 samples/sec Loss 7.8071 LearningRate 0.0607 Epoch: 4 Global Step: 73660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:55,686-Speed 9151.35 samples/sec Loss 7.7436 LearningRate 0.0607 Epoch: 4 Global Step: 73670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:56,769-Speed 9455.58 samples/sec Loss 7.7061 LearningRate 0.0607 Epoch: 4 Global Step: 73680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:57,808-Speed 9864.53 samples/sec Loss 7.6785 LearningRate 0.0607 Epoch: 4 Global Step: 73690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:29:58,855-Speed 9788.83 samples/sec Loss 7.5955 LearningRate 0.0607 Epoch: 4 Global Step: 73700 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:29:59,950-Speed 9356.29 samples/sec Loss 7.7941 LearningRate 0.0607 Epoch: 4 Global Step: 73710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:01,027-Speed 9513.88 samples/sec Loss 7.7394 LearningRate 0.0607 Epoch: 4 Global 
Step: 73720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:02,140-Speed 9206.96 samples/sec Loss 7.6673 LearningRate 0.0607 Epoch: 4 Global Step: 73730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:03,259-Speed 9154.93 samples/sec Loss 7.7148 LearningRate 0.0607 Epoch: 4 Global Step: 73740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:04,313-Speed 9717.44 samples/sec Loss 7.7153 LearningRate 0.0607 Epoch: 4 Global Step: 73750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:05,376-Speed 9637.79 samples/sec Loss 7.7951 LearningRate 0.0607 Epoch: 4 Global Step: 73760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:06,429-Speed 9733.65 samples/sec Loss 7.6885 LearningRate 0.0607 Epoch: 4 Global Step: 73770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:07,467-Speed 9868.42 samples/sec Loss 7.7644 LearningRate 0.0607 Epoch: 4 Global Step: 73780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:08,578-Speed 9225.09 samples/sec Loss 7.6507 LearningRate 0.0607 Epoch: 4 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:09,683-Speed 9278.61 samples/sec Loss 7.7640 LearningRate 0.0607 Epoch: 4 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:10,792-Speed 9235.84 samples/sec Loss 7.7103 LearningRate 0.0607 Epoch: 4 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:11,887-Speed 9357.52 samples/sec Loss 7.6746 LearningRate 0.0607 Epoch: 4 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:13,016-Speed 9073.63 samples/sec Loss 7.7038 LearningRate 0.0607 Epoch: 4 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:14,167-Speed 8896.22 samples/sec Loss 7.7152 LearningRate 0.0607 Epoch: 4 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-11 14:30:15,258-Speed 9395.57 samples/sec Loss 7.5860 LearningRate 0.0606 Epoch: 4 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:16,362-Speed 9281.16 samples/sec Loss 7.6463 LearningRate 0.0606 Epoch: 4 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:17,453-Speed 9393.21 samples/sec Loss 7.7060 LearningRate 0.0606 Epoch: 4 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:18,515-Speed 9641.70 samples/sec Loss 7.6702 LearningRate 0.0606 Epoch: 4 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:19,593-Speed 9512.98 samples/sec Loss 7.7246 LearningRate 0.0606 Epoch: 4 Global Step: 73890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:20,663-Speed 9577.20 samples/sec Loss 7.7398 LearningRate 0.0606 Epoch: 4 Global Step: 73900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:21,758-Speed 9357.88 samples/sec Loss 7.6845 LearningRate 0.0606 Epoch: 4 Global Step: 73910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:22,860-Speed 9298.80 samples/sec Loss 7.6799 LearningRate 0.0606 Epoch: 4 Global Step: 73920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:23,950-Speed 9396.70 samples/sec Loss 7.7193 LearningRate 0.0606 Epoch: 4 Global Step: 73930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:30:25,011-Speed 9659.36 samples/sec Loss 7.7164 LearningRate 0.0606 Epoch: 4 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:26,091-Speed 9487.60 samples/sec Loss 7.6826 LearningRate 0.0606 Epoch: 4 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:27,159-Speed 9593.85 samples/sec Loss 7.7756 LearningRate 0.0606 Epoch: 4 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:28,267-Speed 
9246.22 samples/sec Loss 7.5107 LearningRate 0.0606 Epoch: 4 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:29,343-Speed 9521.97 samples/sec Loss 7.6888 LearningRate 0.0606 Epoch: 4 Global Step: 73980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:30,417-Speed 9536.33 samples/sec Loss 7.6060 LearningRate 0.0606 Epoch: 4 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:31,471-Speed 9724.09 samples/sec Loss 7.7406 LearningRate 0.0606 Epoch: 4 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:30:53,242-[lfw][74000]XNorm: 12.354982 Training: 2022-04-11 14:30:53,243-[lfw][74000]Accuracy-Flip: 0.99550+-0.00248 Training: 2022-04-11 14:30:53,244-[lfw][74000]Accuracy-Highest: 0.99583 Training: 2022-04-11 14:31:18,400-[cfp_fp][74000]XNorm: 10.379946 Training: 2022-04-11 14:31:18,401-[cfp_fp][74000]Accuracy-Flip: 0.95143+-0.01268 Training: 2022-04-11 14:31:18,401-[cfp_fp][74000]Accuracy-Highest: 0.95171 Training: 2022-04-11 14:31:40,135-[agedb_30][74000]XNorm: 11.873969 Training: 2022-04-11 14:31:40,136-[agedb_30][74000]Accuracy-Flip: 0.95967+-0.01122 Training: 2022-04-11 14:31:40,137-[agedb_30][74000]Accuracy-Highest: 0.96033 Training: 2022-04-11 14:31:41,212-Speed 146.83 samples/sec Loss 7.6394 LearningRate 0.0606 Epoch: 4 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:31:42,317-Speed 9277.22 samples/sec Loss 7.6369 LearningRate 0.0606 Epoch: 4 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:31:43,401-Speed 9457.38 samples/sec Loss 7.7098 LearningRate 0.0606 Epoch: 4 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:31:44,509-Speed 9242.35 samples/sec Loss 7.6464 LearningRate 0.0606 Epoch: 4 Global Step: 74040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:45,577-Speed 9591.97 samples/sec Loss 7.8545 
LearningRate 0.0606 Epoch: 4 Global Step: 74050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:46,719-Speed 8971.72 samples/sec Loss 7.6583 LearningRate 0.0606 Epoch: 4 Global Step: 74060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:47,817-Speed 9330.10 samples/sec Loss 7.6999 LearningRate 0.0605 Epoch: 4 Global Step: 74070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:48,912-Speed 9354.79 samples/sec Loss 7.7060 LearningRate 0.0605 Epoch: 4 Global Step: 74080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:50,029-Speed 9176.46 samples/sec Loss 7.7220 LearningRate 0.0605 Epoch: 4 Global Step: 74090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:51,081-Speed 9742.44 samples/sec Loss 7.7152 LearningRate 0.0605 Epoch: 4 Global Step: 74100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:52,154-Speed 9549.45 samples/sec Loss 7.6948 LearningRate 0.0605 Epoch: 4 Global Step: 74110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:53,230-Speed 9524.92 samples/sec Loss 7.7888 LearningRate 0.0605 Epoch: 4 Global Step: 74120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:54,303-Speed 9545.98 samples/sec Loss 7.7703 LearningRate 0.0605 Epoch: 4 Global Step: 74130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:55,405-Speed 9296.44 samples/sec Loss 7.7252 LearningRate 0.0605 Epoch: 4 Global Step: 74140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:56,512-Speed 9255.89 samples/sec Loss 7.7835 LearningRate 0.0605 Epoch: 4 Global Step: 74150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:31:57,559-Speed 9787.96 samples/sec Loss 7.7161 LearningRate 0.0605 Epoch: 4 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:31:58,660-Speed 9304.76 samples/sec Loss 7.6076 LearningRate 0.0605 Epoch: 4 Global Step: 
74170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:31:59,738-Speed 9505.90 samples/sec Loss 7.6798 LearningRate 0.0605 Epoch: 4 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:00,785-Speed 9785.57 samples/sec Loss 7.7552 LearningRate 0.0605 Epoch: 4 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:01,897-Speed 9215.50 samples/sec Loss 7.8175 LearningRate 0.0605 Epoch: 4 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:02,999-Speed 9300.93 samples/sec Loss 7.8126 LearningRate 0.0605 Epoch: 4 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:04,101-Speed 9294.09 samples/sec Loss 7.7293 LearningRate 0.0605 Epoch: 4 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:05,212-Speed 9221.15 samples/sec Loss 7.7448 LearningRate 0.0605 Epoch: 4 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:06,271-Speed 9681.60 samples/sec Loss 7.9101 LearningRate 0.0605 Epoch: 4 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:07,347-Speed 9521.81 samples/sec Loss 7.6965 LearningRate 0.0605 Epoch: 4 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:08,456-Speed 9234.98 samples/sec Loss 7.7927 LearningRate 0.0605 Epoch: 4 Global Step: 74260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:09,577-Speed 9147.57 samples/sec Loss 7.6317 LearningRate 0.0605 Epoch: 4 Global Step: 74270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:10,674-Speed 9338.40 samples/sec Loss 7.7108 LearningRate 0.0604 Epoch: 4 Global Step: 74280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:11,779-Speed 9271.49 samples/sec Loss 7.6179 LearningRate 0.0604 Epoch: 4 Global Step: 74290 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 14:32:12,866-Speed 9432.21 samples/sec Loss 7.6280 LearningRate 0.0604 Epoch: 4 Global Step: 74300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:14,016-Speed 8904.06 samples/sec Loss 7.7495 LearningRate 0.0604 Epoch: 4 Global Step: 74310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:15,077-Speed 9661.41 samples/sec Loss 7.6992 LearningRate 0.0604 Epoch: 4 Global Step: 74320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:16,213-Speed 9020.48 samples/sec Loss 7.7805 LearningRate 0.0604 Epoch: 4 Global Step: 74330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:17,315-Speed 9294.42 samples/sec Loss 7.6491 LearningRate 0.0604 Epoch: 4 Global Step: 74340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:18,397-Speed 9472.09 samples/sec Loss 7.7296 LearningRate 0.0604 Epoch: 4 Global Step: 74350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:19,498-Speed 9312.29 samples/sec Loss 7.7827 LearningRate 0.0604 Epoch: 4 Global Step: 74360 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:32:20,559-Speed 9654.02 samples/sec Loss 7.7711 LearningRate 0.0604 Epoch: 4 Global Step: 74370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:21,629-Speed 9573.64 samples/sec Loss 7.7994 LearningRate 0.0604 Epoch: 4 Global Step: 74380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:22,724-Speed 9355.87 samples/sec Loss 7.6494 LearningRate 0.0604 Epoch: 4 Global Step: 74390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:23,850-Speed 9097.46 samples/sec Loss 7.7080 LearningRate 0.0604 Epoch: 4 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:24,931-Speed 9485.83 samples/sec Loss 7.8726 LearningRate 0.0604 Epoch: 4 Global Step: 74410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:26,035-Speed 
9276.77 samples/sec Loss 7.6046 LearningRate 0.0604 Epoch: 4 Global Step: 74420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:27,142-Speed 9257.27 samples/sec Loss 7.6652 LearningRate 0.0604 Epoch: 4 Global Step: 74430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:28,243-Speed 9307.14 samples/sec Loss 7.5779 LearningRate 0.0604 Epoch: 4 Global Step: 74440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:29,285-Speed 9830.40 samples/sec Loss 7.6955 LearningRate 0.0604 Epoch: 4 Global Step: 74450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:30,377-Speed 9384.34 samples/sec Loss 7.7307 LearningRate 0.0604 Epoch: 4 Global Step: 74460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:31,484-Speed 9261.74 samples/sec Loss 7.7081 LearningRate 0.0604 Epoch: 4 Global Step: 74470 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:32:32,572-Speed 9415.95 samples/sec Loss 7.7337 LearningRate 0.0604 Epoch: 4 Global Step: 74480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:33,703-Speed 9053.41 samples/sec Loss 7.7903 LearningRate 0.0604 Epoch: 4 Global Step: 74490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:34,806-Speed 9293.73 samples/sec Loss 7.7990 LearningRate 0.0603 Epoch: 4 Global Step: 74500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:35,898-Speed 9381.38 samples/sec Loss 7.6577 LearningRate 0.0603 Epoch: 4 Global Step: 74510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:36,973-Speed 9526.83 samples/sec Loss 7.8045 LearningRate 0.0603 Epoch: 4 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:38,080-Speed 9257.96 samples/sec Loss 7.6777 LearningRate 0.0603 Epoch: 4 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:39,165-Speed 9450.11 samples/sec Loss 7.7354 LearningRate 
0.0603 Epoch: 4 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:40,234-Speed 9583.62 samples/sec Loss 7.6254 LearningRate 0.0603 Epoch: 4 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:41,304-Speed 9575.12 samples/sec Loss 7.5729 LearningRate 0.0603 Epoch: 4 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:42,431-Speed 9092.90 samples/sec Loss 7.8025 LearningRate 0.0603 Epoch: 4 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:43,511-Speed 9482.46 samples/sec Loss 7.6950 LearningRate 0.0603 Epoch: 4 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:44,595-Speed 9449.51 samples/sec Loss 7.7670 LearningRate 0.0603 Epoch: 4 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:45,704-Speed 9242.05 samples/sec Loss 7.6441 LearningRate 0.0603 Epoch: 4 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:46,764-Speed 9662.96 samples/sec Loss 7.6063 LearningRate 0.0603 Epoch: 4 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:32:47,873-Speed 9242.93 samples/sec Loss 7.7308 LearningRate 0.0603 Epoch: 4 Global Step: 74620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:48,960-Speed 9424.96 samples/sec Loss 7.7148 LearningRate 0.0603 Epoch: 4 Global Step: 74630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:50,005-Speed 9805.20 samples/sec Loss 7.6639 LearningRate 0.0603 Epoch: 4 Global Step: 74640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:51,064-Speed 9678.38 samples/sec Loss 7.7876 LearningRate 0.0603 Epoch: 4 Global Step: 74650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:52,106-Speed 9833.89 samples/sec Loss 7.7177 LearningRate 0.0603 Epoch: 4 Global Step: 74660 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-04-11 14:32:53,204-Speed 9333.66 samples/sec Loss 7.7357 LearningRate 0.0603 Epoch: 4 Global Step: 74670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:54,256-Speed 9736.65 samples/sec Loss 7.6441 LearningRate 0.0603 Epoch: 4 Global Step: 74680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:55,339-Speed 9465.25 samples/sec Loss 7.7879 LearningRate 0.0603 Epoch: 4 Global Step: 74690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:56,419-Speed 9483.91 samples/sec Loss 7.6983 LearningRate 0.0603 Epoch: 4 Global Step: 74700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:57,525-Speed 9267.83 samples/sec Loss 7.6924 LearningRate 0.0602 Epoch: 4 Global Step: 74710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:32:58,615-Speed 9396.47 samples/sec Loss 7.6009 LearningRate 0.0602 Epoch: 4 Global Step: 74720 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:32:59,673-Speed 9680.45 samples/sec Loss 7.5920 LearningRate 0.0602 Epoch: 4 Global Step: 74730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:00,764-Speed 9396.72 samples/sec Loss 7.5836 LearningRate 0.0602 Epoch: 4 Global Step: 74740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:01,868-Speed 9282.71 samples/sec Loss 7.6934 LearningRate 0.0602 Epoch: 4 Global Step: 74750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:02,996-Speed 9083.31 samples/sec Loss 7.7337 LearningRate 0.0602 Epoch: 4 Global Step: 74760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:04,095-Speed 9318.31 samples/sec Loss 7.6716 LearningRate 0.0602 Epoch: 4 Global Step: 74770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:05,176-Speed 9477.75 samples/sec Loss 7.6808 LearningRate 0.0602 Epoch: 4 Global Step: 74780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-11 14:33:06,252-Speed 9523.56 samples/sec Loss 7.7057 LearningRate 0.0602 Epoch: 4 Global Step: 74790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:07,337-Speed 9444.79 samples/sec Loss 7.6631 LearningRate 0.0602 Epoch: 4 Global Step: 74800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:08,409-Speed 9556.15 samples/sec Loss 7.6698 LearningRate 0.0602 Epoch: 4 Global Step: 74810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:09,503-Speed 9370.43 samples/sec Loss 7.8071 LearningRate 0.0602 Epoch: 4 Global Step: 74820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:10,588-Speed 9441.38 samples/sec Loss 7.7066 LearningRate 0.0602 Epoch: 4 Global Step: 74830 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:33:11,682-Speed 9370.69 samples/sec Loss 7.7264 LearningRate 0.0602 Epoch: 4 Global Step: 74840 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:33:12,758-Speed 9521.89 samples/sec Loss 7.7400 LearningRate 0.0602 Epoch: 4 Global Step: 74850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:13,825-Speed 9599.49 samples/sec Loss 7.6842 LearningRate 0.0602 Epoch: 4 Global Step: 74860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:14,918-Speed 9379.42 samples/sec Loss 7.7622 LearningRate 0.0602 Epoch: 4 Global Step: 74870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:16,009-Speed 9386.05 samples/sec Loss 7.8018 LearningRate 0.0602 Epoch: 4 Global Step: 74880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:17,077-Speed 9599.25 samples/sec Loss 7.6562 LearningRate 0.0602 Epoch: 4 Global Step: 74890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:18,177-Speed 9307.87 samples/sec Loss 7.7817 LearningRate 0.0602 Epoch: 4 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:19,273-Speed 9351.27 
samples/sec Loss 7.7279 LearningRate 0.0602 Epoch: 4 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:20,318-Speed 9799.81 samples/sec Loss 7.8157 LearningRate 0.0602 Epoch: 4 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:21,406-Speed 9416.42 samples/sec Loss 7.7031 LearningRate 0.0601 Epoch: 4 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:22,518-Speed 9216.90 samples/sec Loss 7.6754 LearningRate 0.0601 Epoch: 4 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:23,665-Speed 8934.96 samples/sec Loss 7.6564 LearningRate 0.0601 Epoch: 4 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:24,763-Speed 9337.24 samples/sec Loss 7.6455 LearningRate 0.0601 Epoch: 4 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:25,873-Speed 9232.39 samples/sec Loss 7.7112 LearningRate 0.0601 Epoch: 4 Global Step: 74970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:26,959-Speed 9434.82 samples/sec Loss 7.6109 LearningRate 0.0601 Epoch: 4 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:28,025-Speed 9612.47 samples/sec Loss 7.7165 LearningRate 0.0601 Epoch: 4 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:29,099-Speed 9539.30 samples/sec Loss 7.7508 LearningRate 0.0601 Epoch: 4 Global Step: 75000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:30,196-Speed 9339.21 samples/sec Loss 7.8384 LearningRate 0.0601 Epoch: 4 Global Step: 75010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:31,307-Speed 9230.61 samples/sec Loss 7.7124 LearningRate 0.0601 Epoch: 4 Global Step: 75020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:32,401-Speed 9363.68 samples/sec Loss 7.6991 LearningRate 0.0601 Epoch: 4 
Global Step: 75030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:33,475-Speed 9536.59 samples/sec Loss 7.7375 LearningRate 0.0601 Epoch: 4 Global Step: 75040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:34,586-Speed 9225.50 samples/sec Loss 7.6357 LearningRate 0.0601 Epoch: 4 Global Step: 75050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:35,634-Speed 9770.78 samples/sec Loss 7.6418 LearningRate 0.0601 Epoch: 4 Global Step: 75060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:36,736-Speed 9304.32 samples/sec Loss 7.6642 LearningRate 0.0601 Epoch: 4 Global Step: 75070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:37,841-Speed 9269.58 samples/sec Loss 7.6619 LearningRate 0.0601 Epoch: 4 Global Step: 75080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:38,932-Speed 9391.18 samples/sec Loss 7.7716 LearningRate 0.0601 Epoch: 4 Global Step: 75090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:40,014-Speed 9471.27 samples/sec Loss 7.7068 LearningRate 0.0601 Epoch: 4 Global Step: 75100 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:33:41,143-Speed 9070.41 samples/sec Loss 7.7541 LearningRate 0.0601 Epoch: 4 Global Step: 75110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:33:42,268-Speed 9103.64 samples/sec Loss 7.7590 LearningRate 0.0601 Epoch: 4 Global Step: 75120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:33:43,411-Speed 8972.05 samples/sec Loss 7.7383 LearningRate 0.0601 Epoch: 4 Global Step: 75130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:44,503-Speed 9382.96 samples/sec Loss 7.7469 LearningRate 0.0600 Epoch: 4 Global Step: 75140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:45,597-Speed 9362.27 samples/sec Loss 7.7063 LearningRate 0.0600 Epoch: 4 Global Step: 75150 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 14:33:46,647-Speed 9761.83 samples/sec Loss 7.7616 LearningRate 0.0600 Epoch: 4 Global Step: 75160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:47,753-Speed 9264.76 samples/sec Loss 7.7242 LearningRate 0.0600 Epoch: 4 Global Step: 75170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:48,855-Speed 9302.59 samples/sec Loss 7.6955 LearningRate 0.0600 Epoch: 4 Global Step: 75180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:49,923-Speed 9584.97 samples/sec Loss 7.6636 LearningRate 0.0600 Epoch: 4 Global Step: 75190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:51,013-Speed 9405.62 samples/sec Loss 7.7325 LearningRate 0.0600 Epoch: 4 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:33:52,091-Speed 9503.94 samples/sec Loss 7.6778 LearningRate 0.0600 Epoch: 4 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:53,158-Speed 9601.80 samples/sec Loss 7.7559 LearningRate 0.0600 Epoch: 4 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:54,219-Speed 9658.91 samples/sec Loss 7.6711 LearningRate 0.0600 Epoch: 4 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:55,286-Speed 9602.39 samples/sec Loss 7.7598 LearningRate 0.0600 Epoch: 4 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:56,376-Speed 9401.42 samples/sec Loss 7.6958 LearningRate 0.0600 Epoch: 4 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:57,465-Speed 9410.14 samples/sec Loss 7.6229 LearningRate 0.0600 Epoch: 4 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:33:58,512-Speed 9780.76 samples/sec Loss 7.6900 LearningRate 0.0600 Epoch: 4 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
14:33:59,604-Speed 9386.20 samples/sec Loss 7.6266 LearningRate 0.0600 Epoch: 4 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:00,669-Speed 9622.04 samples/sec Loss 7.8112 LearningRate 0.0600 Epoch: 4 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:01,742-Speed 9547.47 samples/sec Loss 7.7357 LearningRate 0.0600 Epoch: 4 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:02,844-Speed 9301.30 samples/sec Loss 7.7116 LearningRate 0.0600 Epoch: 4 Global Step: 75310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:03,927-Speed 9458.26 samples/sec Loss 7.8207 LearningRate 0.0600 Epoch: 4 Global Step: 75320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:05,040-Speed 9221.71 samples/sec Loss 7.6456 LearningRate 0.0600 Epoch: 4 Global Step: 75330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:06,108-Speed 9586.81 samples/sec Loss 7.7050 LearningRate 0.0600 Epoch: 4 Global Step: 75340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:07,218-Speed 9234.65 samples/sec Loss 7.6994 LearningRate 0.0600 Epoch: 4 Global Step: 75350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:08,314-Speed 9348.91 samples/sec Loss 7.7548 LearningRate 0.0599 Epoch: 4 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:09,360-Speed 9792.78 samples/sec Loss 7.6627 LearningRate 0.0599 Epoch: 4 Global Step: 75370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:10,461-Speed 9304.85 samples/sec Loss 7.6593 LearningRate 0.0599 Epoch: 4 Global Step: 75380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:11,530-Speed 9585.77 samples/sec Loss 7.7702 LearningRate 0.0599 Epoch: 4 Global Step: 75390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:12,639-Speed 9238.26 samples/sec Loss 
7.6703 LearningRate 0.0599 Epoch: 4 Global Step: 75400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:13,721-Speed 9468.72 samples/sec Loss 7.7506 LearningRate 0.0599 Epoch: 4 Global Step: 75410 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:34:14,813-Speed 9391.62 samples/sec Loss 7.6666 LearningRate 0.0599 Epoch: 4 Global Step: 75420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:15,903-Speed 9394.85 samples/sec Loss 7.7893 LearningRate 0.0599 Epoch: 4 Global Step: 75430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:17,024-Speed 9143.26 samples/sec Loss 7.7913 LearningRate 0.0599 Epoch: 4 Global Step: 75440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:18,162-Speed 9001.09 samples/sec Loss 7.7643 LearningRate 0.0599 Epoch: 4 Global Step: 75450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:19,259-Speed 9342.46 samples/sec Loss 7.7643 LearningRate 0.0599 Epoch: 4 Global Step: 75460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:20,323-Speed 9625.83 samples/sec Loss 7.7918 LearningRate 0.0599 Epoch: 4 Global Step: 75470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:21,424-Speed 9302.46 samples/sec Loss 7.6304 LearningRate 0.0599 Epoch: 4 Global Step: 75480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:22,515-Speed 9396.69 samples/sec Loss 7.6871 LearningRate 0.0599 Epoch: 4 Global Step: 75490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:23,605-Speed 9400.46 samples/sec Loss 7.7607 LearningRate 0.0599 Epoch: 4 Global Step: 75500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:24,748-Speed 8965.48 samples/sec Loss 7.7001 LearningRate 0.0599 Epoch: 4 Global Step: 75510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:25,776-Speed 9968.48 samples/sec Loss 7.7561 LearningRate 0.0599 Epoch: 4 Global 
Step: 75520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:34:26,897-Speed 9137.58 samples/sec Loss 7.7267 LearningRate 0.0599 Epoch: 4 Global Step: 75530 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:34:27,958-Speed 9660.65 samples/sec Loss 7.6183 LearningRate 0.0599 Epoch: 4 Global Step: 75540 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:34:29,009-Speed 9749.48 samples/sec Loss 7.6574 LearningRate 0.0599 Epoch: 4 Global Step: 75550 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:34:30,067-Speed 9677.86 samples/sec Loss 7.8451 LearningRate 0.0599 Epoch: 4 Global Step: 75560 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:34:31,152-Speed 9449.91 samples/sec Loss 7.6569 LearningRate 0.0598 Epoch: 4 Global Step: 75570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:32,227-Speed 9526.54 samples/sec Loss 7.6122 LearningRate 0.0598 Epoch: 4 Global Step: 75580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:33,297-Speed 9578.12 samples/sec Loss 7.7094 LearningRate 0.0598 Epoch: 4 Global Step: 75590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:34,372-Speed 9528.76 samples/sec Loss 7.8231 LearningRate 0.0598 Epoch: 4 Global Step: 75600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:35,489-Speed 9173.95 samples/sec Loss 7.6476 LearningRate 0.0598 Epoch: 4 Global Step: 75610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:36,596-Speed 9251.43 samples/sec Loss 7.6871 LearningRate 0.0598 Epoch: 4 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:37,647-Speed 9756.28 samples/sec Loss 7.6200 LearningRate 0.0598 Epoch: 4 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:38,723-Speed 9521.22 samples/sec Loss 7.7264 LearningRate 0.0598 Epoch: 4 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 
10 hours Training: 2022-04-11 14:34:39,830-Speed 9251.51 samples/sec Loss 7.6525 LearningRate 0.0598 Epoch: 4 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:40,949-Speed 9161.96 samples/sec Loss 7.6892 LearningRate 0.0598 Epoch: 4 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:42,028-Speed 9492.46 samples/sec Loss 7.6652 LearningRate 0.0598 Epoch: 4 Global Step: 75670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:43,105-Speed 9512.12 samples/sec Loss 7.7016 LearningRate 0.0598 Epoch: 4 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:44,156-Speed 9756.91 samples/sec Loss 7.6619 LearningRate 0.0598 Epoch: 4 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:45,195-Speed 9854.76 samples/sec Loss 7.7228 LearningRate 0.0598 Epoch: 4 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:46,275-Speed 9487.37 samples/sec Loss 7.7200 LearningRate 0.0598 Epoch: 4 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:34:47,336-Speed 9661.68 samples/sec Loss 7.6773 LearningRate 0.0598 Epoch: 4 Global Step: 75720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:48,382-Speed 9790.61 samples/sec Loss 7.7386 LearningRate 0.0598 Epoch: 4 Global Step: 75730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:49,447-Speed 9621.40 samples/sec Loss 7.6192 LearningRate 0.0598 Epoch: 4 Global Step: 75740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:50,504-Speed 9695.43 samples/sec Loss 7.5187 LearningRate 0.0598 Epoch: 4 Global Step: 75750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:51,548-Speed 9816.08 samples/sec Loss 7.7086 LearningRate 0.0598 Epoch: 4 Global Step: 75760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:52,631-Speed 
9457.76 samples/sec Loss 7.6582 LearningRate 0.0598 Epoch: 4 Global Step: 75770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:53,752-Speed 9142.00 samples/sec Loss 7.7763 LearningRate 0.0598 Epoch: 4 Global Step: 75780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:54,838-Speed 9435.42 samples/sec Loss 7.7005 LearningRate 0.0597 Epoch: 4 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:55,923-Speed 9446.29 samples/sec Loss 7.7155 LearningRate 0.0597 Epoch: 4 Global Step: 75800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:57,006-Speed 9459.02 samples/sec Loss 7.6868 LearningRate 0.0597 Epoch: 4 Global Step: 75810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:34:58,051-Speed 9800.52 samples/sec Loss 7.5923 LearningRate 0.0597 Epoch: 4 Global Step: 75820 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:34:59,114-Speed 9640.74 samples/sec Loss 7.6622 LearningRate 0.0597 Epoch: 4 Global Step: 75830 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:35:00,203-Speed 9415.15 samples/sec Loss 7.6166 LearningRate 0.0597 Epoch: 4 Global Step: 75840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:01,259-Speed 9703.84 samples/sec Loss 7.8831 LearningRate 0.0597 Epoch: 4 Global Step: 75850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:02,364-Speed 9274.36 samples/sec Loss 7.6937 LearningRate 0.0597 Epoch: 4 Global Step: 75860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:03,433-Speed 9584.30 samples/sec Loss 7.6105 LearningRate 0.0597 Epoch: 4 Global Step: 75870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:04,526-Speed 9368.02 samples/sec Loss 7.6401 LearningRate 0.0597 Epoch: 4 Global Step: 75880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:05,599-Speed 9555.99 samples/sec Loss 7.8226 
LearningRate 0.0597 Epoch: 4 Global Step: 75890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:06,673-Speed 9537.05 samples/sec Loss 7.6737 LearningRate 0.0597 Epoch: 4 Global Step: 75900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:07,762-Speed 9404.08 samples/sec Loss 7.7453 LearningRate 0.0597 Epoch: 4 Global Step: 75910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:08,841-Speed 9500.98 samples/sec Loss 7.7002 LearningRate 0.0597 Epoch: 4 Global Step: 75920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:09,925-Speed 9447.79 samples/sec Loss 7.5009 LearningRate 0.0597 Epoch: 4 Global Step: 75930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:10,972-Speed 9784.24 samples/sec Loss 7.6305 LearningRate 0.0597 Epoch: 4 Global Step: 75940 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:35:12,033-Speed 9660.75 samples/sec Loss 7.6159 LearningRate 0.0597 Epoch: 4 Global Step: 75950 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:35:13,180-Speed 8929.49 samples/sec Loss 7.7100 LearningRate 0.0597 Epoch: 4 Global Step: 75960 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:35:14,254-Speed 9550.78 samples/sec Loss 7.7015 LearningRate 0.0597 Epoch: 4 Global Step: 75970 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:35:15,350-Speed 9345.11 samples/sec Loss 7.5848 LearningRate 0.0597 Epoch: 4 Global Step: 75980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:16,477-Speed 9095.70 samples/sec Loss 7.7701 LearningRate 0.0597 Epoch: 4 Global Step: 75990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:17,600-Speed 9119.54 samples/sec Loss 7.6163 LearningRate 0.0596 Epoch: 4 Global Step: 76000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:35:39,740-[lfw][76000]XNorm: 11.971042 Training: 2022-04-11 
14:35:39,741-[lfw][76000]Accuracy-Flip: 0.99517+-0.00302 Training: 2022-04-11 14:35:39,741-[lfw][76000]Accuracy-Highest: 0.99583 Training: 2022-04-11 14:36:05,359-[cfp_fp][76000]XNorm: 10.078779 Training: 2022-04-11 14:36:05,360-[cfp_fp][76000]Accuracy-Flip: 0.95400+-0.00905 Training: 2022-04-11 14:36:05,360-[cfp_fp][76000]Accuracy-Highest: 0.95400 Training: 2022-04-11 14:36:27,480-[agedb_30][76000]XNorm: 11.492150 Training: 2022-04-11 14:36:27,481-[agedb_30][76000]Accuracy-Flip: 0.96067+-0.00967 Training: 2022-04-11 14:36:27,481-[agedb_30][76000]Accuracy-Highest: 0.96067 Training: 2022-04-11 14:36:28,600-Speed 144.23 samples/sec Loss 7.6557 LearningRate 0.0596 Epoch: 4 Global Step: 76010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:29,645-Speed 9804.31 samples/sec Loss 7.7408 LearningRate 0.0596 Epoch: 4 Global Step: 76020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:30,680-Speed 9904.17 samples/sec Loss 7.7558 LearningRate 0.0596 Epoch: 4 Global Step: 76030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:31,753-Speed 9545.65 samples/sec Loss 7.6399 LearningRate 0.0596 Epoch: 4 Global Step: 76040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:32,878-Speed 9119.42 samples/sec Loss 7.7199 LearningRate 0.0596 Epoch: 4 Global Step: 76050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:33,989-Speed 9214.27 samples/sec Loss 7.6388 LearningRate 0.0596 Epoch: 4 Global Step: 76060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:35,078-Speed 9414.11 samples/sec Loss 7.7027 LearningRate 0.0596 Epoch: 4 Global Step: 76070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:36,147-Speed 9587.18 samples/sec Loss 7.6163 LearningRate 0.0596 Epoch: 4 Global Step: 76080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:37,186-Speed 9860.22 samples/sec Loss 7.7661 LearningRate 0.0596 Epoch: 4 Global Step: 
76090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:38,266-Speed 9486.63 samples/sec Loss 7.7015 LearningRate 0.0596 Epoch: 4 Global Step: 76100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:39,338-Speed 9558.88 samples/sec Loss 7.7157 LearningRate 0.0596 Epoch: 4 Global Step: 76110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:40,439-Speed 9301.96 samples/sec Loss 7.6732 LearningRate 0.0596 Epoch: 4 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:41,548-Speed 9245.23 samples/sec Loss 7.7260 LearningRate 0.0596 Epoch: 4 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:42,605-Speed 9688.20 samples/sec Loss 7.7398 LearningRate 0.0596 Epoch: 4 Global Step: 76140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:43,711-Speed 9267.33 samples/sec Loss 7.7804 LearningRate 0.0596 Epoch: 4 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:44,811-Speed 9314.57 samples/sec Loss 7.6468 LearningRate 0.0596 Epoch: 4 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:45,861-Speed 9760.54 samples/sec Loss 7.7378 LearningRate 0.0596 Epoch: 4 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:46,933-Speed 9564.74 samples/sec Loss 7.7101 LearningRate 0.0596 Epoch: 4 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:47,975-Speed 9836.25 samples/sec Loss 7.7804 LearningRate 0.0596 Epoch: 4 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:49,055-Speed 9479.49 samples/sec Loss 7.6474 LearningRate 0.0596 Epoch: 4 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:50,150-Speed 9358.72 samples/sec Loss 7.6997 LearningRate 0.0596 Epoch: 4 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 14:36:51,247-Speed 9337.76 samples/sec Loss 7.7355 LearningRate 0.0595 Epoch: 4 Global Step: 76220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:36:52,342-Speed 9359.29 samples/sec Loss 7.6910 LearningRate 0.0595 Epoch: 4 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:53,430-Speed 9413.31 samples/sec Loss 7.7167 LearningRate 0.0595 Epoch: 4 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:54,537-Speed 9259.89 samples/sec Loss 7.6636 LearningRate 0.0595 Epoch: 4 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:55,638-Speed 9301.12 samples/sec Loss 7.6810 LearningRate 0.0595 Epoch: 4 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:56,720-Speed 9469.64 samples/sec Loss 7.7711 LearningRate 0.0595 Epoch: 4 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:57,801-Speed 9478.05 samples/sec Loss 7.7270 LearningRate 0.0595 Epoch: 4 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:58,890-Speed 9407.47 samples/sec Loss 7.5351 LearningRate 0.0595 Epoch: 4 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:36:59,950-Speed 9678.13 samples/sec Loss 7.5800 LearningRate 0.0595 Epoch: 4 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:01,042-Speed 9384.90 samples/sec Loss 7.5478 LearningRate 0.0595 Epoch: 4 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:02,133-Speed 9387.83 samples/sec Loss 7.5933 LearningRate 0.0595 Epoch: 4 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:03,205-Speed 9553.77 samples/sec Loss 7.6270 LearningRate 0.0595 Epoch: 4 Global Step: 76330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:04,282-Speed 9521.84 
samples/sec Loss 7.6851 LearningRate 0.0595 Epoch: 4 Global Step: 76340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:05,366-Speed 9448.79 samples/sec Loss 7.6292 LearningRate 0.0595 Epoch: 4 Global Step: 76350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:06,431-Speed 9625.55 samples/sec Loss 7.5878 LearningRate 0.0595 Epoch: 4 Global Step: 76360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:07,519-Speed 9412.39 samples/sec Loss 7.6097 LearningRate 0.0595 Epoch: 4 Global Step: 76370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:08,636-Speed 9178.57 samples/sec Loss 7.6887 LearningRate 0.0595 Epoch: 4 Global Step: 76380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:09,752-Speed 9180.98 samples/sec Loss 7.6130 LearningRate 0.0595 Epoch: 4 Global Step: 76390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:10,819-Speed 9597.91 samples/sec Loss 7.7013 LearningRate 0.0595 Epoch: 4 Global Step: 76400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:11,905-Speed 9435.63 samples/sec Loss 7.6056 LearningRate 0.0595 Epoch: 4 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:12,980-Speed 9529.51 samples/sec Loss 7.5751 LearningRate 0.0595 Epoch: 4 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:14,081-Speed 9309.57 samples/sec Loss 7.7110 LearningRate 0.0595 Epoch: 4 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:15,180-Speed 9324.38 samples/sec Loss 7.6858 LearningRate 0.0594 Epoch: 4 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:16,242-Speed 9651.81 samples/sec Loss 7.6351 LearningRate 0.0594 Epoch: 4 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:17,316-Speed 9539.42 samples/sec Loss 7.7915 LearningRate 0.0594 
Epoch: 4 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:18,438-Speed 9124.30 samples/sec Loss 7.7554 LearningRate 0.0594 Epoch: 4 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:19,544-Speed 9271.03 samples/sec Loss 7.8388 LearningRate 0.0594 Epoch: 4 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:20,650-Speed 9260.84 samples/sec Loss 7.7320 LearningRate 0.0594 Epoch: 4 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:21,736-Speed 9431.98 samples/sec Loss 7.6656 LearningRate 0.0594 Epoch: 4 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:22,807-Speed 9569.64 samples/sec Loss 7.7122 LearningRate 0.0594 Epoch: 4 Global Step: 76510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:23,856-Speed 9770.60 samples/sec Loss 7.7083 LearningRate 0.0594 Epoch: 4 Global Step: 76520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:24,910-Speed 9726.24 samples/sec Loss 7.6734 LearningRate 0.0594 Epoch: 4 Global Step: 76530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:25,981-Speed 9558.10 samples/sec Loss 7.6590 LearningRate 0.0594 Epoch: 4 Global Step: 76540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:27,056-Speed 9533.03 samples/sec Loss 7.7099 LearningRate 0.0594 Epoch: 4 Global Step: 76550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:28,114-Speed 9687.73 samples/sec Loss 7.7094 LearningRate 0.0594 Epoch: 4 Global Step: 76560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:29,237-Speed 9127.45 samples/sec Loss 7.7314 LearningRate 0.0594 Epoch: 4 Global Step: 76570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:30,298-Speed 9651.95 samples/sec Loss 7.6480 LearningRate 0.0594 Epoch: 4 Global Step: 76580 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-04-11 14:37:31,394-Speed 9350.76 samples/sec Loss 7.7098 LearningRate 0.0594 Epoch: 4 Global Step: 76590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:32,469-Speed 9531.77 samples/sec Loss 7.6397 LearningRate 0.0594 Epoch: 4 Global Step: 76600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:33,579-Speed 9230.38 samples/sec Loss 7.7184 LearningRate 0.0594 Epoch: 4 Global Step: 76610 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:37:34,645-Speed 9607.09 samples/sec Loss 7.7007 LearningRate 0.0594 Epoch: 4 Global Step: 76620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:35,695-Speed 9758.24 samples/sec Loss 7.6067 LearningRate 0.0594 Epoch: 4 Global Step: 76630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:36,772-Speed 9515.82 samples/sec Loss 7.6932 LearningRate 0.0594 Epoch: 4 Global Step: 76640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:37,855-Speed 9458.20 samples/sec Loss 7.5425 LearningRate 0.0593 Epoch: 4 Global Step: 76650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:38,903-Speed 9776.41 samples/sec Loss 7.5341 LearningRate 0.0593 Epoch: 4 Global Step: 76660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:40,016-Speed 9215.90 samples/sec Loss 7.6828 LearningRate 0.0593 Epoch: 4 Global Step: 76670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:41,080-Speed 9632.59 samples/sec Loss 7.6486 LearningRate 0.0593 Epoch: 4 Global Step: 76680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:42,154-Speed 9536.50 samples/sec Loss 7.6957 LearningRate 0.0593 Epoch: 4 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:43,210-Speed 9701.26 samples/sec Loss 7.6026 LearningRate 0.0593 Epoch: 4 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-04-11 14:37:44,291-Speed 9483.04 samples/sec Loss 7.7901 LearningRate 0.0593 Epoch: 4 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:45,345-Speed 9725.16 samples/sec Loss 7.6477 LearningRate 0.0593 Epoch: 4 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:46,430-Speed 9439.67 samples/sec Loss 7.6521 LearningRate 0.0593 Epoch: 4 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:47,527-Speed 9342.30 samples/sec Loss 7.6177 LearningRate 0.0593 Epoch: 4 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:48,603-Speed 9527.55 samples/sec Loss 7.6042 LearningRate 0.0593 Epoch: 4 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:49,643-Speed 9846.61 samples/sec Loss 7.6013 LearningRate 0.0593 Epoch: 4 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:50,707-Speed 9629.88 samples/sec Loss 7.6625 LearningRate 0.0593 Epoch: 4 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:51,801-Speed 9370.12 samples/sec Loss 7.6128 LearningRate 0.0593 Epoch: 4 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:52,889-Speed 9414.22 samples/sec Loss 7.8256 LearningRate 0.0593 Epoch: 4 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:37:53,979-Speed 9398.51 samples/sec Loss 7.6665 LearningRate 0.0593 Epoch: 4 Global Step: 76800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:55,030-Speed 9753.03 samples/sec Loss 7.7214 LearningRate 0.0593 Epoch: 4 Global Step: 76810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:56,113-Speed 9459.06 samples/sec Loss 7.7808 LearningRate 0.0593 Epoch: 4 Global Step: 76820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:57,171-Speed 9681.21 samples/sec 
Loss 7.4894 LearningRate 0.0593 Epoch: 4 Global Step: 76830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:58,279-Speed 9243.53 samples/sec Loss 7.7540 LearningRate 0.0593 Epoch: 4 Global Step: 76840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:37:59,377-Speed 9338.53 samples/sec Loss 7.6286 LearningRate 0.0593 Epoch: 4 Global Step: 76850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:00,444-Speed 9600.75 samples/sec Loss 7.7741 LearningRate 0.0593 Epoch: 4 Global Step: 76860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:01,522-Speed 9509.11 samples/sec Loss 7.5438 LearningRate 0.0592 Epoch: 4 Global Step: 76870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:02,622-Speed 9320.34 samples/sec Loss 7.7029 LearningRate 0.0592 Epoch: 4 Global Step: 76880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:03,731-Speed 9237.24 samples/sec Loss 7.7500 LearningRate 0.0592 Epoch: 4 Global Step: 76890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:04,814-Speed 9461.86 samples/sec Loss 7.7432 LearningRate 0.0592 Epoch: 4 Global Step: 76900 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:38:05,867-Speed 9725.01 samples/sec Loss 7.5830 LearningRate 0.0592 Epoch: 4 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:06,932-Speed 9617.45 samples/sec Loss 7.6381 LearningRate 0.0592 Epoch: 4 Global Step: 76920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:08,055-Speed 9126.50 samples/sec Loss 7.6597 LearningRate 0.0592 Epoch: 4 Global Step: 76930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:09,109-Speed 9723.14 samples/sec Loss 7.7784 LearningRate 0.0592 Epoch: 4 Global Step: 76940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:10,180-Speed 9564.54 samples/sec Loss 7.6244 LearningRate 0.0592 Epoch: 4 
Global Step: 76950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:11,281-Speed 9312.45 samples/sec Loss 7.7284 LearningRate 0.0592 Epoch: 4 Global Step: 76960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:12,342-Speed 9657.13 samples/sec Loss 7.6547 LearningRate 0.0592 Epoch: 4 Global Step: 76970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:13,417-Speed 9527.19 samples/sec Loss 7.8048 LearningRate 0.0592 Epoch: 4 Global Step: 76980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:14,547-Speed 9066.12 samples/sec Loss 7.7704 LearningRate 0.0592 Epoch: 4 Global Step: 76990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:15,647-Speed 9313.40 samples/sec Loss 7.7346 LearningRate 0.0592 Epoch: 4 Global Step: 77000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:16,758-Speed 9223.33 samples/sec Loss 7.6435 LearningRate 0.0592 Epoch: 4 Global Step: 77010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:17,836-Speed 9501.26 samples/sec Loss 7.7361 LearningRate 0.0592 Epoch: 4 Global Step: 77020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:18,878-Speed 9836.00 samples/sec Loss 7.7636 LearningRate 0.0592 Epoch: 4 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:19,934-Speed 9703.23 samples/sec Loss 7.6643 LearningRate 0.0592 Epoch: 4 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:21,010-Speed 9523.53 samples/sec Loss 7.7595 LearningRate 0.0592 Epoch: 4 Global Step: 77050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:22,110-Speed 9318.21 samples/sec Loss 7.6439 LearningRate 0.0592 Epoch: 4 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:23,191-Speed 9472.49 samples/sec Loss 7.6489 LearningRate 0.0592 Epoch: 4 Global Step: 77070 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-11 14:38:24,251-Speed 9673.77 samples/sec Loss 7.6493 LearningRate 0.0592 Epoch: 4 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:25,324-Speed 9547.30 samples/sec Loss 7.7300 LearningRate 0.0591 Epoch: 4 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:26,411-Speed 9422.97 samples/sec Loss 7.7611 LearningRate 0.0591 Epoch: 4 Global Step: 77100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:27,486-Speed 9532.85 samples/sec Loss 7.5951 LearningRate 0.0591 Epoch: 4 Global Step: 77110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:28,581-Speed 9356.36 samples/sec Loss 7.6268 LearningRate 0.0591 Epoch: 4 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:29,628-Speed 9788.89 samples/sec Loss 7.6084 LearningRate 0.0591 Epoch: 4 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:30,666-Speed 9873.03 samples/sec Loss 7.7152 LearningRate 0.0591 Epoch: 4 Global Step: 77140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:31,729-Speed 9637.49 samples/sec Loss 7.7142 LearningRate 0.0591 Epoch: 4 Global Step: 77150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:32,843-Speed 9196.43 samples/sec Loss 7.5392 LearningRate 0.0591 Epoch: 4 Global Step: 77160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:33,916-Speed 9546.88 samples/sec Loss 7.7210 LearningRate 0.0591 Epoch: 4 Global Step: 77170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:35,008-Speed 9377.92 samples/sec Loss 7.6985 LearningRate 0.0591 Epoch: 4 Global Step: 77180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:36,081-Speed 9548.20 samples/sec Loss 7.7245 LearningRate 0.0591 Epoch: 4 Global Step: 77190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:38:37,157-Speed 9524.08 samples/sec Loss 7.6569 LearningRate 0.0591 Epoch: 4 Global Step: 77200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:38,232-Speed 9533.99 samples/sec Loss 7.6588 LearningRate 0.0591 Epoch: 4 Global Step: 77210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:39,314-Speed 9475.94 samples/sec Loss 7.7178 LearningRate 0.0591 Epoch: 4 Global Step: 77220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:40,396-Speed 9470.37 samples/sec Loss 7.7909 LearningRate 0.0591 Epoch: 4 Global Step: 77230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:41,473-Speed 9514.12 samples/sec Loss 7.6293 LearningRate 0.0591 Epoch: 4 Global Step: 77240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:42,533-Speed 9665.62 samples/sec Loss 7.6637 LearningRate 0.0591 Epoch: 4 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:43,624-Speed 9390.62 samples/sec Loss 7.7568 LearningRate 0.0591 Epoch: 4 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:44,736-Speed 9214.10 samples/sec Loss 7.5771 LearningRate 0.0591 Epoch: 4 Global Step: 77270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:45,763-Speed 9977.55 samples/sec Loss 7.6581 LearningRate 0.0591 Epoch: 4 Global Step: 77280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:46,792-Speed 9954.58 samples/sec Loss 7.6606 LearningRate 0.0591 Epoch: 4 Global Step: 77290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:47,866-Speed 9541.31 samples/sec Loss 7.7072 LearningRate 0.0590 Epoch: 4 Global Step: 77300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:48,931-Speed 9620.38 samples/sec Loss 7.6902 LearningRate 0.0590 Epoch: 4 Global Step: 77310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:50,034-Speed 9290.99 samples/sec Loss 7.8366 
LearningRate 0.0590 Epoch: 4 Global Step: 77320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:51,136-Speed 9298.55 samples/sec Loss 7.7433 LearningRate 0.0590 Epoch: 4 Global Step: 77330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:52,225-Speed 9409.48 samples/sec Loss 7.6495 LearningRate 0.0590 Epoch: 4 Global Step: 77340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:53,307-Speed 9464.72 samples/sec Loss 7.7126 LearningRate 0.0590 Epoch: 4 Global Step: 77350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:54,384-Speed 9516.47 samples/sec Loss 7.6293 LearningRate 0.0590 Epoch: 4 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:55,461-Speed 9507.91 samples/sec Loss 7.5943 LearningRate 0.0590 Epoch: 4 Global Step: 77370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:38:56,568-Speed 9259.25 samples/sec Loss 7.5226 LearningRate 0.0590 Epoch: 4 Global Step: 77380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:57,678-Speed 9231.56 samples/sec Loss 7.6378 LearningRate 0.0590 Epoch: 4 Global Step: 77390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:58,784-Speed 9263.33 samples/sec Loss 7.6686 LearningRate 0.0590 Epoch: 4 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:38:59,881-Speed 9342.52 samples/sec Loss 7.6074 LearningRate 0.0590 Epoch: 4 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:39:00,984-Speed 9295.20 samples/sec Loss 7.6271 LearningRate 0.0590 Epoch: 4 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:39:02,087-Speed 9287.02 samples/sec Loss 7.7080 LearningRate 0.0590 Epoch: 4 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:39:03,192-Speed 9270.82 samples/sec Loss 7.6263 LearningRate 0.0590 Epoch: 4 Global Step: 77440 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:39:04,273-Speed 9477.80 samples/sec Loss 7.7790 LearningRate 0.0590 Epoch: 4 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:39:05,355-Speed 9467.34 samples/sec Loss 7.7667 LearningRate 0.0590 Epoch: 4 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:39:06,507-Speed 8891.07 samples/sec Loss 7.7643 LearningRate 0.0590 Epoch: 4 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:39:07,584-Speed 9517.72 samples/sec Loss 7.7034 LearningRate 0.0590 Epoch: 4 Global Step: 77480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:08,648-Speed 9626.90 samples/sec Loss 7.6387 LearningRate 0.0590 Epoch: 4 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:09,722-Speed 9541.43 samples/sec Loss 7.6479 LearningRate 0.0590 Epoch: 4 Global Step: 77500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:10,796-Speed 9545.40 samples/sec Loss 7.6616 LearningRate 0.0590 Epoch: 4 Global Step: 77510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:11,860-Speed 9626.57 samples/sec Loss 7.7930 LearningRate 0.0589 Epoch: 4 Global Step: 77520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:12,930-Speed 9573.54 samples/sec Loss 7.7283 LearningRate 0.0589 Epoch: 4 Global Step: 77530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:14,040-Speed 9235.45 samples/sec Loss 7.8006 LearningRate 0.0589 Epoch: 4 Global Step: 77540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:15,102-Speed 9649.96 samples/sec Loss 7.8724 LearningRate 0.0589 Epoch: 4 Global Step: 77550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:16,163-Speed 9657.56 samples/sec Loss 7.6538 LearningRate 0.0589 Epoch: 4 Global Step: 77560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-11 14:39:17,263-Speed 9318.98 samples/sec Loss 7.5981 LearningRate 0.0589 Epoch: 4 Global Step: 77570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:18,349-Speed 9432.16 samples/sec Loss 7.6112 LearningRate 0.0589 Epoch: 4 Global Step: 77580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:19,462-Speed 9211.92 samples/sec Loss 7.6949 LearningRate 0.0589 Epoch: 4 Global Step: 77590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:20,549-Speed 9423.85 samples/sec Loss 7.6799 LearningRate 0.0589 Epoch: 4 Global Step: 77600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:21,596-Speed 9791.17 samples/sec Loss 7.6541 LearningRate 0.0589 Epoch: 4 Global Step: 77610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:22,706-Speed 9228.51 samples/sec Loss 7.7709 LearningRate 0.0589 Epoch: 4 Global Step: 77620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:23,769-Speed 9638.33 samples/sec Loss 7.6605 LearningRate 0.0589 Epoch: 4 Global Step: 77630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:24,845-Speed 9527.49 samples/sec Loss 7.6347 LearningRate 0.0589 Epoch: 4 Global Step: 77640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:25,907-Speed 9643.40 samples/sec Loss 7.6709 LearningRate 0.0589 Epoch: 4 Global Step: 77650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:27,001-Speed 9364.28 samples/sec Loss 7.6341 LearningRate 0.0589 Epoch: 4 Global Step: 77660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:28,068-Speed 9603.94 samples/sec Loss 7.6463 LearningRate 0.0589 Epoch: 4 Global Step: 77670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:29,134-Speed 9607.98 samples/sec Loss 7.7042 LearningRate 0.0589 Epoch: 4 Global Step: 77680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:39:30,236-Speed 9299.11 
samples/sec Loss 7.5839 LearningRate 0.0589 Epoch: 4 Global Step: 77690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:31,306-Speed 9581.39 samples/sec Loss 7.6189 LearningRate 0.0589 Epoch: 4 Global Step: 77700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:32,376-Speed 9577.52 samples/sec Loss 7.5762 LearningRate 0.0589 Epoch: 4 Global Step: 77710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:33,443-Speed 9602.46 samples/sec Loss 7.5885 LearningRate 0.0589 Epoch: 4 Global Step: 77720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:34,501-Speed 9675.83 samples/sec Loss 7.6247 LearningRate 0.0589 Epoch: 4 Global Step: 77730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:35,619-Speed 9165.99 samples/sec Loss 7.5747 LearningRate 0.0588 Epoch: 4 Global Step: 77740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:36,666-Speed 9786.97 samples/sec Loss 7.6904 LearningRate 0.0588 Epoch: 4 Global Step: 77750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:37,767-Speed 9310.18 samples/sec Loss 7.6819 LearningRate 0.0588 Epoch: 4 Global Step: 77760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:38,852-Speed 9441.00 samples/sec Loss 7.6622 LearningRate 0.0588 Epoch: 4 Global Step: 77770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:39,903-Speed 9755.33 samples/sec Loss 7.6716 LearningRate 0.0588 Epoch: 4 Global Step: 77780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:40,960-Speed 9692.53 samples/sec Loss 7.6897 LearningRate 0.0588 Epoch: 4 Global Step: 77790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:42,041-Speed 9478.17 samples/sec Loss 7.6902 LearningRate 0.0588 Epoch: 4 Global Step: 77800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:43,145-Speed 9281.17 samples/sec Loss 7.6361 LearningRate 0.0588 
Epoch: 4 Global Step: 77810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:44,229-Speed 9444.33 samples/sec Loss 7.6009 LearningRate 0.0588 Epoch: 4 Global Step: 77820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:45,294-Speed 9625.37 samples/sec Loss 7.5196 LearningRate 0.0588 Epoch: 4 Global Step: 77830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:46,369-Speed 9532.98 samples/sec Loss 7.7084 LearningRate 0.0588 Epoch: 4 Global Step: 77840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:47,434-Speed 9619.81 samples/sec Loss 7.6564 LearningRate 0.0588 Epoch: 4 Global Step: 77850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:48,505-Speed 9571.82 samples/sec Loss 7.6103 LearningRate 0.0588 Epoch: 4 Global Step: 77860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:49,572-Speed 9599.69 samples/sec Loss 7.5398 LearningRate 0.0588 Epoch: 4 Global Step: 77870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:50,661-Speed 9405.73 samples/sec Loss 7.5741 LearningRate 0.0588 Epoch: 4 Global Step: 77880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:51,800-Speed 9001.23 samples/sec Loss 7.5899 LearningRate 0.0588 Epoch: 4 Global Step: 77890 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:39:52,859-Speed 9668.66 samples/sec Loss 7.5755 LearningRate 0.0588 Epoch: 4 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:53,938-Speed 9493.92 samples/sec Loss 7.6374 LearningRate 0.0588 Epoch: 4 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:54,988-Speed 9764.81 samples/sec Loss 7.6502 LearningRate 0.0588 Epoch: 4 Global Step: 77920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:56,068-Speed 9487.61 samples/sec Loss 7.6796 LearningRate 0.0588 Epoch: 4 Global Step: 77930 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:57,153-Speed 9447.66 samples/sec Loss 7.6480 LearningRate 0.0588 Epoch: 4 Global Step: 77940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:58,223-Speed 9573.40 samples/sec Loss 7.6498 LearningRate 0.0588 Epoch: 4 Global Step: 77950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:39:59,297-Speed 9550.60 samples/sec Loss 7.6523 LearningRate 0.0587 Epoch: 4 Global Step: 77960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:40:00,363-Speed 9619.43 samples/sec Loss 7.7279 LearningRate 0.0587 Epoch: 4 Global Step: 77970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:40:01,428-Speed 9613.23 samples/sec Loss 7.6778 LearningRate 0.0587 Epoch: 4 Global Step: 77980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:40:02,522-Speed 9365.51 samples/sec Loss 7.6297 LearningRate 0.0587 Epoch: 4 Global Step: 77990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:40:03,606-Speed 9456.06 samples/sec Loss 7.6724 LearningRate 0.0587 Epoch: 4 Global Step: 78000 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-11 14:40:25,575-[lfw][78000]XNorm: 12.139055
Training: 2022-04-11 14:40:25,576-[lfw][78000]Accuracy-Flip: 0.99633+-0.00267
Training: 2022-04-11 14:40:25,576-[lfw][78000]Accuracy-Highest: 0.99633
Training: 2022-04-11 14:40:50,945-[cfp_fp][78000]XNorm: 10.332268
Training: 2022-04-11 14:40:50,945-[cfp_fp][78000]Accuracy-Flip: 0.95014+-0.01267
Training: 2022-04-11 14:40:50,946-[cfp_fp][78000]Accuracy-Highest: 0.95400
Training: 2022-04-11 14:41:12,855-[agedb_30][78000]XNorm: 11.762301
Training: 2022-04-11 14:41:12,856-[agedb_30][78000]Accuracy-Flip: 0.95900+-0.00873
Training: 2022-04-11 14:41:12,856-[agedb_30][78000]Accuracy-Highest: 0.96067
Training: 2022-04-11 14:41:13,956-Speed 145.56 samples/sec Loss 7.6344 LearningRate 0.0587 Epoch: 4 Global Step: 78010 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-11 14:41:15,025-Speed 9588.87 samples/sec Loss 7.6983 LearningRate 0.0587 Epoch: 4 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:16,098-Speed 9545.17 samples/sec Loss 7.6208 LearningRate 0.0587 Epoch: 4 Global Step: 78030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:17,150-Speed 9744.73 samples/sec Loss 7.6669 LearningRate 0.0587 Epoch: 4 Global Step: 78040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:18,239-Speed 9408.34 samples/sec Loss 7.6989 LearningRate 0.0587 Epoch: 4 Global Step: 78050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:19,316-Speed 9512.71 samples/sec Loss 7.6719 LearningRate 0.0587 Epoch: 4 Global Step: 78060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:20,415-Speed 9328.00 samples/sec Loss 7.5922 LearningRate 0.0587 Epoch: 4 Global Step: 78070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:21,501-Speed 9431.96 samples/sec Loss 7.6070 LearningRate 0.0587 Epoch: 4 Global Step: 78080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:22,584-Speed 9457.75 samples/sec Loss 7.5512 LearningRate 0.0587 Epoch: 4 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:23,642-Speed 9686.37 samples/sec Loss 7.5096 LearningRate 0.0587 Epoch: 4 Global Step: 78100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:24,724-Speed 9468.99 samples/sec Loss 7.9518 LearningRate 0.0587 Epoch: 4 Global Step: 78110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:25,799-Speed 9530.21 samples/sec Loss 7.7583 LearningRate 0.0587 Epoch: 4 Global Step: 78120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:26,879-Speed 9488.90 samples/sec Loss 7.8605 LearningRate 0.0587 Epoch: 4 Global Step: 78130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:27,945-Speed 9612.60 
samples/sec Loss 7.6759 LearningRate 0.0587 Epoch: 4 Global Step: 78140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:29,020-Speed 9528.87 samples/sec Loss 7.7340 LearningRate 0.0587 Epoch: 4 Global Step: 78150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:30,084-Speed 9632.84 samples/sec Loss 7.7140 LearningRate 0.0587 Epoch: 4 Global Step: 78160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:31,156-Speed 9558.64 samples/sec Loss 7.6470 LearningRate 0.0586 Epoch: 4 Global Step: 78170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:32,235-Speed 9489.26 samples/sec Loss 7.7518 LearningRate 0.0586 Epoch: 4 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:33,314-Speed 9509.40 samples/sec Loss 7.6634 LearningRate 0.0586 Epoch: 4 Global Step: 78190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:34,370-Speed 9700.43 samples/sec Loss 7.7305 LearningRate 0.0586 Epoch: 4 Global Step: 78200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:35,456-Speed 9437.90 samples/sec Loss 7.6415 LearningRate 0.0586 Epoch: 4 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:36,506-Speed 9750.47 samples/sec Loss 7.6485 LearningRate 0.0586 Epoch: 4 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:37,599-Speed 9376.38 samples/sec Loss 7.6155 LearningRate 0.0586 Epoch: 4 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:38,660-Speed 9657.31 samples/sec Loss 7.6183 LearningRate 0.0586 Epoch: 4 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:39,739-Speed 9491.07 samples/sec Loss 7.6430 LearningRate 0.0586 Epoch: 4 Global Step: 78250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:40,855-Speed 9180.89 samples/sec Loss 7.7151 LearningRate 0.0586 Epoch: 4 
Global Step: 78260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:41,916-Speed 9655.25 samples/sec Loss 7.6103 LearningRate 0.0586 Epoch: 4 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:42,964-Speed 9786.15 samples/sec Loss 7.6157 LearningRate 0.0586 Epoch: 4 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:44,038-Speed 9538.34 samples/sec Loss 7.6257 LearningRate 0.0586 Epoch: 4 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:45,152-Speed 9194.13 samples/sec Loss 7.7694 LearningRate 0.0586 Epoch: 4 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:41:46,230-Speed 9510.07 samples/sec Loss 7.6204 LearningRate 0.0586 Epoch: 4 Global Step: 78310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:47,321-Speed 9392.56 samples/sec Loss 7.6486 LearningRate 0.0586 Epoch: 4 Global Step: 78320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:48,428-Speed 9254.02 samples/sec Loss 7.6059 LearningRate 0.0586 Epoch: 4 Global Step: 78330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:49,527-Speed 9324.82 samples/sec Loss 7.5561 LearningRate 0.0586 Epoch: 4 Global Step: 78340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:50,617-Speed 9394.56 samples/sec Loss 7.6917 LearningRate 0.0586 Epoch: 4 Global Step: 78350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:51,656-Speed 9864.86 samples/sec Loss 7.6668 LearningRate 0.0586 Epoch: 4 Global Step: 78360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:52,744-Speed 9414.59 samples/sec Loss 7.6037 LearningRate 0.0586 Epoch: 4 Global Step: 78370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:53,838-Speed 9373.20 samples/sec Loss 7.7943 LearningRate 0.0586 Epoch: 4 Global Step: 78380 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 14:41:54,922-Speed 9445.24 samples/sec Loss 7.6972 LearningRate 0.0585 Epoch: 4 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:56,085-Speed 8814.80 samples/sec Loss 7.6051 LearningRate 0.0585 Epoch: 4 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:57,159-Speed 9535.13 samples/sec Loss 7.4433 LearningRate 0.0585 Epoch: 4 Global Step: 78410 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:41:58,273-Speed 9196.97 samples/sec Loss 7.6419 LearningRate 0.0585 Epoch: 4 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:41:59,342-Speed 9589.70 samples/sec Loss 7.6463 LearningRate 0.0585 Epoch: 4 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:00,417-Speed 9524.10 samples/sec Loss 7.6767 LearningRate 0.0585 Epoch: 4 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:01,510-Speed 9378.57 samples/sec Loss 7.6365 LearningRate 0.0585 Epoch: 4 Global Step: 78450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:02,614-Speed 9278.19 samples/sec Loss 7.6809 LearningRate 0.0585 Epoch: 4 Global Step: 78460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:03,699-Speed 9446.94 samples/sec Loss 7.5331 LearningRate 0.0585 Epoch: 4 Global Step: 78470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:04,773-Speed 9544.33 samples/sec Loss 7.5321 LearningRate 0.0585 Epoch: 4 Global Step: 78480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:05,869-Speed 9344.14 samples/sec Loss 7.6945 LearningRate 0.0585 Epoch: 4 Global Step: 78490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:06,951-Speed 9476.16 samples/sec Loss 7.7459 LearningRate 0.0585 Epoch: 4 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
14:42:08,044-Speed 9367.85 samples/sec Loss 7.7671 LearningRate 0.0585 Epoch: 4 Global Step: 78510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:09,145-Speed 9305.70 samples/sec Loss 7.6302 LearningRate 0.0585 Epoch: 4 Global Step: 78520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:10,245-Speed 9319.74 samples/sec Loss 7.7015 LearningRate 0.0585 Epoch: 4 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:11,369-Speed 9112.59 samples/sec Loss 7.5634 LearningRate 0.0585 Epoch: 4 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:12,480-Speed 9219.71 samples/sec Loss 7.6354 LearningRate 0.0585 Epoch: 4 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:13,573-Speed 9379.47 samples/sec Loss 7.5868 LearningRate 0.0585 Epoch: 4 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:14,647-Speed 9533.92 samples/sec Loss 7.6194 LearningRate 0.0585 Epoch: 4 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:15,704-Speed 9693.11 samples/sec Loss 7.7923 LearningRate 0.0585 Epoch: 4 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:16,778-Speed 9543.43 samples/sec Loss 7.7865 LearningRate 0.0585 Epoch: 4 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:17,849-Speed 9569.94 samples/sec Loss 7.6087 LearningRate 0.0585 Epoch: 4 Global Step: 78600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:18,933-Speed 9455.40 samples/sec Loss 7.5640 LearningRate 0.0584 Epoch: 4 Global Step: 78610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:19,985-Speed 9737.94 samples/sec Loss 7.5512 LearningRate 0.0584 Epoch: 4 Global Step: 78620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:21,023-Speed 9867.34 samples/sec Loss 7.6808 
LearningRate 0.0584 Epoch: 4 Global Step: 78630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:22,112-Speed 9414.03 samples/sec Loss 7.6282 LearningRate 0.0584 Epoch: 4 Global Step: 78640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:23,254-Speed 8974.37 samples/sec Loss 7.8285 LearningRate 0.0584 Epoch: 4 Global Step: 78650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:24,356-Speed 9295.14 samples/sec Loss 7.7123 LearningRate 0.0584 Epoch: 4 Global Step: 78660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:25,481-Speed 9109.25 samples/sec Loss 7.5911 LearningRate 0.0584 Epoch: 4 Global Step: 78670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:26,576-Speed 9356.41 samples/sec Loss 7.5898 LearningRate 0.0584 Epoch: 4 Global Step: 78680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:27,642-Speed 9615.27 samples/sec Loss 7.6383 LearningRate 0.0584 Epoch: 4 Global Step: 78690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:28,721-Speed 9497.57 samples/sec Loss 7.6419 LearningRate 0.0584 Epoch: 4 Global Step: 78700 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:42:29,752-Speed 9933.50 samples/sec Loss 7.6020 LearningRate 0.0584 Epoch: 4 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:30,825-Speed 9552.13 samples/sec Loss 7.6641 LearningRate 0.0584 Epoch: 4 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:31,906-Speed 9472.58 samples/sec Loss 7.7830 LearningRate 0.0584 Epoch: 4 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:32,975-Speed 9592.01 samples/sec Loss 7.6197 LearningRate 0.0584 Epoch: 4 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:34,050-Speed 9538.89 samples/sec Loss 7.5458 LearningRate 0.0584 Epoch: 4 Global Step: 78750 
Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:35,122-Speed 9562.06 samples/sec Loss 7.5982 LearningRate 0.0584 Epoch: 4 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:36,182-Speed 9665.95 samples/sec Loss 7.7203 LearningRate 0.0584 Epoch: 4 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:37,277-Speed 9357.60 samples/sec Loss 7.7123 LearningRate 0.0584 Epoch: 4 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:38,368-Speed 9398.10 samples/sec Loss 7.6778 LearningRate 0.0584 Epoch: 4 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:39,457-Speed 9402.42 samples/sec Loss 7.5970 LearningRate 0.0584 Epoch: 4 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:40,569-Speed 9214.59 samples/sec Loss 7.6721 LearningRate 0.0584 Epoch: 4 Global Step: 78810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:41,642-Speed 9546.66 samples/sec Loss 7.7090 LearningRate 0.0584 Epoch: 4 Global Step: 78820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:42,785-Speed 8968.11 samples/sec Loss 7.5321 LearningRate 0.0583 Epoch: 4 Global Step: 78830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:43,932-Speed 8938.18 samples/sec Loss 7.5087 LearningRate 0.0583 Epoch: 4 Global Step: 78840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:45,016-Speed 9443.80 samples/sec Loss 7.7136 LearningRate 0.0583 Epoch: 4 Global Step: 78850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:46,105-Speed 9413.04 samples/sec Loss 7.5735 LearningRate 0.0583 Epoch: 4 Global Step: 78860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:47,192-Speed 9424.44 samples/sec Loss 7.7427 LearningRate 0.0583 Epoch: 4 Global Step: 78870 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 14:42:48,320-Speed 9083.05 samples/sec Loss 7.7805 LearningRate 0.0583 Epoch: 4 Global Step: 78880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:49,418-Speed 9332.29 samples/sec Loss 7.6143 LearningRate 0.0583 Epoch: 4 Global Step: 78890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:50,519-Speed 9306.74 samples/sec Loss 7.6248 LearningRate 0.0583 Epoch: 4 Global Step: 78900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:42:51,608-Speed 9409.48 samples/sec Loss 7.7363 LearningRate 0.0583 Epoch: 4 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:52,670-Speed 9646.89 samples/sec Loss 7.5733 LearningRate 0.0583 Epoch: 4 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:53,738-Speed 9598.93 samples/sec Loss 7.7328 LearningRate 0.0583 Epoch: 4 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:54,825-Speed 9426.24 samples/sec Loss 7.5830 LearningRate 0.0583 Epoch: 4 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:55,892-Speed 9598.51 samples/sec Loss 7.6245 LearningRate 0.0583 Epoch: 4 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:56,985-Speed 9372.99 samples/sec Loss 7.6217 LearningRate 0.0583 Epoch: 4 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:58,038-Speed 9736.32 samples/sec Loss 7.6176 LearningRate 0.0583 Epoch: 4 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:42:59,183-Speed 8948.59 samples/sec Loss 7.7644 LearningRate 0.0583 Epoch: 4 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:00,250-Speed 9600.04 samples/sec Loss 7.6796 LearningRate 0.0583 Epoch: 4 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:01,402-Speed 8894.24 
samples/sec Loss 7.6895 LearningRate 0.0583 Epoch: 4 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:02,443-Speed 9840.62 samples/sec Loss 7.6015 LearningRate 0.0583 Epoch: 4 Global Step: 79010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:03,519-Speed 9527.06 samples/sec Loss 7.6710 LearningRate 0.0583 Epoch: 4 Global Step: 79020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:04,584-Speed 9615.03 samples/sec Loss 7.6409 LearningRate 0.0583 Epoch: 4 Global Step: 79030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:05,683-Speed 9324.21 samples/sec Loss 7.6685 LearningRate 0.0583 Epoch: 4 Global Step: 79040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:06,789-Speed 9267.66 samples/sec Loss 7.6506 LearningRate 0.0582 Epoch: 4 Global Step: 79050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:07,922-Speed 9043.42 samples/sec Loss 7.6567 LearningRate 0.0582 Epoch: 4 Global Step: 79060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:09,030-Speed 9246.58 samples/sec Loss 7.6822 LearningRate 0.0582 Epoch: 4 Global Step: 79070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:10,104-Speed 9543.21 samples/sec Loss 7.7039 LearningRate 0.0582 Epoch: 4 Global Step: 79080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:11,164-Speed 9661.00 samples/sec Loss 7.5988 LearningRate 0.0582 Epoch: 4 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:12,256-Speed 9402.19 samples/sec Loss 7.6416 LearningRate 0.0582 Epoch: 4 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:13,339-Speed 9460.50 samples/sec Loss 7.6069 LearningRate 0.0582 Epoch: 4 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:14,436-Speed 9339.38 samples/sec Loss 7.6536 LearningRate 0.0582 
Epoch: 4 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:15,505-Speed 9577.33 samples/sec Loss 7.6769 LearningRate 0.0582 Epoch: 4 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:16,593-Speed 9424.41 samples/sec Loss 7.6410 LearningRate 0.0582 Epoch: 4 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:17,689-Speed 9349.02 samples/sec Loss 7.5814 LearningRate 0.0582 Epoch: 4 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:18,759-Speed 9574.26 samples/sec Loss 7.5709 LearningRate 0.0582 Epoch: 4 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:19,880-Speed 9140.76 samples/sec Loss 7.6847 LearningRate 0.0582 Epoch: 4 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:20,997-Speed 9168.11 samples/sec Loss 7.5840 LearningRate 0.0582 Epoch: 4 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:43:22,130-Speed 9042.92 samples/sec Loss 7.5816 LearningRate 0.0582 Epoch: 4 Global Step: 79190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:23,279-Speed 8916.48 samples/sec Loss 7.6273 LearningRate 0.0582 Epoch: 4 Global Step: 79200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:24,320-Speed 9854.97 samples/sec Loss 7.5100 LearningRate 0.0582 Epoch: 4 Global Step: 79210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:25,398-Speed 9498.46 samples/sec Loss 7.7014 LearningRate 0.0582 Epoch: 4 Global Step: 79220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:26,452-Speed 9719.57 samples/sec Loss 7.5915 LearningRate 0.0582 Epoch: 4 Global Step: 79230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:27,539-Speed 9425.27 samples/sec Loss 7.5320 LearningRate 0.0582 Epoch: 4 Global Step: 79240 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-04-11 14:43:28,679-Speed 8991.43 samples/sec Loss 7.6848 LearningRate 0.0582 Epoch: 4 Global Step: 79250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:29,777-Speed 9331.71 samples/sec Loss 7.6142 LearningRate 0.0582 Epoch: 4 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:30,848-Speed 9563.16 samples/sec Loss 7.5720 LearningRate 0.0581 Epoch: 4 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:31,947-Speed 9322.63 samples/sec Loss 7.6906 LearningRate 0.0581 Epoch: 4 Global Step: 79280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:33,083-Speed 9018.61 samples/sec Loss 7.6502 LearningRate 0.0581 Epoch: 4 Global Step: 79290 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:43:34,160-Speed 9516.85 samples/sec Loss 7.6267 LearningRate 0.0581 Epoch: 4 Global Step: 79300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:35,240-Speed 9480.89 samples/sec Loss 7.6877 LearningRate 0.0581 Epoch: 4 Global Step: 79310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:36,368-Speed 9084.06 samples/sec Loss 7.6617 LearningRate 0.0581 Epoch: 4 Global Step: 79320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:37,465-Speed 9345.50 samples/sec Loss 7.5734 LearningRate 0.0581 Epoch: 4 Global Step: 79330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:38,562-Speed 9343.55 samples/sec Loss 7.6454 LearningRate 0.0581 Epoch: 4 Global Step: 79340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:39,651-Speed 9401.79 samples/sec Loss 7.6354 LearningRate 0.0581 Epoch: 4 Global Step: 79350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:40,784-Speed 9044.83 samples/sec Loss 7.5582 LearningRate 0.0581 Epoch: 4 Global Step: 79360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-11 14:43:41,898-Speed 9200.00 samples/sec Loss 7.6135 LearningRate 0.0581 Epoch: 4 Global Step: 79370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:42,971-Speed 9548.00 samples/sec Loss 7.7361 LearningRate 0.0581 Epoch: 4 Global Step: 79380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:44,007-Speed 9887.87 samples/sec Loss 7.6591 LearningRate 0.0581 Epoch: 4 Global Step: 79390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:45,091-Speed 9454.74 samples/sec Loss 7.6755 LearningRate 0.0581 Epoch: 4 Global Step: 79400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:46,132-Speed 9844.13 samples/sec Loss 7.6758 LearningRate 0.0581 Epoch: 4 Global Step: 79410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:47,235-Speed 9290.34 samples/sec Loss 7.7314 LearningRate 0.0581 Epoch: 4 Global Step: 79420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:48,308-Speed 9545.67 samples/sec Loss 7.7109 LearningRate 0.0581 Epoch: 4 Global Step: 79430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:49,387-Speed 9501.17 samples/sec Loss 7.5758 LearningRate 0.0581 Epoch: 4 Global Step: 79440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:50,456-Speed 9585.02 samples/sec Loss 7.6092 LearningRate 0.0581 Epoch: 4 Global Step: 79450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:51,566-Speed 9227.17 samples/sec Loss 7.6520 LearningRate 0.0581 Epoch: 4 Global Step: 79460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:52,649-Speed 9461.90 samples/sec Loss 7.5804 LearningRate 0.0581 Epoch: 4 Global Step: 79470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:53,750-Speed 9311.20 samples/sec Loss 7.6462 LearningRate 0.0581 Epoch: 4 Global Step: 79480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:54,826-Speed 9520.96 
samples/sec Loss 7.6264 LearningRate 0.0580 Epoch: 4 Global Step: 79490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:43:55,874-Speed 9775.78 samples/sec Loss 7.6676 LearningRate 0.0580 Epoch: 4 Global Step: 79500 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:43:56,976-Speed 9297.48 samples/sec Loss 7.6555 LearningRate 0.0580 Epoch: 4 Global Step: 79510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:43:58,030-Speed 9715.73 samples/sec Loss 7.6472 LearningRate 0.0580 Epoch: 4 Global Step: 79520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:43:59,071-Speed 9842.48 samples/sec Loss 7.6761 LearningRate 0.0580 Epoch: 4 Global Step: 79530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:00,183-Speed 9215.54 samples/sec Loss 7.6316 LearningRate 0.0580 Epoch: 4 Global Step: 79540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:01,226-Speed 9825.17 samples/sec Loss 7.5183 LearningRate 0.0580 Epoch: 4 Global Step: 79550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:02,300-Speed 9538.50 samples/sec Loss 7.6593 LearningRate 0.0580 Epoch: 4 Global Step: 79560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:03,358-Speed 9681.66 samples/sec Loss 7.6537 LearningRate 0.0580 Epoch: 4 Global Step: 79570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:04,465-Speed 9255.63 samples/sec Loss 7.4911 LearningRate 0.0580 Epoch: 4 Global Step: 79580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:05,528-Speed 9647.10 samples/sec Loss 7.5809 LearningRate 0.0580 Epoch: 4 Global Step: 79590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:06,639-Speed 9220.29 samples/sec Loss 7.5796 LearningRate 0.0580 Epoch: 4 Global Step: 79600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:07,765-Speed 9100.09 samples/sec Loss 7.6817 LearningRate 0.0580 
Epoch: 4 Global Step: 79610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:08,838-Speed 9548.29 samples/sec Loss 7.6712 LearningRate 0.0580 Epoch: 4 Global Step: 79620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:09,874-Speed 9892.85 samples/sec Loss 7.7449 LearningRate 0.0580 Epoch: 4 Global Step: 79630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:10,919-Speed 9799.17 samples/sec Loss 7.5920 LearningRate 0.0580 Epoch: 4 Global Step: 79640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:12,020-Speed 9302.76 samples/sec Loss 7.7382 LearningRate 0.0580 Epoch: 4 Global Step: 79650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:13,108-Speed 9420.27 samples/sec Loss 7.5549 LearningRate 0.0580 Epoch: 4 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:14,190-Speed 9466.24 samples/sec Loss 7.5684 LearningRate 0.0580 Epoch: 4 Global Step: 79670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:15,286-Speed 9347.86 samples/sec Loss 7.6755 LearningRate 0.0580 Epoch: 4 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:16,351-Speed 9621.14 samples/sec Loss 7.6882 LearningRate 0.0580 Epoch: 4 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:17,432-Speed 9486.93 samples/sec Loss 7.6076 LearningRate 0.0579 Epoch: 4 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:18,546-Speed 9192.29 samples/sec Loss 7.6891 LearningRate 0.0579 Epoch: 4 Global Step: 79710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:19,639-Speed 9378.28 samples/sec Loss 7.6508 LearningRate 0.0579 Epoch: 4 Global Step: 79720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:20,740-Speed 9302.98 samples/sec Loss 7.6833 LearningRate 0.0579 Epoch: 4 Global Step: 79730 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-11 14:44:21,816-Speed 9519.02 samples/sec Loss 7.6770 LearningRate 0.0579 Epoch: 4 Global Step: 79740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:22,896-Speed 9493.68 samples/sec Loss 7.6838 LearningRate 0.0579 Epoch: 4 Global Step: 79750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:23,962-Speed 9621.10 samples/sec Loss 7.5915 LearningRate 0.0579 Epoch: 4 Global Step: 79760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:25,054-Speed 9375.86 samples/sec Loss 7.6719 LearningRate 0.0579 Epoch: 4 Global Step: 79770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:26,121-Speed 9607.18 samples/sec Loss 7.6355 LearningRate 0.0579 Epoch: 4 Global Step: 79780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:27,191-Speed 9577.28 samples/sec Loss 7.6503 LearningRate 0.0579 Epoch: 4 Global Step: 79790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:28,287-Speed 9347.01 samples/sec Loss 7.5247 LearningRate 0.0579 Epoch: 4 Global Step: 79800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:29,338-Speed 9742.72 samples/sec Loss 7.4152 LearningRate 0.0579 Epoch: 4 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:30,414-Speed 9525.27 samples/sec Loss 7.5846 LearningRate 0.0579 Epoch: 4 Global Step: 79820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:31,471-Speed 9693.16 samples/sec Loss 7.6397 LearningRate 0.0579 Epoch: 4 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:32,606-Speed 9027.21 samples/sec Loss 7.6242 LearningRate 0.0579 Epoch: 4 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:33,661-Speed 9712.33 samples/sec Loss 7.6439 LearningRate 0.0579 Epoch: 4 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
14:44:34,773-Speed 9211.60 samples/sec Loss 7.6878 LearningRate 0.0579 Epoch: 4 Global Step: 79860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:35,865-Speed 9386.66 samples/sec Loss 7.6978 LearningRate 0.0579 Epoch: 4 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:36,976-Speed 9224.70 samples/sec Loss 7.5249 LearningRate 0.0579 Epoch: 4 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:38,091-Speed 9192.60 samples/sec Loss 7.7018 LearningRate 0.0579 Epoch: 4 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:39,170-Speed 9491.98 samples/sec Loss 7.6060 LearningRate 0.0579 Epoch: 4 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:40,270-Speed 9316.88 samples/sec Loss 7.6314 LearningRate 0.0579 Epoch: 4 Global Step: 79910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:41,338-Speed 9592.42 samples/sec Loss 7.6474 LearningRate 0.0578 Epoch: 4 Global Step: 79920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:44:42,431-Speed 9373.79 samples/sec Loss 7.6218 LearningRate 0.0578 Epoch: 4 Global Step: 79930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:43,528-Speed 9338.63 samples/sec Loss 7.5501 LearningRate 0.0578 Epoch: 4 Global Step: 79940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:44,641-Speed 9210.30 samples/sec Loss 7.6886 LearningRate 0.0578 Epoch: 4 Global Step: 79950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:45,768-Speed 9095.44 samples/sec Loss 7.6035 LearningRate 0.0578 Epoch: 4 Global Step: 79960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:46,869-Speed 9303.90 samples/sec Loss 7.5249 LearningRate 0.0578 Epoch: 4 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:47,949-Speed 9492.00 samples/sec Loss 7.6747 
LearningRate 0.0578 Epoch: 4 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:49,023-Speed 9539.43 samples/sec Loss 7.6832 LearningRate 0.0578 Epoch: 4 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:44:50,085-Speed 9652.42 samples/sec Loss 7.7134 LearningRate 0.0578 Epoch: 4 Global Step: 80000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:45:12,056-[lfw][80000]XNorm: 12.140908 Training: 2022-04-11 14:45:12,057-[lfw][80000]Accuracy-Flip: 0.99617+-0.00183 Training: 2022-04-11 14:45:12,057-[lfw][80000]Accuracy-Highest: 0.99633 Training: 2022-04-11 14:45:37,494-[cfp_fp][80000]XNorm: 10.246566 Training: 2022-04-11 14:45:37,495-[cfp_fp][80000]Accuracy-Flip: 0.95157+-0.01238 Training: 2022-04-11 14:45:37,496-[cfp_fp][80000]Accuracy-Highest: 0.95400 Training: 2022-04-11 14:45:59,450-[agedb_30][80000]XNorm: 11.691534 Training: 2022-04-11 14:45:59,450-[agedb_30][80000]Accuracy-Flip: 0.96083+-0.00938 Training: 2022-04-11 14:45:59,450-[agedb_30][80000]Accuracy-Highest: 0.96083 Training: 2022-04-11 14:46:00,495-Speed 145.43 samples/sec Loss 7.5942 LearningRate 0.0578 Epoch: 4 Global Step: 80010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:01,570-Speed 9529.84 samples/sec Loss 7.7299 LearningRate 0.0578 Epoch: 4 Global Step: 80020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:02,638-Speed 9594.28 samples/sec Loss 7.6507 LearningRate 0.0578 Epoch: 4 Global Step: 80030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:03,712-Speed 9537.68 samples/sec Loss 7.5572 LearningRate 0.0578 Epoch: 4 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:04,760-Speed 9781.99 samples/sec Loss 7.6443 LearningRate 0.0578 Epoch: 4 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:05,832-Speed 9555.99 samples/sec Loss 7.5612 LearningRate 0.0578 Epoch: 4 Global 
Step: 80060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:06,893-Speed 9655.78 samples/sec Loss 7.6196 LearningRate 0.0578 Epoch: 4 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:07,966-Speed 9554.50 samples/sec Loss 7.6145 LearningRate 0.0578 Epoch: 4 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:09,029-Speed 9638.98 samples/sec Loss 7.5917 LearningRate 0.0578 Epoch: 4 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:10,124-Speed 9357.75 samples/sec Loss 7.7587 LearningRate 0.0578 Epoch: 4 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:11,190-Speed 9609.94 samples/sec Loss 7.5311 LearningRate 0.0578 Epoch: 4 Global Step: 80110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:12,273-Speed 9460.28 samples/sec Loss 7.5843 LearningRate 0.0578 Epoch: 4 Global Step: 80120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:13,335-Speed 9650.33 samples/sec Loss 7.6020 LearningRate 0.0578 Epoch: 4 Global Step: 80130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:14,406-Speed 9563.90 samples/sec Loss 7.4643 LearningRate 0.0577 Epoch: 4 Global Step: 80140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:15,486-Speed 9490.58 samples/sec Loss 7.6195 LearningRate 0.0577 Epoch: 4 Global Step: 80150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:16,569-Speed 9460.94 samples/sec Loss 7.6370 LearningRate 0.0577 Epoch: 4 Global Step: 80160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:17,616-Speed 9779.01 samples/sec Loss 7.4860 LearningRate 0.0577 Epoch: 4 Global Step: 80170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:18,729-Speed 9214.98 samples/sec Loss 7.5098 LearningRate 0.0577 Epoch: 4 Global Step: 80180 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 14:46:19,793-Speed 9625.47 samples/sec Loss 7.6363 LearningRate 0.0577 Epoch: 4 Global Step: 80190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:20,849-Speed 9698.04 samples/sec Loss 7.5132 LearningRate 0.0577 Epoch: 4 Global Step: 80200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:21,913-Speed 9637.70 samples/sec Loss 7.6520 LearningRate 0.0577 Epoch: 4 Global Step: 80210 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:46:22,976-Speed 9635.04 samples/sec Loss 7.5011 LearningRate 0.0577 Epoch: 4 Global Step: 80220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:24,051-Speed 9530.60 samples/sec Loss 7.6032 LearningRate 0.0577 Epoch: 4 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:25,105-Speed 9717.12 samples/sec Loss 7.5876 LearningRate 0.0577 Epoch: 4 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:26,225-Speed 9152.13 samples/sec Loss 7.6088 LearningRate 0.0577 Epoch: 4 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:27,344-Speed 9154.30 samples/sec Loss 7.6109 LearningRate 0.0577 Epoch: 4 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:28,459-Speed 9188.34 samples/sec Loss 7.4627 LearningRate 0.0577 Epoch: 4 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:29,541-Speed 9471.34 samples/sec Loss 7.6277 LearningRate 0.0577 Epoch: 4 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:30,598-Speed 9693.26 samples/sec Loss 7.6521 LearningRate 0.0577 Epoch: 4 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:31,711-Speed 9209.01 samples/sec Loss 7.6173 LearningRate 0.0577 Epoch: 4 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:46:32,815-Speed 9279.39 samples/sec Loss 7.7147 LearningRate 0.0577 Epoch: 4 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:33,838-Speed 10023.87 samples/sec Loss 7.5528 LearningRate 0.0577 Epoch: 4 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:34,919-Speed 9474.96 samples/sec Loss 7.4884 LearningRate 0.0577 Epoch: 4 Global Step: 80330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:35,991-Speed 9560.25 samples/sec Loss 7.6017 LearningRate 0.0577 Epoch: 4 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:37,054-Speed 9635.80 samples/sec Loss 7.5063 LearningRate 0.0577 Epoch: 4 Global Step: 80350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:38,144-Speed 9400.08 samples/sec Loss 7.6289 LearningRate 0.0576 Epoch: 4 Global Step: 80360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:39,211-Speed 9597.04 samples/sec Loss 7.6931 LearningRate 0.0576 Epoch: 4 Global Step: 80370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:40,254-Speed 9823.03 samples/sec Loss 7.5676 LearningRate 0.0576 Epoch: 4 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:41,351-Speed 9345.89 samples/sec Loss 7.4775 LearningRate 0.0576 Epoch: 4 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:42,470-Speed 9156.69 samples/sec Loss 7.5979 LearningRate 0.0576 Epoch: 4 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:43,589-Speed 9155.22 samples/sec Loss 7.7466 LearningRate 0.0576 Epoch: 4 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:44,665-Speed 9523.15 samples/sec Loss 7.6292 LearningRate 0.0576 Epoch: 4 Global Step: 80420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:45,738-Speed 9545.37 samples/sec Loss 7.5393 
LearningRate 0.0576 Epoch: 4 Global Step: 80430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:46,857-Speed 9158.42 samples/sec Loss 7.6011 LearningRate 0.0576 Epoch: 4 Global Step: 80440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:47,918-Speed 9655.28 samples/sec Loss 7.6049 LearningRate 0.0576 Epoch: 4 Global Step: 80450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:49,070-Speed 8895.57 samples/sec Loss 7.6685 LearningRate 0.0576 Epoch: 4 Global Step: 80460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:50,173-Speed 9290.95 samples/sec Loss 7.6443 LearningRate 0.0576 Epoch: 4 Global Step: 80470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:51,251-Speed 9507.96 samples/sec Loss 7.5933 LearningRate 0.0576 Epoch: 4 Global Step: 80480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:52,337-Speed 9434.81 samples/sec Loss 7.6591 LearningRate 0.0576 Epoch: 4 Global Step: 80490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:53,434-Speed 9338.90 samples/sec Loss 7.6044 LearningRate 0.0576 Epoch: 4 Global Step: 80500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:54,530-Speed 9346.34 samples/sec Loss 7.5204 LearningRate 0.0576 Epoch: 4 Global Step: 80510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:46:55,639-Speed 9243.30 samples/sec Loss 7.5135 LearningRate 0.0576 Epoch: 4 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:56,740-Speed 9302.07 samples/sec Loss 7.5288 LearningRate 0.0576 Epoch: 4 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:57,819-Speed 9494.90 samples/sec Loss 7.6336 LearningRate 0.0576 Epoch: 4 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:58,916-Speed 9343.78 samples/sec Loss 7.6237 LearningRate 0.0576 Epoch: 4 Global Step: 
80550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:46:59,997-Speed 9477.82 samples/sec Loss 7.5137 LearningRate 0.0576 Epoch: 4 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:01,083-Speed 9436.33 samples/sec Loss 7.6105 LearningRate 0.0576 Epoch: 4 Global Step: 80570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:02,168-Speed 9437.44 samples/sec Loss 7.7041 LearningRate 0.0575 Epoch: 4 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:03,256-Speed 9420.58 samples/sec Loss 7.5461 LearningRate 0.0575 Epoch: 4 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:04,287-Speed 9934.15 samples/sec Loss 7.6228 LearningRate 0.0575 Epoch: 4 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:05,350-Speed 9644.31 samples/sec Loss 7.6515 LearningRate 0.0575 Epoch: 4 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:06,428-Speed 9500.78 samples/sec Loss 7.5587 LearningRate 0.0575 Epoch: 4 Global Step: 80620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:07,530-Speed 9293.78 samples/sec Loss 7.6525 LearningRate 0.0575 Epoch: 4 Global Step: 80630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:08,628-Speed 9334.03 samples/sec Loss 7.4652 LearningRate 0.0575 Epoch: 4 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:09,654-Speed 9983.09 samples/sec Loss 7.3554 LearningRate 0.0575 Epoch: 4 Global Step: 80650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:10,743-Speed 9412.84 samples/sec Loss 7.5148 LearningRate 0.0575 Epoch: 4 Global Step: 80660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:11,825-Speed 9473.11 samples/sec Loss 7.4732 LearningRate 0.0575 Epoch: 4 Global Step: 80670 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 14:47:12,861-Speed 9888.51 samples/sec Loss 7.6090 LearningRate 0.0575 Epoch: 4 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:13,931-Speed 9580.08 samples/sec Loss 7.5580 LearningRate 0.0575 Epoch: 4 Global Step: 80690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:15,013-Speed 9466.49 samples/sec Loss 7.5368 LearningRate 0.0575 Epoch: 4 Global Step: 80700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:16,098-Speed 9444.07 samples/sec Loss 7.6092 LearningRate 0.0575 Epoch: 4 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:17,208-Speed 9230.91 samples/sec Loss 7.5738 LearningRate 0.0575 Epoch: 4 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:18,298-Speed 9401.46 samples/sec Loss 7.6380 LearningRate 0.0575 Epoch: 4 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:19,367-Speed 9579.06 samples/sec Loss 7.8047 LearningRate 0.0575 Epoch: 4 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:20,469-Speed 9298.51 samples/sec Loss 7.6525 LearningRate 0.0575 Epoch: 4 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:21,554-Speed 9443.85 samples/sec Loss 7.6468 LearningRate 0.0575 Epoch: 4 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:22,638-Speed 9452.94 samples/sec Loss 7.5903 LearningRate 0.0575 Epoch: 4 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:23,746-Speed 9242.37 samples/sec Loss 7.6305 LearningRate 0.0575 Epoch: 4 Global Step: 80780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:24,816-Speed 9578.94 samples/sec Loss 7.4774 LearningRate 0.0575 Epoch: 4 Global Step: 80790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:25,871-Speed 9713.82 
samples/sec Loss 7.5449 LearningRate 0.0574 Epoch: 4 Global Step: 80800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:26,957-Speed 9434.46 samples/sec Loss 7.5410 LearningRate 0.0574 Epoch: 4 Global Step: 80810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:28,015-Speed 9680.41 samples/sec Loss 7.5850 LearningRate 0.0574 Epoch: 4 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:29,116-Speed 9310.98 samples/sec Loss 7.6086 LearningRate 0.0574 Epoch: 4 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:30,212-Speed 9345.87 samples/sec Loss 7.5443 LearningRate 0.0574 Epoch: 4 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:31,292-Speed 9495.04 samples/sec Loss 7.5814 LearningRate 0.0574 Epoch: 4 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:32,355-Speed 9633.74 samples/sec Loss 7.5575 LearningRate 0.0574 Epoch: 4 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:33,424-Speed 9582.88 samples/sec Loss 7.5879 LearningRate 0.0574 Epoch: 4 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:34,499-Speed 9532.86 samples/sec Loss 7.6831 LearningRate 0.0574 Epoch: 4 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:35,544-Speed 9804.61 samples/sec Loss 7.6507 LearningRate 0.0574 Epoch: 4 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:36,617-Speed 9549.16 samples/sec Loss 7.5562 LearningRate 0.0574 Epoch: 4 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:37,667-Speed 9764.76 samples/sec Loss 7.5857 LearningRate 0.0574 Epoch: 4 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:38,761-Speed 9359.20 samples/sec Loss 7.6059 LearningRate 0.0574 Epoch: 4 
Global Step: 80920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:39,868-Speed 9261.12 samples/sec Loss 7.6655 LearningRate 0.0574 Epoch: 4 Global Step: 80930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:40,937-Speed 9582.84 samples/sec Loss 7.5098 LearningRate 0.0574 Epoch: 4 Global Step: 80940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:41,996-Speed 9676.92 samples/sec Loss 7.5171 LearningRate 0.0574 Epoch: 4 Global Step: 80950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:43,096-Speed 9312.63 samples/sec Loss 7.6478 LearningRate 0.0574 Epoch: 4 Global Step: 80960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:44,202-Speed 9261.63 samples/sec Loss 7.5671 LearningRate 0.0574 Epoch: 4 Global Step: 80970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:45,289-Speed 9432.20 samples/sec Loss 7.5659 LearningRate 0.0574 Epoch: 4 Global Step: 80980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:46,349-Speed 9663.44 samples/sec Loss 7.5846 LearningRate 0.0574 Epoch: 4 Global Step: 80990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:47,417-Speed 9591.50 samples/sec Loss 7.5476 LearningRate 0.0574 Epoch: 4 Global Step: 81000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:47:48,526-Speed 9247.91 samples/sec Loss 7.6058 LearningRate 0.0574 Epoch: 4 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:49,617-Speed 9386.96 samples/sec Loss 7.5739 LearningRate 0.0573 Epoch: 4 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:50,667-Speed 9757.77 samples/sec Loss 7.6020 LearningRate 0.0573 Epoch: 4 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:51,766-Speed 9323.37 samples/sec Loss 7.4763 LearningRate 0.0573 Epoch: 4 Global Step: 81040 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-11 14:47:52,852-Speed 9438.94 samples/sec Loss 7.5994 LearningRate 0.0573 Epoch: 4 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:53,919-Speed 9601.62 samples/sec Loss 7.6838 LearningRate 0.0573 Epoch: 4 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:54,980-Speed 9652.92 samples/sec Loss 7.4805 LearningRate 0.0573 Epoch: 4 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:56,073-Speed 9371.86 samples/sec Loss 7.6918 LearningRate 0.0573 Epoch: 4 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:57,155-Speed 9474.12 samples/sec Loss 7.6284 LearningRate 0.0573 Epoch: 4 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:58,235-Speed 9484.99 samples/sec Loss 7.4075 LearningRate 0.0573 Epoch: 4 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:47:59,361-Speed 9100.88 samples/sec Loss 7.6548 LearningRate 0.0573 Epoch: 4 Global Step: 81110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:00,445-Speed 9448.01 samples/sec Loss 7.4049 LearningRate 0.0573 Epoch: 4 Global Step: 81120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:01,528-Speed 9467.29 samples/sec Loss 7.6265 LearningRate 0.0573 Epoch: 4 Global Step: 81130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:02,615-Speed 9422.25 samples/sec Loss 7.5795 LearningRate 0.0573 Epoch: 4 Global Step: 81140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:03,737-Speed 9128.46 samples/sec Loss 7.4682 LearningRate 0.0573 Epoch: 4 Global Step: 81150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:04,797-Speed 9669.25 samples/sec Loss 7.5919 LearningRate 0.0573 Epoch: 4 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
14:48:05,873-Speed 9526.18 samples/sec Loss 7.6015 LearningRate 0.0573 Epoch: 4 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:06,945-Speed 9557.41 samples/sec Loss 7.5697 LearningRate 0.0573 Epoch: 4 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:08,087-Speed 8965.85 samples/sec Loss 7.6876 LearningRate 0.0573 Epoch: 4 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:09,139-Speed 9745.39 samples/sec Loss 7.6244 LearningRate 0.0573 Epoch: 4 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:10,196-Speed 9696.09 samples/sec Loss 7.5883 LearningRate 0.0573 Epoch: 4 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:11,299-Speed 9282.65 samples/sec Loss 7.5061 LearningRate 0.0573 Epoch: 4 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:12,390-Speed 9398.57 samples/sec Loss 7.5420 LearningRate 0.0573 Epoch: 4 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:13,479-Speed 9412.09 samples/sec Loss 7.6061 LearningRate 0.0572 Epoch: 4 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:14,574-Speed 9352.28 samples/sec Loss 7.5260 LearningRate 0.0572 Epoch: 4 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:15,647-Speed 9545.74 samples/sec Loss 7.6761 LearningRate 0.0572 Epoch: 4 Global Step: 81260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:16,694-Speed 9787.87 samples/sec Loss 7.4123 LearningRate 0.0572 Epoch: 4 Global Step: 81270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:17,800-Speed 9262.20 samples/sec Loss 7.6168 LearningRate 0.0572 Epoch: 4 Global Step: 81280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:18,884-Speed 9458.08 samples/sec Loss 7.5734 
LearningRate 0.0572 Epoch: 4 Global Step: 81290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:20,007-Speed 9124.53 samples/sec Loss 7.6251 LearningRate 0.0572 Epoch: 4 Global Step: 81300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:21,092-Speed 9444.28 samples/sec Loss 7.6272 LearningRate 0.0572 Epoch: 4 Global Step: 81310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:22,182-Speed 9392.41 samples/sec Loss 7.5711 LearningRate 0.0572 Epoch: 4 Global Step: 81320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:23,269-Speed 9425.08 samples/sec Loss 7.5356 LearningRate 0.0572 Epoch: 4 Global Step: 81330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:24,394-Speed 9112.02 samples/sec Loss 7.5252 LearningRate 0.0572 Epoch: 4 Global Step: 81340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:25,470-Speed 9521.62 samples/sec Loss 7.6183 LearningRate 0.0572 Epoch: 4 Global Step: 81350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:26,534-Speed 9630.57 samples/sec Loss 7.5484 LearningRate 0.0572 Epoch: 4 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:27,599-Speed 9617.55 samples/sec Loss 7.6389 LearningRate 0.0572 Epoch: 4 Global Step: 81370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:28,706-Speed 9257.67 samples/sec Loss 7.5060 LearningRate 0.0572 Epoch: 4 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:29,765-Speed 9674.36 samples/sec Loss 7.5391 LearningRate 0.0572 Epoch: 4 Global Step: 81390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:30,830-Speed 9629.56 samples/sec Loss 7.5465 LearningRate 0.0572 Epoch: 4 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:31,901-Speed 9567.05 samples/sec Loss 7.5156 LearningRate 0.0572 Epoch: 4 Global Step: 
81410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:32,949-Speed 9777.37 samples/sec Loss 7.5521 LearningRate 0.0572 Epoch: 4 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:34,097-Speed 8922.76 samples/sec Loss 7.6537 LearningRate 0.0572 Epoch: 4 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:35,159-Speed 9650.95 samples/sec Loss 7.5646 LearningRate 0.0572 Epoch: 4 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:36,269-Speed 9231.33 samples/sec Loss 7.5114 LearningRate 0.0572 Epoch: 4 Global Step: 81450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:37,310-Speed 9836.06 samples/sec Loss 7.6591 LearningRate 0.0572 Epoch: 4 Global Step: 81460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:38,369-Speed 9678.57 samples/sec Loss 7.5401 LearningRate 0.0571 Epoch: 4 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:39,480-Speed 9217.86 samples/sec Loss 7.5610 LearningRate 0.0571 Epoch: 4 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:40,548-Speed 9600.57 samples/sec Loss 7.5362 LearningRate 0.0571 Epoch: 4 Global Step: 81490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:41,652-Speed 9281.67 samples/sec Loss 7.5454 LearningRate 0.0571 Epoch: 4 Global Step: 81500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:42,733-Speed 9472.65 samples/sec Loss 7.7549 LearningRate 0.0571 Epoch: 4 Global Step: 81510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:43,803-Speed 9574.62 samples/sec Loss 7.5661 LearningRate 0.0571 Epoch: 4 Global Step: 81520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:44,916-Speed 9211.85 samples/sec Loss 7.5043 LearningRate 0.0571 Epoch: 4 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 14:48:45,981-Speed 9613.37 samples/sec Loss 7.5183 LearningRate 0.0571 Epoch: 4 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:47,091-Speed 9234.10 samples/sec Loss 7.5882 LearningRate 0.0571 Epoch: 4 Global Step: 81550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:48,171-Speed 9490.87 samples/sec Loss 7.5072 LearningRate 0.0571 Epoch: 4 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:48:49,258-Speed 9431.06 samples/sec Loss 7.6258 LearningRate 0.0571 Epoch: 4 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:50,337-Speed 9487.86 samples/sec Loss 7.5018 LearningRate 0.0571 Epoch: 4 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:51,427-Speed 9405.83 samples/sec Loss 7.5396 LearningRate 0.0571 Epoch: 4 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:52,501-Speed 9541.15 samples/sec Loss 7.6507 LearningRate 0.0571 Epoch: 4 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:53,570-Speed 9581.82 samples/sec Loss 7.6453 LearningRate 0.0571 Epoch: 4 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:54,605-Speed 9895.39 samples/sec Loss 7.5566 LearningRate 0.0571 Epoch: 4 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:55,675-Speed 9583.73 samples/sec Loss 7.6676 LearningRate 0.0571 Epoch: 4 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:56,771-Speed 9346.87 samples/sec Loss 7.4777 LearningRate 0.0571 Epoch: 4 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:57,827-Speed 9695.64 samples/sec Loss 7.4784 LearningRate 0.0571 Epoch: 4 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:58,919-Speed 9383.49 
samples/sec Loss 7.6585 LearningRate 0.0571 Epoch: 4 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:48:59,983-Speed 9632.09 samples/sec Loss 7.6009 LearningRate 0.0571 Epoch: 4 Global Step: 81670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:01,055-Speed 9561.36 samples/sec Loss 7.6830 LearningRate 0.0571 Epoch: 4 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:02,130-Speed 9560.08 samples/sec Loss 7.7026 LearningRate 0.0570 Epoch: 4 Global Step: 81690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:03,201-Speed 9566.24 samples/sec Loss 7.6072 LearningRate 0.0570 Epoch: 4 Global Step: 81700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:04,268-Speed 9605.80 samples/sec Loss 7.5413 LearningRate 0.0570 Epoch: 4 Global Step: 81710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:05,370-Speed 9297.27 samples/sec Loss 7.5726 LearningRate 0.0570 Epoch: 4 Global Step: 81720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:06,466-Speed 9350.29 samples/sec Loss 7.5226 LearningRate 0.0570 Epoch: 4 Global Step: 81730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:07,521-Speed 9716.74 samples/sec Loss 7.4880 LearningRate 0.0570 Epoch: 4 Global Step: 81740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:08,619-Speed 9326.51 samples/sec Loss 7.6370 LearningRate 0.0570 Epoch: 4 Global Step: 81750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:09,671-Speed 9738.39 samples/sec Loss 7.5245 LearningRate 0.0570 Epoch: 4 Global Step: 81760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:10,748-Speed 9514.09 samples/sec Loss 7.6072 LearningRate 0.0570 Epoch: 4 Global Step: 81770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:11,785-Speed 9885.18 samples/sec Loss 7.5212 LearningRate 0.0570 
Epoch: 4 Global Step: 81780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:12,849-Speed 9628.89 samples/sec Loss 7.6712 LearningRate 0.0570 Epoch: 4 Global Step: 81790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:13,939-Speed 9404.87 samples/sec Loss 7.4932 LearningRate 0.0570 Epoch: 4 Global Step: 81800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:14,996-Speed 9688.81 samples/sec Loss 7.6187 LearningRate 0.0570 Epoch: 4 Global Step: 81810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:16,057-Speed 9660.23 samples/sec Loss 7.4759 LearningRate 0.0570 Epoch: 4 Global Step: 81820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:17,120-Speed 9637.07 samples/sec Loss 7.5993 LearningRate 0.0570 Epoch: 4 Global Step: 81830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:18,201-Speed 9479.54 samples/sec Loss 7.5697 LearningRate 0.0570 Epoch: 4 Global Step: 81840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:19,306-Speed 9270.28 samples/sec Loss 7.5501 LearningRate 0.0570 Epoch: 4 Global Step: 81850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:20,410-Speed 9284.38 samples/sec Loss 7.5312 LearningRate 0.0570 Epoch: 4 Global Step: 81860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:21,494-Speed 9449.84 samples/sec Loss 7.5353 LearningRate 0.0570 Epoch: 4 Global Step: 81870 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:49:22,547-Speed 9730.23 samples/sec Loss 7.5328 LearningRate 0.0570 Epoch: 4 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:23,593-Speed 9790.55 samples/sec Loss 7.4946 LearningRate 0.0570 Epoch: 4 Global Step: 81890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:24,672-Speed 9496.26 samples/sec Loss 7.5457 LearningRate 0.0570 Epoch: 4 Global Step: 81900 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-04-11 14:49:25,751-Speed 9502.83 samples/sec Loss 7.6725 LearningRate 0.0569 Epoch: 4 Global Step: 81910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:26,831-Speed 9483.62 samples/sec Loss 7.6741 LearningRate 0.0569 Epoch: 4 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:27,936-Speed 9268.49 samples/sec Loss 7.6886 LearningRate 0.0569 Epoch: 4 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:29,019-Speed 9462.98 samples/sec Loss 7.5496 LearningRate 0.0569 Epoch: 4 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:30,096-Speed 9516.49 samples/sec Loss 7.5310 LearningRate 0.0569 Epoch: 4 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:31,184-Speed 9415.61 samples/sec Loss 7.4911 LearningRate 0.0569 Epoch: 4 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:32,301-Speed 9176.68 samples/sec Loss 7.6126 LearningRate 0.0569 Epoch: 4 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:49:33,417-Speed 9180.98 samples/sec Loss 7.6094 LearningRate 0.0569 Epoch: 4 Global Step: 81980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:34,501-Speed 9451.58 samples/sec Loss 7.6126 LearningRate 0.0569 Epoch: 4 Global Step: 81990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:35,603-Speed 9293.45 samples/sec Loss 7.5589 LearningRate 0.0569 Epoch: 4 Global Step: 82000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:49:57,728-[lfw][82000]XNorm: 12.077396 Training: 2022-04-11 14:49:57,729-[lfw][82000]Accuracy-Flip: 0.99667+-0.00258 Training: 2022-04-11 14:49:57,729-[lfw][82000]Accuracy-Highest: 0.99667 Training: 2022-04-11 14:50:23,223-[cfp_fp][82000]XNorm: 10.286977 Training: 2022-04-11 14:50:23,224-[cfp_fp][82000]Accuracy-Flip: 
0.94900+-0.01020 Training: 2022-04-11 14:50:23,224-[cfp_fp][82000]Accuracy-Highest: 0.95400 Training: 2022-04-11 14:50:45,254-[agedb_30][82000]XNorm: 11.711109 Training: 2022-04-11 14:50:45,254-[agedb_30][82000]Accuracy-Flip: 0.96300+-0.00894 Training: 2022-04-11 14:50:45,255-[agedb_30][82000]Accuracy-Highest: 0.96300 Training: 2022-04-11 14:50:46,336-Speed 144.77 samples/sec Loss 7.4093 LearningRate 0.0569 Epoch: 4 Global Step: 82010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:47,420-Speed 9449.72 samples/sec Loss 7.5476 LearningRate 0.0569 Epoch: 4 Global Step: 82020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:48,487-Speed 9601.62 samples/sec Loss 7.7032 LearningRate 0.0569 Epoch: 4 Global Step: 82030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:49,570-Speed 9457.09 samples/sec Loss 7.4803 LearningRate 0.0569 Epoch: 4 Global Step: 82040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:50,669-Speed 9323.85 samples/sec Loss 7.5728 LearningRate 0.0569 Epoch: 4 Global Step: 82050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:51,709-Speed 9854.83 samples/sec Loss 7.5819 LearningRate 0.0569 Epoch: 4 Global Step: 82060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:52,750-Speed 9852.03 samples/sec Loss 7.5700 LearningRate 0.0569 Epoch: 4 Global Step: 82070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:53,837-Speed 9423.69 samples/sec Loss 7.5986 LearningRate 0.0569 Epoch: 4 Global Step: 82080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:54,904-Speed 9603.79 samples/sec Loss 7.6253 LearningRate 0.0569 Epoch: 4 Global Step: 82090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:55,987-Speed 9459.52 samples/sec Loss 7.5834 LearningRate 0.0569 Epoch: 4 Global Step: 82100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:57,091-Speed 9277.86 
samples/sec Loss 7.5464 LearningRate 0.0569 Epoch: 4 Global Step: 82110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:58,174-Speed 9467.84 samples/sec Loss 7.5347 LearningRate 0.0569 Epoch: 4 Global Step: 82120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:50:59,253-Speed 9488.32 samples/sec Loss 7.5891 LearningRate 0.0568 Epoch: 4 Global Step: 82130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:00,330-Speed 9518.92 samples/sec Loss 7.6193 LearningRate 0.0568 Epoch: 4 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:01,420-Speed 9396.02 samples/sec Loss 7.5073 LearningRate 0.0568 Epoch: 4 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:02,519-Speed 9319.24 samples/sec Loss 7.5819 LearningRate 0.0568 Epoch: 4 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:03,652-Speed 9045.51 samples/sec Loss 7.6474 LearningRate 0.0568 Epoch: 4 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:04,784-Speed 9051.19 samples/sec Loss 7.5410 LearningRate 0.0568 Epoch: 4 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:05,870-Speed 9437.42 samples/sec Loss 7.5102 LearningRate 0.0568 Epoch: 4 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:06,968-Speed 9333.30 samples/sec Loss 7.5027 LearningRate 0.0568 Epoch: 4 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:08,126-Speed 8845.64 samples/sec Loss 7.4791 LearningRate 0.0568 Epoch: 4 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:09,249-Speed 9126.31 samples/sec Loss 7.5927 LearningRate 0.0568 Epoch: 4 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:10,315-Speed 9611.86 samples/sec Loss 7.4737 LearningRate 0.0568 Epoch: 4 
Global Step: 82230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:11,390-Speed 9528.46 samples/sec Loss 7.5694 LearningRate 0.0568 Epoch: 4 Global Step: 82240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:12,463-Speed 9553.47 samples/sec Loss 7.5331 LearningRate 0.0568 Epoch: 4 Global Step: 82250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:13,511-Speed 9780.44 samples/sec Loss 7.6035 LearningRate 0.0568 Epoch: 4 Global Step: 82260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:14,570-Speed 9672.92 samples/sec Loss 7.5532 LearningRate 0.0568 Epoch: 4 Global Step: 82270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:15,671-Speed 9310.01 samples/sec Loss 7.5646 LearningRate 0.0568 Epoch: 4 Global Step: 82280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:16,713-Speed 9834.65 samples/sec Loss 7.6164 LearningRate 0.0568 Epoch: 4 Global Step: 82290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:17,768-Speed 9703.90 samples/sec Loss 7.5100 LearningRate 0.0568 Epoch: 4 Global Step: 82300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:18,848-Speed 9492.01 samples/sec Loss 7.5191 LearningRate 0.0568 Epoch: 4 Global Step: 82310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:19,932-Speed 9454.80 samples/sec Loss 7.5521 LearningRate 0.0568 Epoch: 4 Global Step: 82320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:21,008-Speed 9518.92 samples/sec Loss 7.5168 LearningRate 0.0568 Epoch: 4 Global Step: 82330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:22,067-Speed 9678.54 samples/sec Loss 7.5783 LearningRate 0.0568 Epoch: 4 Global Step: 82340 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:51:23,126-Speed 9672.42 samples/sec Loss 7.4629 LearningRate 0.0567 Epoch: 4 Global Step: 82350 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 14:51:24,230-Speed 9284.19 samples/sec Loss 7.4879 LearningRate 0.0567 Epoch: 4 Global Step: 82360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:25,330-Speed 9307.61 samples/sec Loss 7.5882 LearningRate 0.0567 Epoch: 4 Global Step: 82370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:26,377-Speed 9796.58 samples/sec Loss 7.4362 LearningRate 0.0567 Epoch: 4 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:27,417-Speed 9850.28 samples/sec Loss 7.5698 LearningRate 0.0567 Epoch: 4 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:28,492-Speed 9530.39 samples/sec Loss 7.4487 LearningRate 0.0567 Epoch: 4 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:29,584-Speed 9383.55 samples/sec Loss 7.4898 LearningRate 0.0567 Epoch: 4 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:30,666-Speed 9469.13 samples/sec Loss 7.6204 LearningRate 0.0567 Epoch: 4 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:31,764-Speed 9328.82 samples/sec Loss 7.5401 LearningRate 0.0567 Epoch: 4 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:32,877-Speed 9207.73 samples/sec Loss 7.6057 LearningRate 0.0567 Epoch: 4 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:33,979-Speed 9296.71 samples/sec Loss 7.6119 LearningRate 0.0567 Epoch: 4 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:35,067-Speed 9417.63 samples/sec Loss 7.6371 LearningRate 0.0567 Epoch: 4 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:36,116-Speed 9768.18 samples/sec Loss 7.6256 LearningRate 0.0567 Epoch: 4 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
14:51:37,170-Speed 9716.47 samples/sec Loss 7.5157 LearningRate 0.0567 Epoch: 4 Global Step: 82480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:38,265-Speed 9363.67 samples/sec Loss 7.6556 LearningRate 0.0567 Epoch: 4 Global Step: 82490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:39,336-Speed 9562.87 samples/sec Loss 7.6090 LearningRate 0.0567 Epoch: 4 Global Step: 82500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:40,428-Speed 9380.76 samples/sec Loss 7.4727 LearningRate 0.0567 Epoch: 4 Global Step: 82510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:41,495-Speed 9604.62 samples/sec Loss 7.6024 LearningRate 0.0567 Epoch: 4 Global Step: 82520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:42,551-Speed 9703.48 samples/sec Loss 7.5087 LearningRate 0.0567 Epoch: 4 Global Step: 82530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:43,628-Speed 9514.17 samples/sec Loss 7.5796 LearningRate 0.0567 Epoch: 4 Global Step: 82540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:44,712-Speed 9458.59 samples/sec Loss 7.5703 LearningRate 0.0567 Epoch: 4 Global Step: 82550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:45,806-Speed 9361.70 samples/sec Loss 7.5193 LearningRate 0.0567 Epoch: 4 Global Step: 82560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:46,922-Speed 9184.13 samples/sec Loss 7.5358 LearningRate 0.0566 Epoch: 4 Global Step: 82570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:48,007-Speed 9436.77 samples/sec Loss 7.6243 LearningRate 0.0566 Epoch: 4 Global Step: 82580 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:51:49,091-Speed 9450.57 samples/sec Loss 7.5273 LearningRate 0.0566 Epoch: 4 Global Step: 82590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:50,173-Speed 9480.32 samples/sec Loss 
7.5452 LearningRate 0.0566 Epoch: 4 Global Step: 82600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:51,218-Speed 9799.04 samples/sec Loss 7.5146 LearningRate 0.0566 Epoch: 4 Global Step: 82610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:52,347-Speed 9081.89 samples/sec Loss 7.4187 LearningRate 0.0566 Epoch: 4 Global Step: 82620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:53,440-Speed 9369.12 samples/sec Loss 7.5585 LearningRate 0.0566 Epoch: 4 Global Step: 82630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:54,570-Speed 9068.59 samples/sec Loss 7.6391 LearningRate 0.0566 Epoch: 4 Global Step: 82640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:51:55,690-Speed 9142.09 samples/sec Loss 7.4948 LearningRate 0.0566 Epoch: 4 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:56,798-Speed 9256.60 samples/sec Loss 7.4607 LearningRate 0.0566 Epoch: 4 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:57,879-Speed 9473.76 samples/sec Loss 7.4880 LearningRate 0.0566 Epoch: 4 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:51:58,979-Speed 9317.53 samples/sec Loss 7.4126 LearningRate 0.0566 Epoch: 4 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:00,032-Speed 9731.81 samples/sec Loss 7.5317 LearningRate 0.0566 Epoch: 4 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:01,101-Speed 9583.97 samples/sec Loss 7.4501 LearningRate 0.0566 Epoch: 4 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:02,209-Speed 9245.15 samples/sec Loss 7.5660 LearningRate 0.0566 Epoch: 4 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:03,313-Speed 9280.75 samples/sec Loss 7.5651 LearningRate 0.0566 Epoch: 4 Global Step: 
82720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:04,384-Speed 9562.60 samples/sec Loss 7.4749 LearningRate 0.0566 Epoch: 4 Global Step: 82730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:05,460-Speed 9524.85 samples/sec Loss 7.6461 LearningRate 0.0566 Epoch: 4 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:06,586-Speed 9099.20 samples/sec Loss 7.5888 LearningRate 0.0566 Epoch: 4 Global Step: 82750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:07,674-Speed 9417.74 samples/sec Loss 7.5408 LearningRate 0.0566 Epoch: 4 Global Step: 82760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:08,787-Speed 9206.60 samples/sec Loss 7.5620 LearningRate 0.0566 Epoch: 4 Global Step: 82770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:09,859-Speed 9562.30 samples/sec Loss 7.5151 LearningRate 0.0566 Epoch: 4 Global Step: 82780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:10,940-Speed 9478.11 samples/sec Loss 7.5276 LearningRate 0.0565 Epoch: 4 Global Step: 82790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:12,038-Speed 9330.78 samples/sec Loss 7.6225 LearningRate 0.0565 Epoch: 4 Global Step: 82800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:13,145-Speed 9250.31 samples/sec Loss 7.6131 LearningRate 0.0565 Epoch: 4 Global Step: 82810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:14,225-Speed 9490.25 samples/sec Loss 7.5550 LearningRate 0.0565 Epoch: 4 Global Step: 82820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:15,296-Speed 9563.48 samples/sec Loss 7.4493 LearningRate 0.0565 Epoch: 4 Global Step: 82830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:16,369-Speed 9553.73 samples/sec Loss 7.5594 LearningRate 0.0565 Epoch: 4 Global Step: 82840 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 14:52:17,436-Speed 9604.11 samples/sec Loss 7.6655 LearningRate 0.0565 Epoch: 4 Global Step: 82850 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:52:18,489-Speed 9725.86 samples/sec Loss 7.5780 LearningRate 0.0565 Epoch: 4 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:19,597-Speed 9251.11 samples/sec Loss 7.4709 LearningRate 0.0565 Epoch: 4 Global Step: 82870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:20,688-Speed 9389.89 samples/sec Loss 7.5304 LearningRate 0.0565 Epoch: 4 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:21,770-Speed 9465.22 samples/sec Loss 7.5354 LearningRate 0.0565 Epoch: 4 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:22,803-Speed 9924.70 samples/sec Loss 7.5128 LearningRate 0.0565 Epoch: 4 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:23,889-Speed 9429.18 samples/sec Loss 7.6535 LearningRate 0.0565 Epoch: 4 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:24,967-Speed 9508.99 samples/sec Loss 7.4667 LearningRate 0.0565 Epoch: 4 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:26,032-Speed 9619.29 samples/sec Loss 7.4617 LearningRate 0.0565 Epoch: 4 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:27,090-Speed 9692.48 samples/sec Loss 7.4870 LearningRate 0.0565 Epoch: 4 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:28,176-Speed 9435.08 samples/sec Loss 7.4498 LearningRate 0.0565 Epoch: 4 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:29,222-Speed 9791.17 samples/sec Loss 7.7069 LearningRate 0.0565 Epoch: 4 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:30,282-Speed 9673.20 
samples/sec Loss 7.5458 LearningRate 0.0565 Epoch: 4 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:31,372-Speed 9395.88 samples/sec Loss 7.5892 LearningRate 0.0565 Epoch: 4 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:32,425-Speed 9725.96 samples/sec Loss 7.5649 LearningRate 0.0565 Epoch: 4 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:33,481-Speed 9702.90 samples/sec Loss 7.5561 LearningRate 0.0565 Epoch: 4 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:34,573-Speed 9386.64 samples/sec Loss 7.5919 LearningRate 0.0565 Epoch: 4 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:35,649-Speed 9518.14 samples/sec Loss 7.4050 LearningRate 0.0564 Epoch: 4 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:36,716-Speed 9613.08 samples/sec Loss 7.5655 LearningRate 0.0564 Epoch: 4 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:37,814-Speed 9329.80 samples/sec Loss 7.4060 LearningRate 0.0564 Epoch: 4 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:38,893-Speed 9498.79 samples/sec Loss 7.6047 LearningRate 0.0564 Epoch: 4 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:39,944-Speed 9754.86 samples/sec Loss 7.5476 LearningRate 0.0564 Epoch: 4 Global Step: 83060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:41,018-Speed 9533.61 samples/sec Loss 7.5897 LearningRate 0.0564 Epoch: 4 Global Step: 83070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:42,119-Speed 9306.96 samples/sec Loss 7.5587 LearningRate 0.0564 Epoch: 4 Global Step: 83080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:43,209-Speed 9397.72 samples/sec Loss 7.5468 LearningRate 0.0564 Epoch: 4 
Global Step: 83090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:44,294-Speed 9444.91 samples/sec Loss 7.7776 LearningRate 0.0564 Epoch: 4 Global Step: 83100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:45,436-Speed 8972.02 samples/sec Loss 7.5282 LearningRate 0.0564 Epoch: 4 Global Step: 83110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:46,473-Speed 9881.54 samples/sec Loss 7.4661 LearningRate 0.0564 Epoch: 4 Global Step: 83120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:47,537-Speed 9638.57 samples/sec Loss 7.4986 LearningRate 0.0564 Epoch: 4 Global Step: 83130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:48,621-Speed 9450.68 samples/sec Loss 7.4991 LearningRate 0.0564 Epoch: 4 Global Step: 83140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:52:49,671-Speed 9760.35 samples/sec Loss 7.4936 LearningRate 0.0564 Epoch: 4 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:50,756-Speed 9452.05 samples/sec Loss 7.6161 LearningRate 0.0564 Epoch: 4 Global Step: 83160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:51,817-Speed 9660.88 samples/sec Loss 7.4261 LearningRate 0.0564 Epoch: 4 Global Step: 83170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:52,896-Speed 9493.96 samples/sec Loss 7.5128 LearningRate 0.0564 Epoch: 4 Global Step: 83180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:53,938-Speed 9839.86 samples/sec Loss 7.6314 LearningRate 0.0564 Epoch: 4 Global Step: 83190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:54,993-Speed 9705.19 samples/sec Loss 7.4995 LearningRate 0.0564 Epoch: 4 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:56,126-Speed 9043.94 samples/sec Loss 7.5221 LearningRate 0.0564 Epoch: 4 Global Step: 83210 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-11 14:52:57,203-Speed 9517.51 samples/sec Loss 7.5373 LearningRate 0.0564 Epoch: 4 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:58,311-Speed 9242.82 samples/sec Loss 7.4733 LearningRate 0.0564 Epoch: 4 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:52:59,439-Speed 9081.88 samples/sec Loss 7.4655 LearningRate 0.0563 Epoch: 4 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:53:00,515-Speed 9528.55 samples/sec Loss 7.5744 LearningRate 0.0563 Epoch: 4 Global Step: 83250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:01,603-Speed 9420.22 samples/sec Loss 7.5831 LearningRate 0.0563 Epoch: 4 Global Step: 83260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:02,679-Speed 9518.92 samples/sec Loss 7.5118 LearningRate 0.0563 Epoch: 4 Global Step: 83270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:03,809-Speed 9063.73 samples/sec Loss 7.5507 LearningRate 0.0563 Epoch: 4 Global Step: 83280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:04,871-Speed 9652.72 samples/sec Loss 7.4683 LearningRate 0.0563 Epoch: 4 Global Step: 83290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:05,936-Speed 9616.72 samples/sec Loss 7.4488 LearningRate 0.0563 Epoch: 4 Global Step: 83300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:07,003-Speed 9603.78 samples/sec Loss 7.4151 LearningRate 0.0563 Epoch: 4 Global Step: 83310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:08,100-Speed 9349.63 samples/sec Loss 7.5929 LearningRate 0.0563 Epoch: 4 Global Step: 83320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:09,121-Speed 10032.27 samples/sec Loss 7.5396 LearningRate 0.0563 Epoch: 4 Global Step: 83330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:53:10,187-Speed 9611.28 samples/sec Loss 7.6132 LearningRate 0.0563 Epoch: 4 Global Step: 83340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:11,261-Speed 9540.74 samples/sec Loss 7.4154 LearningRate 0.0563 Epoch: 4 Global Step: 83350 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:53:12,350-Speed 9401.20 samples/sec Loss 7.5620 LearningRate 0.0563 Epoch: 4 Global Step: 83360 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:53:13,476-Speed 9099.11 samples/sec Loss 7.5304 LearningRate 0.0563 Epoch: 4 Global Step: 83370 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:53:14,520-Speed 9820.28 samples/sec Loss 7.5265 LearningRate 0.0563 Epoch: 4 Global Step: 83380 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:53:15,573-Speed 9728.42 samples/sec Loss 7.4940 LearningRate 0.0563 Epoch: 4 Global Step: 83390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:16,667-Speed 9359.70 samples/sec Loss 7.5519 LearningRate 0.0563 Epoch: 4 Global Step: 83400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:17,743-Speed 9523.50 samples/sec Loss 7.5237 LearningRate 0.0563 Epoch: 4 Global Step: 83410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:18,819-Speed 9527.97 samples/sec Loss 7.5095 LearningRate 0.0563 Epoch: 4 Global Step: 83420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:53:19,890-Speed 9570.43 samples/sec Loss 7.4850 LearningRate 0.0563 Epoch: 4 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:53:20,984-Speed 9370.21 samples/sec Loss 7.4945 LearningRate 0.0563 Epoch: 4 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:53:22,353-Speed 7486.43 samples/sec Loss 7.4955 LearningRate 0.0563 Epoch: 4 Global Step: 83450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:53:57,220-Speed 293.70 samples/sec Loss 
7.2486 LearningRate 0.0562 Epoch: 5 Global Step: 83460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:53:58,674-Speed 7045.06 samples/sec Loss 6.8231 LearningRate 0.0562 Epoch: 5 Global Step: 83470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:00,001-Speed 7723.91 samples/sec Loss 6.7100 LearningRate 0.0562 Epoch: 5 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:01,617-Speed 6340.63 samples/sec Loss 6.7391 LearningRate 0.0562 Epoch: 5 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:02,767-Speed 8906.15 samples/sec Loss 6.7404 LearningRate 0.0562 Epoch: 5 Global Step: 83500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:03,883-Speed 9178.09 samples/sec Loss 6.6010 LearningRate 0.0562 Epoch: 5 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:05,172-Speed 7952.04 samples/sec Loss 6.7226 LearningRate 0.0562 Epoch: 5 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:06,266-Speed 9369.56 samples/sec Loss 6.6605 LearningRate 0.0562 Epoch: 5 Global Step: 83530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:07,344-Speed 9500.53 samples/sec Loss 6.7870 LearningRate 0.0562 Epoch: 5 Global Step: 83540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:08,619-Speed 8032.17 samples/sec Loss 6.7855 LearningRate 0.0562 Epoch: 5 Global Step: 83550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:09,758-Speed 8998.84 samples/sec Loss 6.7179 LearningRate 0.0562 Epoch: 5 Global Step: 83560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:10,824-Speed 9613.46 samples/sec Loss 6.7818 LearningRate 0.0562 Epoch: 5 Global Step: 83570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:11,942-Speed 9169.03 samples/sec Loss 6.7457 LearningRate 0.0562 Epoch: 5 Global Step: 
83580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:13,027-Speed 9446.39 samples/sec Loss 6.6008 LearningRate 0.0562 Epoch: 5 Global Step: 83590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:14,109-Speed 9468.91 samples/sec Loss 6.6882 LearningRate 0.0562 Epoch: 5 Global Step: 83600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:15,183-Speed 9535.74 samples/sec Loss 6.7908 LearningRate 0.0562 Epoch: 5 Global Step: 83610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:16,281-Speed 9333.43 samples/sec Loss 6.7496 LearningRate 0.0562 Epoch: 5 Global Step: 83620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:17,389-Speed 9293.52 samples/sec Loss 6.8095 LearningRate 0.0562 Epoch: 5 Global Step: 83630 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:54:18,449-Speed 9670.57 samples/sec Loss 6.7980 LearningRate 0.0562 Epoch: 5 Global Step: 83640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:19,519-Speed 9576.56 samples/sec Loss 6.6908 LearningRate 0.0562 Epoch: 5 Global Step: 83650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:20,602-Speed 9463.48 samples/sec Loss 6.8049 LearningRate 0.0562 Epoch: 5 Global Step: 83660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:21,715-Speed 9197.59 samples/sec Loss 6.6674 LearningRate 0.0562 Epoch: 5 Global Step: 83670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:22,799-Speed 9457.43 samples/sec Loss 6.7317 LearningRate 0.0561 Epoch: 5 Global Step: 83680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:23,841-Speed 9833.33 samples/sec Loss 6.7760 LearningRate 0.0561 Epoch: 5 Global Step: 83690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:24,956-Speed 9184.69 samples/sec Loss 6.7527 LearningRate 0.0561 Epoch: 5 Global Step: 83700 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 14:54:26,120-Speed 8805.64 samples/sec Loss 6.6767 LearningRate 0.0561 Epoch: 5 Global Step: 83710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:27,174-Speed 9721.44 samples/sec Loss 6.6996 LearningRate 0.0561 Epoch: 5 Global Step: 83720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:28,264-Speed 9404.57 samples/sec Loss 6.5686 LearningRate 0.0561 Epoch: 5 Global Step: 83730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:29,365-Speed 9304.96 samples/sec Loss 6.8174 LearningRate 0.0561 Epoch: 5 Global Step: 83740 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:54:30,430-Speed 9620.97 samples/sec Loss 6.7475 LearningRate 0.0561 Epoch: 5 Global Step: 83750 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:54:31,500-Speed 9570.47 samples/sec Loss 6.7842 LearningRate 0.0561 Epoch: 5 Global Step: 83760 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:54:32,753-Speed 8175.99 samples/sec Loss 6.7525 LearningRate 0.0561 Epoch: 5 Global Step: 83770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:33,783-Speed 9950.18 samples/sec Loss 6.8157 LearningRate 0.0561 Epoch: 5 Global Step: 83780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:34,868-Speed 9448.90 samples/sec Loss 6.7893 LearningRate 0.0561 Epoch: 5 Global Step: 83790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:35,934-Speed 9609.03 samples/sec Loss 6.8073 LearningRate 0.0561 Epoch: 5 Global Step: 83800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:37,034-Speed 9310.19 samples/sec Loss 6.8060 LearningRate 0.0561 Epoch: 5 Global Step: 83810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:38,090-Speed 9705.01 samples/sec Loss 6.7827 LearningRate 0.0561 Epoch: 5 Global Step: 83820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:54:39,132-Speed 9829.96 samples/sec Loss 6.8660 LearningRate 0.0561 Epoch: 5 Global Step: 83830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:40,224-Speed 9380.79 samples/sec Loss 6.7377 LearningRate 0.0561 Epoch: 5 Global Step: 83840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:41,293-Speed 9589.96 samples/sec Loss 6.8233 LearningRate 0.0561 Epoch: 5 Global Step: 83850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:42,351-Speed 9682.09 samples/sec Loss 6.7774 LearningRate 0.0561 Epoch: 5 Global Step: 83860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:43,438-Speed 9432.71 samples/sec Loss 6.8449 LearningRate 0.0561 Epoch: 5 Global Step: 83870 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:54:44,533-Speed 9355.27 samples/sec Loss 6.8723 LearningRate 0.0561 Epoch: 5 Global Step: 83880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:45,634-Speed 9307.77 samples/sec Loss 6.8611 LearningRate 0.0561 Epoch: 5 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:46,702-Speed 9589.72 samples/sec Loss 6.8346 LearningRate 0.0561 Epoch: 5 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:47,815-Speed 9209.28 samples/sec Loss 6.8523 LearningRate 0.0560 Epoch: 5 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:48,882-Speed 9605.25 samples/sec Loss 6.9535 LearningRate 0.0560 Epoch: 5 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:49,931-Speed 9767.74 samples/sec Loss 6.7939 LearningRate 0.0560 Epoch: 5 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:51,001-Speed 9575.02 samples/sec Loss 6.7228 LearningRate 0.0560 Epoch: 5 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:52,163-Speed 8821.42 samples/sec Loss 
6.7967 LearningRate 0.0560 Epoch: 5 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:53,260-Speed 9334.99 samples/sec Loss 6.8158 LearningRate 0.0560 Epoch: 5 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:54,347-Speed 9431.89 samples/sec Loss 6.7863 LearningRate 0.0560 Epoch: 5 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:55,459-Speed 9216.40 samples/sec Loss 6.9072 LearningRate 0.0560 Epoch: 5 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:54:56,606-Speed 8931.18 samples/sec Loss 6.8151 LearningRate 0.0560 Epoch: 5 Global Step: 83990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:54:57,647-Speed 9847.78 samples/sec Loss 6.8362 LearningRate 0.0560 Epoch: 5 Global Step: 84000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:55:19,591-[lfw][84000]XNorm: 11.742262 Training: 2022-04-11 14:55:19,592-[lfw][84000]Accuracy-Flip: 0.99617+-0.00248 Training: 2022-04-11 14:55:19,592-[lfw][84000]Accuracy-Highest: 0.99667 Training: 2022-04-11 14:55:44,947-[cfp_fp][84000]XNorm: 10.006655 Training: 2022-04-11 14:55:44,948-[cfp_fp][84000]Accuracy-Flip: 0.95143+-0.01204 Training: 2022-04-11 14:55:44,948-[cfp_fp][84000]Accuracy-Highest: 0.95400 Training: 2022-04-11 14:56:06,761-[agedb_30][84000]XNorm: 11.427986 Training: 2022-04-11 14:56:06,762-[agedb_30][84000]Accuracy-Flip: 0.96200+-0.01087 Training: 2022-04-11 14:56:06,763-[agedb_30][84000]Accuracy-Highest: 0.96300 Training: 2022-04-11 14:56:07,821-Speed 145.92 samples/sec Loss 6.9539 LearningRate 0.0560 Epoch: 5 Global Step: 84010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:08,850-Speed 9964.28 samples/sec Loss 6.8922 LearningRate 0.0560 Epoch: 5 Global Step: 84020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:09,884-Speed 9903.54 samples/sec Loss 6.8138 LearningRate 0.0560 Epoch: 5 
Global Step: 84030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:10,993-Speed 9235.87 samples/sec Loss 6.9348 LearningRate 0.0560 Epoch: 5 Global Step: 84040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:12,064-Speed 9566.09 samples/sec Loss 6.8688 LearningRate 0.0560 Epoch: 5 Global Step: 84050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:13,137-Speed 9552.29 samples/sec Loss 6.9391 LearningRate 0.0560 Epoch: 5 Global Step: 84060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:14,193-Speed 9710.46 samples/sec Loss 6.9801 LearningRate 0.0560 Epoch: 5 Global Step: 84070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:15,319-Speed 9101.38 samples/sec Loss 6.8414 LearningRate 0.0560 Epoch: 5 Global Step: 84080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:16,394-Speed 9529.12 samples/sec Loss 6.9123 LearningRate 0.0560 Epoch: 5 Global Step: 84090 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:56:17,460-Speed 9607.10 samples/sec Loss 6.9088 LearningRate 0.0560 Epoch: 5 Global Step: 84100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:18,559-Speed 9320.91 samples/sec Loss 6.8185 LearningRate 0.0560 Epoch: 5 Global Step: 84110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:19,637-Speed 9511.79 samples/sec Loss 6.8253 LearningRate 0.0560 Epoch: 5 Global Step: 84120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:20,740-Speed 9288.69 samples/sec Loss 6.9782 LearningRate 0.0559 Epoch: 5 Global Step: 84130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:21,828-Speed 9418.78 samples/sec Loss 6.9596 LearningRate 0.0559 Epoch: 5 Global Step: 84140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:22,924-Speed 9346.93 samples/sec Loss 6.9251 LearningRate 0.0559 Epoch: 5 Global Step: 84150 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 14:56:24,035-Speed 9223.49 samples/sec Loss 6.9228 LearningRate 0.0559 Epoch: 5 Global Step: 84160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:25,123-Speed 9411.19 samples/sec Loss 6.9550 LearningRate 0.0559 Epoch: 5 Global Step: 84170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:26,377-Speed 8170.75 samples/sec Loss 6.7801 LearningRate 0.0559 Epoch: 5 Global Step: 84180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:27,947-Speed 6524.61 samples/sec Loss 6.9460 LearningRate 0.0559 Epoch: 5 Global Step: 84190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:29,352-Speed 7294.29 samples/sec Loss 6.9408 LearningRate 0.0559 Epoch: 5 Global Step: 84200 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:56:30,613-Speed 8127.79 samples/sec Loss 6.9029 LearningRate 0.0559 Epoch: 5 Global Step: 84210 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:56:31,668-Speed 9706.46 samples/sec Loss 6.8428 LearningRate 0.0559 Epoch: 5 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:32,744-Speed 9523.30 samples/sec Loss 6.9316 LearningRate 0.0559 Epoch: 5 Global Step: 84230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:33,791-Speed 9783.93 samples/sec Loss 6.8470 LearningRate 0.0559 Epoch: 5 Global Step: 84240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:34,867-Speed 9519.99 samples/sec Loss 6.8919 LearningRate 0.0559 Epoch: 5 Global Step: 84250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:35,968-Speed 9318.43 samples/sec Loss 6.8533 LearningRate 0.0559 Epoch: 5 Global Step: 84260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:37,023-Speed 9711.22 samples/sec Loss 6.9514 LearningRate 0.0559 Epoch: 5 Global Step: 84270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:56:38,091-Speed 9586.63 samples/sec Loss 6.9639 LearningRate 0.0559 Epoch: 5 Global Step: 84280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:39,159-Speed 9596.07 samples/sec Loss 6.8953 LearningRate 0.0559 Epoch: 5 Global Step: 84290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:40,229-Speed 9579.60 samples/sec Loss 6.9853 LearningRate 0.0559 Epoch: 5 Global Step: 84300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:41,332-Speed 9290.50 samples/sec Loss 6.9329 LearningRate 0.0559 Epoch: 5 Global Step: 84310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:42,437-Speed 9271.22 samples/sec Loss 6.9065 LearningRate 0.0559 Epoch: 5 Global Step: 84320 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:56:43,538-Speed 9309.89 samples/sec Loss 6.9276 LearningRate 0.0559 Epoch: 5 Global Step: 84330 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:56:44,593-Speed 9708.11 samples/sec Loss 6.8816 LearningRate 0.0559 Epoch: 5 Global Step: 84340 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:56:45,668-Speed 9534.69 samples/sec Loss 6.9829 LearningRate 0.0558 Epoch: 5 Global Step: 84350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:46,741-Speed 9548.68 samples/sec Loss 6.8771 LearningRate 0.0558 Epoch: 5 Global Step: 84360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:47,784-Speed 9822.45 samples/sec Loss 6.8994 LearningRate 0.0558 Epoch: 5 Global Step: 84370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:48,907-Speed 9125.26 samples/sec Loss 6.9433 LearningRate 0.0558 Epoch: 5 Global Step: 84380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:50,008-Speed 9307.96 samples/sec Loss 6.9989 LearningRate 0.0558 Epoch: 5 Global Step: 84390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:51,066-Speed 9687.64 samples/sec Loss 
6.9395 LearningRate 0.0558 Epoch: 5 Global Step: 84400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:52,128-Speed 9640.68 samples/sec Loss 6.8301 LearningRate 0.0558 Epoch: 5 Global Step: 84410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:53,236-Speed 9252.04 samples/sec Loss 6.9007 LearningRate 0.0558 Epoch: 5 Global Step: 84420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:54,309-Speed 9541.20 samples/sec Loss 6.9283 LearningRate 0.0558 Epoch: 5 Global Step: 84430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:55,385-Speed 9525.27 samples/sec Loss 7.0353 LearningRate 0.0558 Epoch: 5 Global Step: 84440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:56,486-Speed 9307.44 samples/sec Loss 7.0176 LearningRate 0.0558 Epoch: 5 Global Step: 84450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:57,603-Speed 9171.13 samples/sec Loss 6.9477 LearningRate 0.0558 Epoch: 5 Global Step: 84460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:58,684-Speed 9482.65 samples/sec Loss 6.9905 LearningRate 0.0558 Epoch: 5 Global Step: 84470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:56:59,810-Speed 9097.03 samples/sec Loss 6.8850 LearningRate 0.0558 Epoch: 5 Global Step: 84480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:00,877-Speed 9606.52 samples/sec Loss 6.9128 LearningRate 0.0558 Epoch: 5 Global Step: 84490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:01,978-Speed 9299.28 samples/sec Loss 6.9180 LearningRate 0.0558 Epoch: 5 Global Step: 84500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:03,070-Speed 9385.45 samples/sec Loss 7.0533 LearningRate 0.0558 Epoch: 5 Global Step: 84510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:04,159-Speed 9410.98 samples/sec Loss 6.8828 LearningRate 0.0558 Epoch: 5 Global 
Step: 84520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:05,246-Speed 9430.60 samples/sec Loss 6.8858 LearningRate 0.0558 Epoch: 5 Global Step: 84530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:06,323-Speed 9517.21 samples/sec Loss 6.9004 LearningRate 0.0558 Epoch: 5 Global Step: 84540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:07,394-Speed 9564.62 samples/sec Loss 7.0932 LearningRate 0.0558 Epoch: 5 Global Step: 84550 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:57:08,489-Speed 9354.44 samples/sec Loss 6.9426 LearningRate 0.0558 Epoch: 5 Global Step: 84560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:09,583-Speed 9369.19 samples/sec Loss 7.0763 LearningRate 0.0558 Epoch: 5 Global Step: 84570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:10,647-Speed 9631.87 samples/sec Loss 6.9280 LearningRate 0.0557 Epoch: 5 Global Step: 84580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:11,710-Speed 9636.49 samples/sec Loss 6.9494 LearningRate 0.0557 Epoch: 5 Global Step: 84590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:12,783-Speed 9552.75 samples/sec Loss 7.0403 LearningRate 0.0557 Epoch: 5 Global Step: 84600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:13,875-Speed 9380.45 samples/sec Loss 6.9549 LearningRate 0.0557 Epoch: 5 Global Step: 84610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:14,961-Speed 9431.57 samples/sec Loss 7.0241 LearningRate 0.0557 Epoch: 5 Global Step: 84620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:16,010-Speed 9771.40 samples/sec Loss 7.0100 LearningRate 0.0557 Epoch: 5 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:17,160-Speed 8909.06 samples/sec Loss 7.0148 LearningRate 0.0557 Epoch: 5 Global Step: 84640 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 14:57:18,264-Speed 9284.20 samples/sec Loss 7.0571 LearningRate 0.0557 Epoch: 5 Global Step: 84650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:19,362-Speed 9331.96 samples/sec Loss 6.8158 LearningRate 0.0557 Epoch: 5 Global Step: 84660 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:57:20,429-Speed 9601.40 samples/sec Loss 6.9227 LearningRate 0.0557 Epoch: 5 Global Step: 84670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:21,516-Speed 9423.51 samples/sec Loss 6.8576 LearningRate 0.0557 Epoch: 5 Global Step: 84680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:22,619-Speed 9289.98 samples/sec Loss 6.9253 LearningRate 0.0557 Epoch: 5 Global Step: 84690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:23,740-Speed 9141.56 samples/sec Loss 6.8830 LearningRate 0.0557 Epoch: 5 Global Step: 84700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:24,873-Speed 9044.96 samples/sec Loss 6.9778 LearningRate 0.0557 Epoch: 5 Global Step: 84710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:25,994-Speed 9142.19 samples/sec Loss 7.0217 LearningRate 0.0557 Epoch: 5 Global Step: 84720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:27,079-Speed 9447.00 samples/sec Loss 7.0343 LearningRate 0.0557 Epoch: 5 Global Step: 84730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:28,150-Speed 9565.27 samples/sec Loss 7.1599 LearningRate 0.0557 Epoch: 5 Global Step: 84740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:29,206-Speed 9699.95 samples/sec Loss 6.9744 LearningRate 0.0557 Epoch: 5 Global Step: 84750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:30,336-Speed 9062.44 samples/sec Loss 7.0230 LearningRate 0.0557 Epoch: 5 Global Step: 84760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:57:31,450-Speed 9204.33 samples/sec Loss 6.9997 LearningRate 0.0557 Epoch: 5 Global Step: 84770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:32,548-Speed 9331.02 samples/sec Loss 7.0243 LearningRate 0.0557 Epoch: 5 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:33,673-Speed 9108.19 samples/sec Loss 7.0351 LearningRate 0.0557 Epoch: 5 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:34,750-Speed 9504.97 samples/sec Loss 6.9954 LearningRate 0.0556 Epoch: 5 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:35,802-Speed 9754.92 samples/sec Loss 6.9257 LearningRate 0.0556 Epoch: 5 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:36,883-Speed 9481.21 samples/sec Loss 7.0162 LearningRate 0.0556 Epoch: 5 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:38,013-Speed 9066.91 samples/sec Loss 7.0506 LearningRate 0.0556 Epoch: 5 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:39,118-Speed 9276.20 samples/sec Loss 6.9747 LearningRate 0.0556 Epoch: 5 Global Step: 84840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:40,221-Speed 9288.93 samples/sec Loss 7.0184 LearningRate 0.0556 Epoch: 5 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:41,310-Speed 9406.12 samples/sec Loss 7.0044 LearningRate 0.0556 Epoch: 5 Global Step: 84860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:42,386-Speed 9523.27 samples/sec Loss 7.0102 LearningRate 0.0556 Epoch: 5 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:57:43,462-Speed 9528.78 samples/sec Loss 7.0536 LearningRate 0.0556 Epoch: 5 Global Step: 84880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:44,504-Speed 9827.15 samples/sec Loss 7.0094 
LearningRate 0.0556 Epoch: 5 Global Step: 84890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:45,549-Speed 9809.40 samples/sec Loss 7.1697 LearningRate 0.0556 Epoch: 5 Global Step: 84900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:46,624-Speed 9528.79 samples/sec Loss 6.8575 LearningRate 0.0556 Epoch: 5 Global Step: 84910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:47,692-Speed 9593.54 samples/sec Loss 7.0776 LearningRate 0.0556 Epoch: 5 Global Step: 84920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:48,821-Speed 9070.33 samples/sec Loss 7.1389 LearningRate 0.0556 Epoch: 5 Global Step: 84930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:49,918-Speed 9355.78 samples/sec Loss 7.0180 LearningRate 0.0556 Epoch: 5 Global Step: 84940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:51,028-Speed 9224.32 samples/sec Loss 7.0519 LearningRate 0.0556 Epoch: 5 Global Step: 84950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:52,113-Speed 9442.30 samples/sec Loss 7.0534 LearningRate 0.0556 Epoch: 5 Global Step: 84960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:53,191-Speed 9507.92 samples/sec Loss 7.0386 LearningRate 0.0556 Epoch: 5 Global Step: 84970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:54,326-Speed 9025.27 samples/sec Loss 7.1736 LearningRate 0.0556 Epoch: 5 Global Step: 84980 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:57:55,359-Speed 9922.58 samples/sec Loss 6.9960 LearningRate 0.0556 Epoch: 5 Global Step: 84990 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:57:56,453-Speed 9369.97 samples/sec Loss 7.1713 LearningRate 0.0556 Epoch: 5 Global Step: 85000 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 14:57:57,514-Speed 9658.61 samples/sec Loss 7.0319 LearningRate 0.0556 Epoch: 5 Global Step: 
85010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:58,572-Speed 9681.44 samples/sec Loss 7.0498 LearningRate 0.0555 Epoch: 5 Global Step: 85020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:57:59,683-Speed 9224.41 samples/sec Loss 7.0293 LearningRate 0.0555 Epoch: 5 Global Step: 85030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:00,734-Speed 9748.87 samples/sec Loss 7.0423 LearningRate 0.0555 Epoch: 5 Global Step: 85040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:01,825-Speed 9385.68 samples/sec Loss 7.0444 LearningRate 0.0555 Epoch: 5 Global Step: 85050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:02,904-Speed 9499.28 samples/sec Loss 7.0168 LearningRate 0.0555 Epoch: 5 Global Step: 85060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:03,979-Speed 9533.41 samples/sec Loss 7.0882 LearningRate 0.0555 Epoch: 5 Global Step: 85070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:05,037-Speed 9680.83 samples/sec Loss 7.1732 LearningRate 0.0555 Epoch: 5 Global Step: 85080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:06,116-Speed 9496.69 samples/sec Loss 6.9958 LearningRate 0.0555 Epoch: 5 Global Step: 85090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:07,178-Speed 9654.76 samples/sec Loss 7.0943 LearningRate 0.0555 Epoch: 5 Global Step: 85100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:08,237-Speed 9672.16 samples/sec Loss 7.0098 LearningRate 0.0555 Epoch: 5 Global Step: 85110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:09,324-Speed 9420.87 samples/sec Loss 7.1648 LearningRate 0.0555 Epoch: 5 Global Step: 85120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:10,434-Speed 9236.92 samples/sec Loss 7.2732 LearningRate 0.0555 Epoch: 5 Global Step: 85130 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 14:58:11,547-Speed 9206.65 samples/sec Loss 7.0715 LearningRate 0.0555 Epoch: 5 Global Step: 85140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:12,650-Speed 9288.07 samples/sec Loss 7.0903 LearningRate 0.0555 Epoch: 5 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:13,731-Speed 9472.42 samples/sec Loss 7.1077 LearningRate 0.0555 Epoch: 5 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:14,810-Speed 9499.49 samples/sec Loss 7.0765 LearningRate 0.0555 Epoch: 5 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:15,857-Speed 9792.18 samples/sec Loss 7.0574 LearningRate 0.0555 Epoch: 5 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:16,940-Speed 9454.88 samples/sec Loss 7.0344 LearningRate 0.0555 Epoch: 5 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:18,026-Speed 9437.57 samples/sec Loss 7.0028 LearningRate 0.0555 Epoch: 5 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:19,097-Speed 9564.39 samples/sec Loss 7.1642 LearningRate 0.0555 Epoch: 5 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:20,135-Speed 9873.65 samples/sec Loss 7.1741 LearningRate 0.0555 Epoch: 5 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:21,214-Speed 9493.24 samples/sec Loss 7.0276 LearningRate 0.0555 Epoch: 5 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:22,300-Speed 9440.06 samples/sec Loss 6.9685 LearningRate 0.0555 Epoch: 5 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:23,381-Speed 9472.32 samples/sec Loss 6.9668 LearningRate 0.0554 Epoch: 5 Global Step: 85250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:24,485-Speed 9290.45 
samples/sec Loss 7.0524 LearningRate 0.0554 Epoch: 5 Global Step: 85260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:25,579-Speed 9363.07 samples/sec Loss 7.0576 LearningRate 0.0554 Epoch: 5 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:26,676-Speed 9337.31 samples/sec Loss 7.1281 LearningRate 0.0554 Epoch: 5 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:27,746-Speed 9576.49 samples/sec Loss 7.1172 LearningRate 0.0554 Epoch: 5 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:28,839-Speed 9373.35 samples/sec Loss 7.1057 LearningRate 0.0554 Epoch: 5 Global Step: 85300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:29,900-Speed 9655.84 samples/sec Loss 7.1542 LearningRate 0.0554 Epoch: 5 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:30,981-Speed 9481.84 samples/sec Loss 7.0799 LearningRate 0.0554 Epoch: 5 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:32,056-Speed 9526.87 samples/sec Loss 7.0664 LearningRate 0.0554 Epoch: 5 Global Step: 85330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:33,139-Speed 9461.03 samples/sec Loss 7.1373 LearningRate 0.0554 Epoch: 5 Global Step: 85340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:34,209-Speed 9582.09 samples/sec Loss 7.0366 LearningRate 0.0554 Epoch: 5 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:35,320-Speed 9220.96 samples/sec Loss 7.1132 LearningRate 0.0554 Epoch: 5 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:36,416-Speed 9354.68 samples/sec Loss 7.0816 LearningRate 0.0554 Epoch: 5 Global Step: 85370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:37,516-Speed 9319.26 samples/sec Loss 6.9802 LearningRate 0.0554 Epoch: 5 
Global Step: 85380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:38,584-Speed 9598.79 samples/sec Loss 7.0470 LearningRate 0.0554 Epoch: 5 Global Step: 85390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:39,647-Speed 9648.34 samples/sec Loss 6.9989 LearningRate 0.0554 Epoch: 5 Global Step: 85400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:40,733-Speed 9433.49 samples/sec Loss 7.1065 LearningRate 0.0554 Epoch: 5 Global Step: 85410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:41,814-Speed 9477.29 samples/sec Loss 7.1120 LearningRate 0.0554 Epoch: 5 Global Step: 85420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:42,869-Speed 9712.63 samples/sec Loss 7.1367 LearningRate 0.0554 Epoch: 5 Global Step: 85430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:43,943-Speed 9545.87 samples/sec Loss 7.0459 LearningRate 0.0554 Epoch: 5 Global Step: 85440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:44,999-Speed 9698.82 samples/sec Loss 7.0016 LearningRate 0.0554 Epoch: 5 Global Step: 85450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:46,056-Speed 9694.01 samples/sec Loss 7.0731 LearningRate 0.0554 Epoch: 5 Global Step: 85460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:47,147-Speed 9391.71 samples/sec Loss 7.0916 LearningRate 0.0553 Epoch: 5 Global Step: 85470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:48,255-Speed 9244.16 samples/sec Loss 7.0856 LearningRate 0.0553 Epoch: 5 Global Step: 85480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:49,343-Speed 9425.20 samples/sec Loss 7.1228 LearningRate 0.0553 Epoch: 5 Global Step: 85490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:50,426-Speed 9453.72 samples/sec Loss 6.9618 LearningRate 0.0553 Epoch: 5 Global Step: 85500 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 14:58:51,576-Speed 8909.82 samples/sec Loss 7.0946 LearningRate 0.0553 Epoch: 5 Global Step: 85510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:58:52,647-Speed 9570.63 samples/sec Loss 7.0576 LearningRate 0.0553 Epoch: 5 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:53,747-Speed 9316.58 samples/sec Loss 7.1075 LearningRate 0.0553 Epoch: 5 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:54,825-Speed 9507.39 samples/sec Loss 7.1118 LearningRate 0.0553 Epoch: 5 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:55,913-Speed 9417.89 samples/sec Loss 7.0034 LearningRate 0.0553 Epoch: 5 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:56,984-Speed 9562.30 samples/sec Loss 7.1626 LearningRate 0.0553 Epoch: 5 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:58,041-Speed 9690.62 samples/sec Loss 7.0951 LearningRate 0.0553 Epoch: 5 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:58:59,123-Speed 9471.14 samples/sec Loss 7.1211 LearningRate 0.0553 Epoch: 5 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:00,237-Speed 9197.73 samples/sec Loss 7.0751 LearningRate 0.0553 Epoch: 5 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:01,307-Speed 9577.14 samples/sec Loss 7.0837 LearningRate 0.0553 Epoch: 5 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:02,354-Speed 9781.78 samples/sec Loss 7.1311 LearningRate 0.0553 Epoch: 5 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:03,381-Speed 9975.49 samples/sec Loss 7.0625 LearningRate 0.0553 Epoch: 5 Global Step: 85620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
14:59:04,423-Speed 9834.81 samples/sec Loss 7.1354 LearningRate 0.0553 Epoch: 5 Global Step: 85630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:05,481-Speed 9690.89 samples/sec Loss 6.9657 LearningRate 0.0553 Epoch: 5 Global Step: 85640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:06,576-Speed 9355.46 samples/sec Loss 7.0749 LearningRate 0.0553 Epoch: 5 Global Step: 85650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:07,669-Speed 9371.50 samples/sec Loss 7.1291 LearningRate 0.0553 Epoch: 5 Global Step: 85660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:08,730-Speed 9652.83 samples/sec Loss 7.1105 LearningRate 0.0553 Epoch: 5 Global Step: 85670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:09,843-Speed 9209.67 samples/sec Loss 7.1385 LearningRate 0.0553 Epoch: 5 Global Step: 85680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:10,912-Speed 9583.09 samples/sec Loss 7.1299 LearningRate 0.0553 Epoch: 5 Global Step: 85690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:11,973-Speed 9666.57 samples/sec Loss 7.1460 LearningRate 0.0552 Epoch: 5 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:13,056-Speed 9459.21 samples/sec Loss 7.1354 LearningRate 0.0552 Epoch: 5 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:14,146-Speed 9403.34 samples/sec Loss 7.2060 LearningRate 0.0552 Epoch: 5 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:15,246-Speed 9313.93 samples/sec Loss 7.1176 LearningRate 0.0552 Epoch: 5 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:16,341-Speed 9355.88 samples/sec Loss 7.1762 LearningRate 0.0552 Epoch: 5 Global Step: 85740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:17,433-Speed 9378.94 samples/sec Loss 
7.1155 LearningRate 0.0552 Epoch: 5 Global Step: 85750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:18,529-Speed 9345.29 samples/sec Loss 7.1582 LearningRate 0.0552 Epoch: 5 Global Step: 85760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:19,623-Speed 9367.57 samples/sec Loss 7.1368 LearningRate 0.0552 Epoch: 5 Global Step: 85770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:20,722-Speed 9323.09 samples/sec Loss 6.9690 LearningRate 0.0552 Epoch: 5 Global Step: 85780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:21,811-Speed 9407.27 samples/sec Loss 7.1821 LearningRate 0.0552 Epoch: 5 Global Step: 85790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:22,952-Speed 8978.04 samples/sec Loss 7.0579 LearningRate 0.0552 Epoch: 5 Global Step: 85800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:24,060-Speed 9248.98 samples/sec Loss 7.1302 LearningRate 0.0552 Epoch: 5 Global Step: 85810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:25,135-Speed 9529.10 samples/sec Loss 7.0996 LearningRate 0.0552 Epoch: 5 Global Step: 85820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:26,227-Speed 9381.30 samples/sec Loss 7.1788 LearningRate 0.0552 Epoch: 5 Global Step: 85830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:27,288-Speed 9659.46 samples/sec Loss 7.1125 LearningRate 0.0552 Epoch: 5 Global Step: 85840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:28,345-Speed 9690.63 samples/sec Loss 7.2350 LearningRate 0.0552 Epoch: 5 Global Step: 85850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:29,374-Speed 9962.92 samples/sec Loss 7.0837 LearningRate 0.0552 Epoch: 5 Global Step: 85860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:30,435-Speed 9656.29 samples/sec Loss 7.0674 LearningRate 0.0552 Epoch: 5 Global Step: 
85870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:31,528-Speed 9376.75 samples/sec Loss 7.0920 LearningRate 0.0552 Epoch: 5 Global Step: 85880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:32,634-Speed 9269.03 samples/sec Loss 7.0130 LearningRate 0.0552 Epoch: 5 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:33,752-Speed 9156.44 samples/sec Loss 7.0528 LearningRate 0.0552 Epoch: 5 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:34,808-Speed 9702.00 samples/sec Loss 7.0599 LearningRate 0.0552 Epoch: 5 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:35,885-Speed 9523.11 samples/sec Loss 7.1332 LearningRate 0.0551 Epoch: 5 Global Step: 85920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:36,977-Speed 9380.64 samples/sec Loss 7.1590 LearningRate 0.0551 Epoch: 5 Global Step: 85930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:38,058-Speed 9480.48 samples/sec Loss 7.0356 LearningRate 0.0551 Epoch: 5 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:39,108-Speed 9751.69 samples/sec Loss 7.0396 LearningRate 0.0551 Epoch: 5 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 14:59:40,151-Speed 9821.58 samples/sec Loss 7.2042 LearningRate 0.0551 Epoch: 5 Global Step: 85960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:41,271-Speed 9149.52 samples/sec Loss 7.1476 LearningRate 0.0551 Epoch: 5 Global Step: 85970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:42,367-Speed 9356.61 samples/sec Loss 7.0326 LearningRate 0.0551 Epoch: 5 Global Step: 85980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 14:59:43,468-Speed 9302.02 samples/sec Loss 7.0634 LearningRate 0.0551 Epoch: 5 Global Step: 85990 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 14:59:44,549-Speed 9474.01 samples/sec Loss 7.1865 LearningRate 0.0551 Epoch: 5 Global Step: 86000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:00:06,689-[lfw][86000]XNorm: 11.891397 Training: 2022-04-11 15:00:06,690-[lfw][86000]Accuracy-Flip: 0.99533+-0.00245 Training: 2022-04-11 15:00:06,690-[lfw][86000]Accuracy-Highest: 0.99667 Training: 2022-04-11 15:00:32,500-[cfp_fp][86000]XNorm: 10.047313 Training: 2022-04-11 15:00:32,501-[cfp_fp][86000]Accuracy-Flip: 0.95586+-0.00844 Training: 2022-04-11 15:00:32,501-[cfp_fp][86000]Accuracy-Highest: 0.95586 Training: 2022-04-11 15:00:54,823-[agedb_30][86000]XNorm: 11.431738 Training: 2022-04-11 15:00:54,824-[agedb_30][86000]Accuracy-Flip: 0.95933+-0.01148 Training: 2022-04-11 15:00:54,824-[agedb_30][86000]Accuracy-Highest: 0.96300 Training: 2022-04-11 15:00:55,944-Speed 143.43 samples/sec Loss 7.1047 LearningRate 0.0551 Epoch: 5 Global Step: 86010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:00:57,038-Speed 9366.05 samples/sec Loss 7.1425 LearningRate 0.0551 Epoch: 5 Global Step: 86020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:00:58,135-Speed 9339.29 samples/sec Loss 7.1574 LearningRate 0.0551 Epoch: 5 Global Step: 86030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:00:59,194-Speed 9671.59 samples/sec Loss 7.2798 LearningRate 0.0551 Epoch: 5 Global Step: 86040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:00,251-Speed 9691.01 samples/sec Loss 7.1124 LearningRate 0.0551 Epoch: 5 Global Step: 86050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:01,304-Speed 9729.28 samples/sec Loss 7.0614 LearningRate 0.0551 Epoch: 5 Global Step: 86060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:02,362-Speed 9689.08 samples/sec Loss 7.1979 LearningRate 0.0551 Epoch: 5 Global Step: 86070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:01:03,414-Speed 9739.31 samples/sec Loss 7.1514 LearningRate 0.0551 Epoch: 5 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:04,459-Speed 9803.13 samples/sec Loss 7.0827 LearningRate 0.0551 Epoch: 5 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:05,509-Speed 9763.54 samples/sec Loss 7.0652 LearningRate 0.0551 Epoch: 5 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:06,575-Speed 9611.40 samples/sec Loss 7.2372 LearningRate 0.0551 Epoch: 5 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:07,641-Speed 9611.60 samples/sec Loss 7.1074 LearningRate 0.0551 Epoch: 5 Global Step: 86120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:08,697-Speed 9699.12 samples/sec Loss 7.1648 LearningRate 0.0551 Epoch: 5 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:09,733-Speed 9887.92 samples/sec Loss 7.0889 LearningRate 0.0550 Epoch: 5 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:10,843-Speed 9226.60 samples/sec Loss 7.0605 LearningRate 0.0550 Epoch: 5 Global Step: 86150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:11,896-Speed 9735.15 samples/sec Loss 7.0957 LearningRate 0.0550 Epoch: 5 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:13,004-Speed 9249.95 samples/sec Loss 7.1264 LearningRate 0.0550 Epoch: 5 Global Step: 86170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:14,122-Speed 9164.27 samples/sec Loss 7.0999 LearningRate 0.0550 Epoch: 5 Global Step: 86180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:15,211-Speed 9407.75 samples/sec Loss 7.2206 LearningRate 0.0550 Epoch: 5 Global Step: 86190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:16,265-Speed 9721.08 samples/sec Loss 7.0943 
LearningRate 0.0550 Epoch: 5 Global Step: 86200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:17,366-Speed 9313.29 samples/sec Loss 7.1412 LearningRate 0.0550 Epoch: 5 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:18,435-Speed 9595.40 samples/sec Loss 7.0612 LearningRate 0.0550 Epoch: 5 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:19,531-Speed 9341.25 samples/sec Loss 7.1311 LearningRate 0.0550 Epoch: 5 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:20,608-Speed 9519.99 samples/sec Loss 7.2390 LearningRate 0.0550 Epoch: 5 Global Step: 86240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:21,738-Speed 9064.15 samples/sec Loss 7.1310 LearningRate 0.0550 Epoch: 5 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:22,819-Speed 9477.96 samples/sec Loss 7.1708 LearningRate 0.0550 Epoch: 5 Global Step: 86260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:23,886-Speed 9603.82 samples/sec Loss 7.1111 LearningRate 0.0550 Epoch: 5 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:24,962-Speed 9522.27 samples/sec Loss 7.0485 LearningRate 0.0550 Epoch: 5 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:26,103-Speed 8980.12 samples/sec Loss 7.1060 LearningRate 0.0550 Epoch: 5 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:27,209-Speed 9264.82 samples/sec Loss 7.1748 LearningRate 0.0550 Epoch: 5 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:28,323-Speed 9192.87 samples/sec Loss 7.2298 LearningRate 0.0550 Epoch: 5 Global Step: 86310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:29,395-Speed 9560.08 samples/sec Loss 7.2221 LearningRate 0.0550 Epoch: 5 Global Step: 86320 Fp16 
Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:30,491-Speed 9348.60 samples/sec Loss 7.2603 LearningRate 0.0550 Epoch: 5 Global Step: 86330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:31,582-Speed 9384.41 samples/sec Loss 7.1664 LearningRate 0.0550 Epoch: 5 Global Step: 86340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:32,630-Speed 9781.56 samples/sec Loss 7.1631 LearningRate 0.0550 Epoch: 5 Global Step: 86350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:33,713-Speed 9455.50 samples/sec Loss 7.2920 LearningRate 0.0550 Epoch: 5 Global Step: 86360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:34,807-Speed 9365.58 samples/sec Loss 7.2270 LearningRate 0.0549 Epoch: 5 Global Step: 86370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:35,871-Speed 9635.94 samples/sec Loss 7.0627 LearningRate 0.0549 Epoch: 5 Global Step: 86380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:36,986-Speed 9188.36 samples/sec Loss 7.0772 LearningRate 0.0549 Epoch: 5 Global Step: 86390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:38,077-Speed 9391.56 samples/sec Loss 7.0783 LearningRate 0.0549 Epoch: 5 Global Step: 86400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:39,110-Speed 9919.79 samples/sec Loss 7.1842 LearningRate 0.0549 Epoch: 5 Global Step: 86410 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:01:40,173-Speed 9642.26 samples/sec Loss 7.3327 LearningRate 0.0549 Epoch: 5 Global Step: 86420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:41,262-Speed 9405.27 samples/sec Loss 7.0735 LearningRate 0.0549 Epoch: 5 Global Step: 86430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:42,342-Speed 9492.16 samples/sec Loss 7.1108 LearningRate 0.0549 Epoch: 5 Global Step: 86440 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 15:01:43,418-Speed 9521.26 samples/sec Loss 7.0197 LearningRate 0.0549 Epoch: 5 Global Step: 86450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:44,533-Speed 9188.37 samples/sec Loss 7.1398 LearningRate 0.0549 Epoch: 5 Global Step: 86460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:45,619-Speed 9439.92 samples/sec Loss 7.2478 LearningRate 0.0549 Epoch: 5 Global Step: 86470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:46,729-Speed 9229.52 samples/sec Loss 7.1894 LearningRate 0.0549 Epoch: 5 Global Step: 86480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:47,812-Speed 9455.36 samples/sec Loss 7.2533 LearningRate 0.0549 Epoch: 5 Global Step: 86490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:48,872-Speed 9669.32 samples/sec Loss 7.1934 LearningRate 0.0549 Epoch: 5 Global Step: 86500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:49,976-Speed 9278.58 samples/sec Loss 7.0685 LearningRate 0.0549 Epoch: 5 Global Step: 86510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:01:51,039-Speed 9639.89 samples/sec Loss 7.2252 LearningRate 0.0549 Epoch: 5 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:52,087-Speed 9779.59 samples/sec Loss 7.0688 LearningRate 0.0549 Epoch: 5 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:53,154-Speed 9601.21 samples/sec Loss 7.1797 LearningRate 0.0549 Epoch: 5 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:54,249-Speed 9353.89 samples/sec Loss 7.0998 LearningRate 0.0549 Epoch: 5 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:55,323-Speed 9549.12 samples/sec Loss 7.1535 LearningRate 0.0549 Epoch: 5 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:56,382-Speed 9670.27 
samples/sec Loss 7.1630 LearningRate 0.0549 Epoch: 5 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:57,474-Speed 9387.78 samples/sec Loss 7.0776 LearningRate 0.0549 Epoch: 5 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:58,582-Speed 9246.75 samples/sec Loss 7.2690 LearningRate 0.0549 Epoch: 5 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:01:59,639-Speed 9687.82 samples/sec Loss 7.1403 LearningRate 0.0548 Epoch: 5 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:00,688-Speed 9766.86 samples/sec Loss 7.2290 LearningRate 0.0548 Epoch: 5 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:01,763-Speed 9534.15 samples/sec Loss 7.3017 LearningRate 0.0548 Epoch: 5 Global Step: 86620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:02,808-Speed 9803.08 samples/sec Loss 7.1020 LearningRate 0.0548 Epoch: 5 Global Step: 86630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:03,877-Speed 9584.27 samples/sec Loss 7.2481 LearningRate 0.0548 Epoch: 5 Global Step: 86640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:04,973-Speed 9354.40 samples/sec Loss 7.0823 LearningRate 0.0548 Epoch: 5 Global Step: 86650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:06,057-Speed 9448.61 samples/sec Loss 7.2124 LearningRate 0.0548 Epoch: 5 Global Step: 86660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:07,125-Speed 9598.82 samples/sec Loss 7.2065 LearningRate 0.0548 Epoch: 5 Global Step: 86670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:08,216-Speed 9384.44 samples/sec Loss 7.1448 LearningRate 0.0548 Epoch: 5 Global Step: 86680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:09,318-Speed 9302.66 samples/sec Loss 7.1733 LearningRate 0.0548 
Epoch: 5 Global Step: 86690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:10,376-Speed 9678.11 samples/sec Loss 7.2106 LearningRate 0.0548 Epoch: 5 Global Step: 86700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:11,424-Speed 9776.70 samples/sec Loss 7.1591 LearningRate 0.0548 Epoch: 5 Global Step: 86710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:12,497-Speed 9558.52 samples/sec Loss 7.1927 LearningRate 0.0548 Epoch: 5 Global Step: 86720 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:13,587-Speed 9398.27 samples/sec Loss 7.1331 LearningRate 0.0548 Epoch: 5 Global Step: 86730 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:14,697-Speed 9235.52 samples/sec Loss 7.0612 LearningRate 0.0548 Epoch: 5 Global Step: 86740 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:15,816-Speed 9153.81 samples/sec Loss 7.0964 LearningRate 0.0548 Epoch: 5 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:16,937-Speed 9135.75 samples/sec Loss 7.0657 LearningRate 0.0548 Epoch: 5 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:18,023-Speed 9436.43 samples/sec Loss 7.1105 LearningRate 0.0548 Epoch: 5 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:19,126-Speed 9293.96 samples/sec Loss 7.3183 LearningRate 0.0548 Epoch: 5 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:20,227-Speed 9301.25 samples/sec Loss 7.2513 LearningRate 0.0548 Epoch: 5 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:21,313-Speed 9434.29 samples/sec Loss 7.2318 LearningRate 0.0548 Epoch: 5 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:22,363-Speed 9762.82 samples/sec Loss 7.2943 LearningRate 0.0548 Epoch: 5 Global Step: 86810 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-04-11 15:02:23,434-Speed 9560.13 samples/sec Loss 7.1621 LearningRate 0.0547 Epoch: 5 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:24,456-Speed 10029.80 samples/sec Loss 7.1429 LearningRate 0.0547 Epoch: 5 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:25,536-Speed 9486.85 samples/sec Loss 7.2028 LearningRate 0.0547 Epoch: 5 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:26,628-Speed 9380.99 samples/sec Loss 7.3205 LearningRate 0.0547 Epoch: 5 Global Step: 86850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:27,687-Speed 9679.40 samples/sec Loss 7.0954 LearningRate 0.0547 Epoch: 5 Global Step: 86860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:28,761-Speed 9533.10 samples/sec Loss 7.2379 LearningRate 0.0547 Epoch: 5 Global Step: 86870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:29,805-Speed 9814.23 samples/sec Loss 7.1204 LearningRate 0.0547 Epoch: 5 Global Step: 86880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:30,898-Speed 9380.12 samples/sec Loss 7.2733 LearningRate 0.0547 Epoch: 5 Global Step: 86890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:31,947-Speed 9759.65 samples/sec Loss 7.1019 LearningRate 0.0547 Epoch: 5 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:33,025-Speed 9506.61 samples/sec Loss 7.1541 LearningRate 0.0547 Epoch: 5 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:34,118-Speed 9382.86 samples/sec Loss 7.2904 LearningRate 0.0547 Epoch: 5 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:35,177-Speed 9667.41 samples/sec Loss 7.2722 LearningRate 0.0547 Epoch: 5 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
15:02:36,212-Speed 9906.90 samples/sec Loss 7.2257 LearningRate 0.0547 Epoch: 5 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:37,305-Speed 9374.51 samples/sec Loss 7.0806 LearningRate 0.0547 Epoch: 5 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:38,354-Speed 9763.00 samples/sec Loss 7.2803 LearningRate 0.0547 Epoch: 5 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:39,424-Speed 9583.03 samples/sec Loss 7.2246 LearningRate 0.0547 Epoch: 5 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:40,465-Speed 9834.75 samples/sec Loss 7.1731 LearningRate 0.0547 Epoch: 5 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:41,585-Speed 9154.63 samples/sec Loss 7.3176 LearningRate 0.0547 Epoch: 5 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:02:42,719-Speed 9038.87 samples/sec Loss 7.2891 LearningRate 0.0547 Epoch: 5 Global Step: 87000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:43,799-Speed 9484.37 samples/sec Loss 7.2346 LearningRate 0.0547 Epoch: 5 Global Step: 87010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:44,865-Speed 9614.67 samples/sec Loss 7.1389 LearningRate 0.0547 Epoch: 5 Global Step: 87020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:45,930-Speed 9617.31 samples/sec Loss 7.2756 LearningRate 0.0547 Epoch: 5 Global Step: 87030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:47,055-Speed 9112.12 samples/sec Loss 7.2346 LearningRate 0.0547 Epoch: 5 Global Step: 87040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:48,161-Speed 9259.53 samples/sec Loss 7.2132 LearningRate 0.0546 Epoch: 5 Global Step: 87050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:49,260-Speed 9321.89 samples/sec Loss 
7.1887 LearningRate 0.0546 Epoch: 5 Global Step: 87060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:50,354-Speed 9366.25 samples/sec Loss 7.2960 LearningRate 0.0546 Epoch: 5 Global Step: 87070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:51,461-Speed 9256.51 samples/sec Loss 7.1516 LearningRate 0.0546 Epoch: 5 Global Step: 87080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:52,519-Speed 9685.26 samples/sec Loss 7.2574 LearningRate 0.0546 Epoch: 5 Global Step: 87090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:02:53,629-Speed 9229.78 samples/sec Loss 7.2128 LearningRate 0.0546 Epoch: 5 Global Step: 87100 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:54,703-Speed 9542.18 samples/sec Loss 7.1254 LearningRate 0.0546 Epoch: 5 Global Step: 87110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:55,732-Speed 9961.15 samples/sec Loss 7.3072 LearningRate 0.0546 Epoch: 5 Global Step: 87120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:56,789-Speed 9694.26 samples/sec Loss 7.1832 LearningRate 0.0546 Epoch: 5 Global Step: 87130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:57,839-Speed 9751.18 samples/sec Loss 7.1449 LearningRate 0.0546 Epoch: 5 Global Step: 87140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:02:58,964-Speed 9113.81 samples/sec Loss 7.1773 LearningRate 0.0546 Epoch: 5 Global Step: 87150 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:00,019-Speed 9704.62 samples/sec Loss 7.1491 LearningRate 0.0546 Epoch: 5 Global Step: 87160 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:01,038-Speed 10061.93 samples/sec Loss 7.2938 LearningRate 0.0546 Epoch: 5 Global Step: 87170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:02,088-Speed 9754.39 samples/sec Loss 7.3057 LearningRate 0.0546 Epoch: 5 Global 
Step: 87180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:03,129-Speed 9840.65 samples/sec Loss 7.2098 LearningRate 0.0546 Epoch: 5 Global Step: 87190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:04,217-Speed 9422.65 samples/sec Loss 7.2344 LearningRate 0.0546 Epoch: 5 Global Step: 87200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:05,268-Speed 9745.41 samples/sec Loss 7.2436 LearningRate 0.0546 Epoch: 5 Global Step: 87210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:06,371-Speed 9288.21 samples/sec Loss 7.1597 LearningRate 0.0546 Epoch: 5 Global Step: 87220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:07,457-Speed 9435.34 samples/sec Loss 7.0765 LearningRate 0.0546 Epoch: 5 Global Step: 87230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:08,516-Speed 9678.52 samples/sec Loss 7.1177 LearningRate 0.0546 Epoch: 5 Global Step: 87240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:09,605-Speed 9407.65 samples/sec Loss 7.2296 LearningRate 0.0546 Epoch: 5 Global Step: 87250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:10,679-Speed 9540.62 samples/sec Loss 7.1531 LearningRate 0.0546 Epoch: 5 Global Step: 87260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:11,790-Speed 9226.09 samples/sec Loss 7.2920 LearningRate 0.0545 Epoch: 5 Global Step: 87270 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:12,872-Speed 9470.03 samples/sec Loss 7.3143 LearningRate 0.0545 Epoch: 5 Global Step: 87280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:13,921-Speed 9766.67 samples/sec Loss 7.2698 LearningRate 0.0545 Epoch: 5 Global Step: 87290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:15,017-Speed 9353.49 samples/sec Loss 7.2418 LearningRate 0.0545 Epoch: 5 Global Step: 87300 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 15:03:16,103-Speed 9430.81 samples/sec Loss 7.1057 LearningRate 0.0545 Epoch: 5 Global Step: 87310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:17,154-Speed 9749.40 samples/sec Loss 7.2333 LearningRate 0.0545 Epoch: 5 Global Step: 87320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:18,219-Speed 9624.55 samples/sec Loss 7.0442 LearningRate 0.0545 Epoch: 5 Global Step: 87330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:19,289-Speed 9571.74 samples/sec Loss 7.1768 LearningRate 0.0545 Epoch: 5 Global Step: 87340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:20,345-Speed 9702.57 samples/sec Loss 7.2989 LearningRate 0.0545 Epoch: 5 Global Step: 87350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:21,416-Speed 9564.70 samples/sec Loss 7.2026 LearningRate 0.0545 Epoch: 5 Global Step: 87360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:22,519-Speed 9295.96 samples/sec Loss 7.2052 LearningRate 0.0545 Epoch: 5 Global Step: 87370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:23,578-Speed 9667.58 samples/sec Loss 7.0907 LearningRate 0.0545 Epoch: 5 Global Step: 87380 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:24,643-Speed 9620.47 samples/sec Loss 7.1819 LearningRate 0.0545 Epoch: 5 Global Step: 87390 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:25,760-Speed 9176.12 samples/sec Loss 7.2404 LearningRate 0.0545 Epoch: 5 Global Step: 87400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:26,857-Speed 9338.68 samples/sec Loss 7.3331 LearningRate 0.0545 Epoch: 5 Global Step: 87410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:27,954-Speed 9346.82 samples/sec Loss 7.2319 LearningRate 0.0545 Epoch: 5 Global Step: 87420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:03:29,023-Speed 9579.80 samples/sec Loss 7.3196 LearningRate 0.0545 Epoch: 5 Global Step: 87430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:30,113-Speed 9400.65 samples/sec Loss 7.4292 LearningRate 0.0545 Epoch: 5 Global Step: 87440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:31,185-Speed 9561.89 samples/sec Loss 7.3053 LearningRate 0.0545 Epoch: 5 Global Step: 87450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:32,258-Speed 9545.37 samples/sec Loss 7.2099 LearningRate 0.0545 Epoch: 5 Global Step: 87460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:33,355-Speed 9338.03 samples/sec Loss 7.1778 LearningRate 0.0545 Epoch: 5 Global Step: 87470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:34,453-Speed 9332.13 samples/sec Loss 7.2071 LearningRate 0.0545 Epoch: 5 Global Step: 87480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:35,544-Speed 9394.19 samples/sec Loss 7.1843 LearningRate 0.0545 Epoch: 5 Global Step: 87490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:36,649-Speed 9278.93 samples/sec Loss 7.1985 LearningRate 0.0544 Epoch: 5 Global Step: 87500 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:37,710-Speed 9649.37 samples/sec Loss 7.2056 LearningRate 0.0544 Epoch: 5 Global Step: 87510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:38,776-Speed 9612.86 samples/sec Loss 7.2755 LearningRate 0.0544 Epoch: 5 Global Step: 87520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:03:39,824-Speed 9778.67 samples/sec Loss 7.1536 LearningRate 0.0544 Epoch: 5 Global Step: 87530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:40,887-Speed 9636.59 samples/sec Loss 7.1812 LearningRate 0.0544 Epoch: 5 Global Step: 87540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:41,996-Speed 9235.33 samples/sec Loss 
7.1547 LearningRate 0.0544 Epoch: 5 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:43,098-Speed 9306.23 samples/sec Loss 7.1908 LearningRate 0.0544 Epoch: 5 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:44,172-Speed 9540.41 samples/sec Loss 7.1955 LearningRate 0.0544 Epoch: 5 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:45,231-Speed 9677.51 samples/sec Loss 7.1803 LearningRate 0.0544 Epoch: 5 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:46,290-Speed 9675.03 samples/sec Loss 7.2064 LearningRate 0.0544 Epoch: 5 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:47,374-Speed 9454.87 samples/sec Loss 7.1008 LearningRate 0.0544 Epoch: 5 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:48,453-Speed 9499.00 samples/sec Loss 7.2947 LearningRate 0.0544 Epoch: 5 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:49,528-Speed 9530.73 samples/sec Loss 7.2389 LearningRate 0.0544 Epoch: 5 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:50,595-Speed 9602.90 samples/sec Loss 7.1471 LearningRate 0.0544 Epoch: 5 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:51,628-Speed 9913.09 samples/sec Loss 7.2556 LearningRate 0.0544 Epoch: 5 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:03:52,679-Speed 9751.88 samples/sec Loss 7.1777 LearningRate 0.0544 Epoch: 5 Global Step: 87650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:53,735-Speed 9696.84 samples/sec Loss 7.2204 LearningRate 0.0544 Epoch: 5 Global Step: 87660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:54,826-Speed 9392.50 samples/sec Loss 7.1464 LearningRate 0.0544 Epoch: 5 Global Step: 
87670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:55,920-Speed 9369.26 samples/sec Loss 7.2133 LearningRate 0.0544 Epoch: 5 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:56,990-Speed 9579.97 samples/sec Loss 7.1333 LearningRate 0.0544 Epoch: 5 Global Step: 87690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:58,026-Speed 9884.04 samples/sec Loss 7.2496 LearningRate 0.0544 Epoch: 5 Global Step: 87700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:03:59,126-Speed 9320.89 samples/sec Loss 7.1596 LearningRate 0.0544 Epoch: 5 Global Step: 87710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:00,215-Speed 9407.57 samples/sec Loss 7.2093 LearningRate 0.0543 Epoch: 5 Global Step: 87720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:01,263-Speed 9773.53 samples/sec Loss 7.2359 LearningRate 0.0543 Epoch: 5 Global Step: 87730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:02,332-Speed 9589.55 samples/sec Loss 7.2585 LearningRate 0.0543 Epoch: 5 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:03,432-Speed 9315.08 samples/sec Loss 7.3079 LearningRate 0.0543 Epoch: 5 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:04,504-Speed 9553.26 samples/sec Loss 7.0741 LearningRate 0.0543 Epoch: 5 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:05,563-Speed 9676.51 samples/sec Loss 7.3083 LearningRate 0.0543 Epoch: 5 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:06,664-Speed 9307.83 samples/sec Loss 7.3145 LearningRate 0.0543 Epoch: 5 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:07,777-Speed 9203.56 samples/sec Loss 7.3658 LearningRate 0.0543 Epoch: 5 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 15:04:08,861-Speed 9458.12 samples/sec Loss 7.3244 LearningRate 0.0543 Epoch: 5 Global Step: 87800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:09,955-Speed 9359.33 samples/sec Loss 7.2663 LearningRate 0.0543 Epoch: 5 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:11,031-Speed 9528.98 samples/sec Loss 7.2554 LearningRate 0.0543 Epoch: 5 Global Step: 87820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:12,134-Speed 9286.60 samples/sec Loss 7.1497 LearningRate 0.0543 Epoch: 5 Global Step: 87830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:04:13,210-Speed 9528.87 samples/sec Loss 7.2624 LearningRate 0.0543 Epoch: 5 Global Step: 87840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:14,332-Speed 9130.37 samples/sec Loss 7.2130 LearningRate 0.0543 Epoch: 5 Global Step: 87850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:15,366-Speed 9906.31 samples/sec Loss 7.2926 LearningRate 0.0543 Epoch: 5 Global Step: 87860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:16,422-Speed 9706.29 samples/sec Loss 7.2266 LearningRate 0.0543 Epoch: 5 Global Step: 87870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:17,471-Speed 9766.57 samples/sec Loss 7.2382 LearningRate 0.0543 Epoch: 5 Global Step: 87880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:18,518-Speed 9783.05 samples/sec Loss 7.4150 LearningRate 0.0543 Epoch: 5 Global Step: 87890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:19,619-Speed 9306.67 samples/sec Loss 7.2523 LearningRate 0.0543 Epoch: 5 Global Step: 87900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:20,712-Speed 9376.13 samples/sec Loss 7.2075 LearningRate 0.0543 Epoch: 5 Global Step: 87910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:21,779-Speed 9598.23 
samples/sec Loss 7.2294 LearningRate 0.0543 Epoch: 5 Global Step: 87920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:22,876-Speed 9343.54 samples/sec Loss 7.2053 LearningRate 0.0543 Epoch: 5 Global Step: 87930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:23,978-Speed 9293.31 samples/sec Loss 7.2491 LearningRate 0.0543 Epoch: 5 Global Step: 87940 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:04:25,053-Speed 9530.18 samples/sec Loss 7.2457 LearningRate 0.0542 Epoch: 5 Global Step: 87950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:26,157-Speed 9291.01 samples/sec Loss 7.1738 LearningRate 0.0542 Epoch: 5 Global Step: 87960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:27,214-Speed 9692.31 samples/sec Loss 7.2873 LearningRate 0.0542 Epoch: 5 Global Step: 87970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:28,272-Speed 9677.21 samples/sec Loss 7.3177 LearningRate 0.0542 Epoch: 5 Global Step: 87980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:29,312-Speed 9857.06 samples/sec Loss 7.2060 LearningRate 0.0542 Epoch: 5 Global Step: 87990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:30,426-Speed 9194.74 samples/sec Loss 7.2623 LearningRate 0.0542 Epoch: 5 Global Step: 88000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:04:52,440-[lfw][88000]XNorm: 12.012908 Training: 2022-04-11 15:04:52,441-[lfw][88000]Accuracy-Flip: 0.99617+-0.00224 Training: 2022-04-11 15:04:52,441-[lfw][88000]Accuracy-Highest: 0.99667 Training: 2022-04-11 15:05:17,772-[cfp_fp][88000]XNorm: 10.177834 Training: 2022-04-11 15:05:17,773-[cfp_fp][88000]Accuracy-Flip: 0.95729+-0.00920 Training: 2022-04-11 15:05:17,774-[cfp_fp][88000]Accuracy-Highest: 0.95729 Training: 2022-04-11 15:05:39,596-[agedb_30][88000]XNorm: 11.606030 Training: 2022-04-11 15:05:39,597-[agedb_30][88000]Accuracy-Flip: 
0.96317+-0.00911 Training: 2022-04-11 15:05:39,597-[agedb_30][88000]Accuracy-Highest: 0.96317 Training: 2022-04-11 15:05:40,686-Speed 145.75 samples/sec Loss 7.2709 LearningRate 0.0542 Epoch: 5 Global Step: 88010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:05:41,762-Speed 9516.47 samples/sec Loss 7.3265 LearningRate 0.0542 Epoch: 5 Global Step: 88020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:05:42,896-Speed 9033.31 samples/sec Loss 7.2012 LearningRate 0.0542 Epoch: 5 Global Step: 88030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:05:43,984-Speed 9424.56 samples/sec Loss 7.2741 LearningRate 0.0542 Epoch: 5 Global Step: 88040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:05:45,080-Speed 9344.56 samples/sec Loss 7.3115 LearningRate 0.0542 Epoch: 5 Global Step: 88050 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:46,134-Speed 9722.17 samples/sec Loss 7.3323 LearningRate 0.0542 Epoch: 5 Global Step: 88060 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:47,191-Speed 9697.37 samples/sec Loss 7.2909 LearningRate 0.0542 Epoch: 5 Global Step: 88070 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:48,305-Speed 9193.44 samples/sec Loss 7.3133 LearningRate 0.0542 Epoch: 5 Global Step: 88080 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:49,429-Speed 9114.84 samples/sec Loss 7.3610 LearningRate 0.0542 Epoch: 5 Global Step: 88090 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:50,542-Speed 9204.00 samples/sec Loss 7.2957 LearningRate 0.0542 Epoch: 5 Global Step: 88100 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:51,580-Speed 9875.04 samples/sec Loss 7.3536 LearningRate 0.0542 Epoch: 5 Global Step: 88110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:52,646-Speed 9613.77 samples/sec Loss 7.3349 LearningRate 0.0542 Epoch: 5 Global Step: 
88120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:53,710-Speed 9624.68 samples/sec Loss 7.2476 LearningRate 0.0542 Epoch: 5 Global Step: 88130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:54,836-Speed 9102.82 samples/sec Loss 7.3290 LearningRate 0.0542 Epoch: 5 Global Step: 88140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:55,943-Speed 9258.07 samples/sec Loss 7.1724 LearningRate 0.0542 Epoch: 5 Global Step: 88150 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:57,033-Speed 9402.07 samples/sec Loss 7.2229 LearningRate 0.0542 Epoch: 5 Global Step: 88160 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:05:58,069-Speed 9891.88 samples/sec Loss 7.2902 LearningRate 0.0542 Epoch: 5 Global Step: 88170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:05:59,120-Speed 9743.83 samples/sec Loss 7.3238 LearningRate 0.0541 Epoch: 5 Global Step: 88180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:00,192-Speed 9561.09 samples/sec Loss 7.3496 LearningRate 0.0541 Epoch: 5 Global Step: 88190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:01,236-Speed 9819.96 samples/sec Loss 7.2609 LearningRate 0.0541 Epoch: 5 Global Step: 88200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:02,313-Speed 9506.26 samples/sec Loss 7.2888 LearningRate 0.0541 Epoch: 5 Global Step: 88210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:03,411-Speed 9330.88 samples/sec Loss 7.1464 LearningRate 0.0541 Epoch: 5 Global Step: 88220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:04,524-Speed 9208.42 samples/sec Loss 7.2707 LearningRate 0.0541 Epoch: 5 Global Step: 88230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:05,609-Speed 9442.25 samples/sec Loss 7.2336 LearningRate 0.0541 Epoch: 5 Global Step: 88240 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 15:06:06,706-Speed 9339.83 samples/sec Loss 7.1428 LearningRate 0.0541 Epoch: 5 Global Step: 88250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:07,787-Speed 9479.30 samples/sec Loss 7.1598 LearningRate 0.0541 Epoch: 5 Global Step: 88260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:08,889-Speed 9299.04 samples/sec Loss 7.3662 LearningRate 0.0541 Epoch: 5 Global Step: 88270 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:06:09,939-Speed 9754.14 samples/sec Loss 7.3134 LearningRate 0.0541 Epoch: 5 Global Step: 88280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:11,012-Speed 9549.17 samples/sec Loss 7.1696 LearningRate 0.0541 Epoch: 5 Global Step: 88290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:12,080-Speed 9598.69 samples/sec Loss 7.2360 LearningRate 0.0541 Epoch: 5 Global Step: 88300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:13,152-Speed 9557.07 samples/sec Loss 7.2905 LearningRate 0.0541 Epoch: 5 Global Step: 88310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:14,207-Speed 9715.54 samples/sec Loss 7.2175 LearningRate 0.0541 Epoch: 5 Global Step: 88320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:15,264-Speed 9690.34 samples/sec Loss 7.2551 LearningRate 0.0541 Epoch: 5 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:16,293-Speed 9956.92 samples/sec Loss 7.2131 LearningRate 0.0541 Epoch: 5 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:17,369-Speed 9523.92 samples/sec Loss 7.2235 LearningRate 0.0541 Epoch: 5 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:18,463-Speed 9365.66 samples/sec Loss 7.2223 LearningRate 0.0541 Epoch: 5 Global Step: 88360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:19,527-Speed 
9622.83 samples/sec Loss 7.1805 LearningRate 0.0541 Epoch: 5 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:20,619-Speed 9390.47 samples/sec Loss 7.1861 LearningRate 0.0541 Epoch: 5 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:21,725-Speed 9264.84 samples/sec Loss 7.1885 LearningRate 0.0541 Epoch: 5 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:22,812-Speed 9421.17 samples/sec Loss 7.2173 LearningRate 0.0540 Epoch: 5 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:23,877-Speed 9619.99 samples/sec Loss 7.3066 LearningRate 0.0540 Epoch: 5 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:24,943-Speed 9615.37 samples/sec Loss 7.1975 LearningRate 0.0540 Epoch: 5 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:26,045-Speed 9297.78 samples/sec Loss 7.1900 LearningRate 0.0540 Epoch: 5 Global Step: 88430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:27,090-Speed 9796.81 samples/sec Loss 7.3482 LearningRate 0.0540 Epoch: 5 Global Step: 88440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:28,184-Speed 9369.76 samples/sec Loss 7.2421 LearningRate 0.0540 Epoch: 5 Global Step: 88450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:29,248-Speed 9627.44 samples/sec Loss 7.2508 LearningRate 0.0540 Epoch: 5 Global Step: 88460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:30,335-Speed 9429.62 samples/sec Loss 7.2004 LearningRate 0.0540 Epoch: 5 Global Step: 88470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:31,359-Speed 10003.77 samples/sec Loss 7.2872 LearningRate 0.0540 Epoch: 5 Global Step: 88480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:32,404-Speed 9820.96 samples/sec Loss 7.1653 LearningRate 
0.0540 Epoch: 5 Global Step: 88490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:33,509-Speed 9269.24 samples/sec Loss 7.3488 LearningRate 0.0540 Epoch: 5 Global Step: 88500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:34,599-Speed 9399.01 samples/sec Loss 7.2457 LearningRate 0.0540 Epoch: 5 Global Step: 88510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:35,703-Speed 9282.42 samples/sec Loss 7.2804 LearningRate 0.0540 Epoch: 5 Global Step: 88520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:36,795-Speed 9379.72 samples/sec Loss 7.2590 LearningRate 0.0540 Epoch: 5 Global Step: 88530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:37,881-Speed 9439.65 samples/sec Loss 7.1849 LearningRate 0.0540 Epoch: 5 Global Step: 88540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:38,978-Speed 9338.56 samples/sec Loss 7.3498 LearningRate 0.0540 Epoch: 5 Global Step: 88550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:40,065-Speed 9426.45 samples/sec Loss 7.2601 LearningRate 0.0540 Epoch: 5 Global Step: 88560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:41,147-Speed 9468.37 samples/sec Loss 7.3392 LearningRate 0.0540 Epoch: 5 Global Step: 88570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:42,248-Speed 9302.32 samples/sec Loss 7.1643 LearningRate 0.0540 Epoch: 5 Global Step: 88580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:43,359-Speed 9221.87 samples/sec Loss 7.2092 LearningRate 0.0540 Epoch: 5 Global Step: 88590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:44,442-Speed 9469.57 samples/sec Loss 7.3214 LearningRate 0.0540 Epoch: 5 Global Step: 88600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:45,508-Speed 9609.97 samples/sec Loss 7.3166 LearningRate 0.0540 Epoch: 5 Global Step: 88610 Fp16 
Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:46,610-Speed 9291.47 samples/sec Loss 7.2913 LearningRate 0.0540 Epoch: 5 Global Step: 88620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:06:47,682-Speed 9563.77 samples/sec Loss 7.3536 LearningRate 0.0539 Epoch: 5 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:48,757-Speed 9525.27 samples/sec Loss 7.3252 LearningRate 0.0539 Epoch: 5 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:49,859-Speed 9301.61 samples/sec Loss 7.3115 LearningRate 0.0539 Epoch: 5 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:50,924-Speed 9628.70 samples/sec Loss 7.3505 LearningRate 0.0539 Epoch: 5 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:52,013-Speed 9403.81 samples/sec Loss 7.3133 LearningRate 0.0539 Epoch: 5 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:53,094-Speed 9482.37 samples/sec Loss 7.3146 LearningRate 0.0539 Epoch: 5 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:54,159-Speed 9622.24 samples/sec Loss 7.2966 LearningRate 0.0539 Epoch: 5 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:55,287-Speed 9078.87 samples/sec Loss 7.3577 LearningRate 0.0539 Epoch: 5 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:56,365-Speed 9506.77 samples/sec Loss 7.2956 LearningRate 0.0539 Epoch: 5 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:57,402-Speed 9879.86 samples/sec Loss 7.2182 LearningRate 0.0539 Epoch: 5 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:06:58,474-Speed 9552.28 samples/sec Loss 7.2972 LearningRate 0.0539 Epoch: 5 Global Step: 88730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-11 15:06:59,546-Speed 9562.03 samples/sec Loss 7.2352 LearningRate 0.0539 Epoch: 5 Global Step: 88740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:00,618-Speed 9554.14 samples/sec Loss 7.1791 LearningRate 0.0539 Epoch: 5 Global Step: 88750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:01,710-Speed 9383.44 samples/sec Loss 7.2916 LearningRate 0.0539 Epoch: 5 Global Step: 88760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:02,769-Speed 9677.72 samples/sec Loss 7.2885 LearningRate 0.0539 Epoch: 5 Global Step: 88770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:03,875-Speed 9267.03 samples/sec Loss 7.2420 LearningRate 0.0539 Epoch: 5 Global Step: 88780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:04,942-Speed 9601.53 samples/sec Loss 7.2849 LearningRate 0.0539 Epoch: 5 Global Step: 88790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:06,031-Speed 9411.10 samples/sec Loss 7.3788 LearningRate 0.0539 Epoch: 5 Global Step: 88800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:07,095-Speed 9625.33 samples/sec Loss 7.1662 LearningRate 0.0539 Epoch: 5 Global Step: 88810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:08,166-Speed 9569.33 samples/sec Loss 7.1852 LearningRate 0.0539 Epoch: 5 Global Step: 88820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:09,230-Speed 9636.21 samples/sec Loss 7.2251 LearningRate 0.0539 Epoch: 5 Global Step: 88830 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:10,325-Speed 9355.09 samples/sec Loss 7.2746 LearningRate 0.0539 Epoch: 5 Global Step: 88840 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:11,434-Speed 9237.11 samples/sec Loss 7.2630 LearningRate 0.0539 Epoch: 5 Global Step: 88850 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:12,561-Speed 9087.21 
samples/sec Loss 7.2636 LearningRate 0.0538 Epoch: 5 Global Step: 88860 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:13,669-Speed 9248.42 samples/sec Loss 7.3084 LearningRate 0.0538 Epoch: 5 Global Step: 88870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:14,720-Speed 9757.46 samples/sec Loss 7.1736 LearningRate 0.0538 Epoch: 5 Global Step: 88880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:15,765-Speed 9801.82 samples/sec Loss 7.2267 LearningRate 0.0538 Epoch: 5 Global Step: 88890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:16,853-Speed 9418.10 samples/sec Loss 7.2077 LearningRate 0.0538 Epoch: 5 Global Step: 88900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:17,937-Speed 9451.45 samples/sec Loss 7.2907 LearningRate 0.0538 Epoch: 5 Global Step: 88910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:19,018-Speed 9480.98 samples/sec Loss 7.2861 LearningRate 0.0538 Epoch: 5 Global Step: 88920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:20,071-Speed 9723.48 samples/sec Loss 7.3645 LearningRate 0.0538 Epoch: 5 Global Step: 88930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:21,186-Speed 9194.78 samples/sec Loss 7.2476 LearningRate 0.0538 Epoch: 5 Global Step: 88940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:22,297-Speed 9219.68 samples/sec Loss 7.4111 LearningRate 0.0538 Epoch: 5 Global Step: 88950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:23,375-Speed 9501.89 samples/sec Loss 7.3138 LearningRate 0.0538 Epoch: 5 Global Step: 88960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:24,528-Speed 8891.87 samples/sec Loss 7.2730 LearningRate 0.0538 Epoch: 5 Global Step: 88970 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:25,593-Speed 9616.51 samples/sec Loss 7.2165 LearningRate 0.0538 
Epoch: 5 Global Step: 88980 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:26,702-Speed 9244.46 samples/sec Loss 7.2984 LearningRate 0.0538 Epoch: 5 Global Step: 88990 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:27,769-Speed 9604.21 samples/sec Loss 7.2252 LearningRate 0.0538 Epoch: 5 Global Step: 89000 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:28,853-Speed 9445.17 samples/sec Loss 7.3222 LearningRate 0.0538 Epoch: 5 Global Step: 89010 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:29,878-Speed 9995.61 samples/sec Loss 7.2213 LearningRate 0.0538 Epoch: 5 Global Step: 89020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:30,961-Speed 9468.69 samples/sec Loss 7.3860 LearningRate 0.0538 Epoch: 5 Global Step: 89030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:32,018-Speed 9690.86 samples/sec Loss 7.0981 LearningRate 0.0538 Epoch: 5 Global Step: 89040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:33,071-Speed 9732.54 samples/sec Loss 7.3127 LearningRate 0.0538 Epoch: 5 Global Step: 89050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:34,160-Speed 9413.80 samples/sec Loss 7.3130 LearningRate 0.0538 Epoch: 5 Global Step: 89060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:35,243-Speed 9453.04 samples/sec Loss 7.2717 LearningRate 0.0538 Epoch: 5 Global Step: 89070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:36,337-Speed 9366.05 samples/sec Loss 7.3004 LearningRate 0.0538 Epoch: 5 Global Step: 89080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:37,402-Speed 9622.14 samples/sec Loss 7.2294 LearningRate 0.0537 Epoch: 5 Global Step: 89090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:38,462-Speed 9667.26 samples/sec Loss 7.2639 LearningRate 0.0537 Epoch: 5 Global Step: 89100 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:39,511-Speed 9771.45 samples/sec Loss 7.3230 LearningRate 0.0537 Epoch: 5 Global Step: 89110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:40,587-Speed 9518.65 samples/sec Loss 7.2323 LearningRate 0.0537 Epoch: 5 Global Step: 89120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:41,635-Speed 9770.90 samples/sec Loss 7.3194 LearningRate 0.0537 Epoch: 5 Global Step: 89130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:07:42,728-Speed 9383.63 samples/sec Loss 7.2325 LearningRate 0.0537 Epoch: 5 Global Step: 89140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:43,794-Speed 9613.45 samples/sec Loss 7.2575 LearningRate 0.0537 Epoch: 5 Global Step: 89150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:44,856-Speed 9646.02 samples/sec Loss 7.2808 LearningRate 0.0537 Epoch: 5 Global Step: 89160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:45,959-Speed 9289.88 samples/sec Loss 7.2186 LearningRate 0.0537 Epoch: 5 Global Step: 89170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:47,026-Speed 9606.69 samples/sec Loss 7.1181 LearningRate 0.0537 Epoch: 5 Global Step: 89180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:48,066-Speed 9854.31 samples/sec Loss 7.2453 LearningRate 0.0537 Epoch: 5 Global Step: 89190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:49,170-Speed 9282.18 samples/sec Loss 7.3231 LearningRate 0.0537 Epoch: 5 Global Step: 89200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:50,224-Speed 9713.51 samples/sec Loss 7.2986 LearningRate 0.0537 Epoch: 5 Global Step: 89210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:51,350-Speed 9109.91 samples/sec Loss 7.1935 LearningRate 0.0537 Epoch: 5 Global Step: 89220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-04-11 15:07:52,390-Speed 9846.34 samples/sec Loss 7.2490 LearningRate 0.0537 Epoch: 5 Global Step: 89230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:53,450-Speed 9661.97 samples/sec Loss 7.2426 LearningRate 0.0537 Epoch: 5 Global Step: 89240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:54,549-Speed 9324.27 samples/sec Loss 7.3008 LearningRate 0.0537 Epoch: 5 Global Step: 89250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:55,629-Speed 9487.65 samples/sec Loss 7.2556 LearningRate 0.0537 Epoch: 5 Global Step: 89260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:56,685-Speed 9701.40 samples/sec Loss 7.1662 LearningRate 0.0537 Epoch: 5 Global Step: 89270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:57,803-Speed 9165.41 samples/sec Loss 7.2939 LearningRate 0.0537 Epoch: 5 Global Step: 89280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:58,853-Speed 9758.31 samples/sec Loss 7.2763 LearningRate 0.0537 Epoch: 5 Global Step: 89290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:07:59,935-Speed 9478.03 samples/sec Loss 7.3149 LearningRate 0.0537 Epoch: 5 Global Step: 89300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:01,021-Speed 9434.07 samples/sec Loss 7.4099 LearningRate 0.0536 Epoch: 5 Global Step: 89310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:02,101-Speed 9485.92 samples/sec Loss 7.4413 LearningRate 0.0536 Epoch: 5 Global Step: 89320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:03,200-Speed 9322.91 samples/sec Loss 7.2388 LearningRate 0.0536 Epoch: 5 Global Step: 89330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:04,264-Speed 9631.38 samples/sec Loss 7.1874 LearningRate 0.0536 Epoch: 5 Global Step: 89340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:05,329-Speed 9619.27 
samples/sec Loss 7.2322 LearningRate 0.0536 Epoch: 5 Global Step: 89350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:06,374-Speed 9811.87 samples/sec Loss 7.2797 LearningRate 0.0536 Epoch: 5 Global Step: 89360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:07,472-Speed 9324.83 samples/sec Loss 7.2639 LearningRate 0.0536 Epoch: 5 Global Step: 89370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:08,604-Speed 9056.85 samples/sec Loss 7.3196 LearningRate 0.0536 Epoch: 5 Global Step: 89380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:09,673-Speed 9582.38 samples/sec Loss 7.2919 LearningRate 0.0536 Epoch: 5 Global Step: 89390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:10,757-Speed 9452.77 samples/sec Loss 7.3398 LearningRate 0.0536 Epoch: 5 Global Step: 89400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:11,849-Speed 9383.10 samples/sec Loss 7.2938 LearningRate 0.0536 Epoch: 5 Global Step: 89410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:12,893-Speed 9807.40 samples/sec Loss 7.3769 LearningRate 0.0536 Epoch: 5 Global Step: 89420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:13,978-Speed 9445.97 samples/sec Loss 7.4404 LearningRate 0.0536 Epoch: 5 Global Step: 89430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:15,027-Speed 9767.00 samples/sec Loss 7.2070 LearningRate 0.0536 Epoch: 5 Global Step: 89440 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:16,084-Speed 9694.14 samples/sec Loss 7.2638 LearningRate 0.0536 Epoch: 5 Global Step: 89450 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:17,124-Speed 9856.18 samples/sec Loss 7.2539 LearningRate 0.0536 Epoch: 5 Global Step: 89460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:18,205-Speed 9479.68 samples/sec Loss 7.2972 LearningRate 0.0536 
Epoch: 5 Global Step: 89470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:19,270-Speed 9624.89 samples/sec Loss 7.2805 LearningRate 0.0536 Epoch: 5 Global Step: 89480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:20,361-Speed 9393.95 samples/sec Loss 7.2298 LearningRate 0.0536 Epoch: 5 Global Step: 89490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:21,469-Speed 9251.99 samples/sec Loss 7.3183 LearningRate 0.0536 Epoch: 5 Global Step: 89500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:22,562-Speed 9371.94 samples/sec Loss 7.3187 LearningRate 0.0536 Epoch: 5 Global Step: 89510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:23,648-Speed 9434.82 samples/sec Loss 7.2868 LearningRate 0.0536 Epoch: 5 Global Step: 89520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:24,724-Speed 9522.26 samples/sec Loss 7.2625 LearningRate 0.0536 Epoch: 5 Global Step: 89530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:25,821-Speed 9335.44 samples/sec Loss 7.2291 LearningRate 0.0535 Epoch: 5 Global Step: 89540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:26,880-Speed 9670.85 samples/sec Loss 7.2066 LearningRate 0.0535 Epoch: 5 Global Step: 89550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:27,950-Speed 9576.02 samples/sec Loss 7.3333 LearningRate 0.0535 Epoch: 5 Global Step: 89560 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:29,021-Speed 9571.31 samples/sec Loss 7.3928 LearningRate 0.0535 Epoch: 5 Global Step: 89570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:30,103-Speed 9468.06 samples/sec Loss 7.3194 LearningRate 0.0535 Epoch: 5 Global Step: 89580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:31,196-Speed 9373.85 samples/sec Loss 7.2056 LearningRate 0.0535 Epoch: 5 Global Step: 89590 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-04-11 15:08:32,357-Speed 8824.97 samples/sec Loss 7.2641 LearningRate 0.0535 Epoch: 5 Global Step: 89600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:33,441-Speed 9455.39 samples/sec Loss 7.3140 LearningRate 0.0535 Epoch: 5 Global Step: 89610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:34,541-Speed 9318.63 samples/sec Loss 7.2795 LearningRate 0.0535 Epoch: 5 Global Step: 89620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:35,646-Speed 9264.78 samples/sec Loss 7.2566 LearningRate 0.0535 Epoch: 5 Global Step: 89630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:36,749-Speed 9295.04 samples/sec Loss 7.3047 LearningRate 0.0535 Epoch: 5 Global Step: 89640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:37,839-Speed 9401.82 samples/sec Loss 7.2448 LearningRate 0.0535 Epoch: 5 Global Step: 89650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:38,910-Speed 9567.06 samples/sec Loss 7.1749 LearningRate 0.0535 Epoch: 5 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:08:39,973-Speed 9633.54 samples/sec Loss 7.2190 LearningRate 0.0535 Epoch: 5 Global Step: 89670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:41,026-Speed 9730.95 samples/sec Loss 7.2320 LearningRate 0.0535 Epoch: 5 Global Step: 89680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:42,082-Speed 9704.14 samples/sec Loss 7.2680 LearningRate 0.0535 Epoch: 5 Global Step: 89690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:43,103-Speed 10028.83 samples/sec Loss 7.1828 LearningRate 0.0535 Epoch: 5 Global Step: 89700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:44,207-Speed 9287.64 samples/sec Loss 7.2526 LearningRate 0.0535 Epoch: 5 Global Step: 89710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:08:45,318-Speed 9220.45 samples/sec Loss 7.2910 LearningRate 0.0535 Epoch: 5 Global Step: 89720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:46,413-Speed 9355.87 samples/sec Loss 7.2496 LearningRate 0.0535 Epoch: 5 Global Step: 89730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:47,441-Speed 9964.27 samples/sec Loss 7.2257 LearningRate 0.0535 Epoch: 5 Global Step: 89740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:48,521-Speed 9488.11 samples/sec Loss 7.3270 LearningRate 0.0535 Epoch: 5 Global Step: 89750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:49,604-Speed 9459.59 samples/sec Loss 7.1199 LearningRate 0.0535 Epoch: 5 Global Step: 89760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:50,666-Speed 9653.68 samples/sec Loss 7.1530 LearningRate 0.0534 Epoch: 5 Global Step: 89770 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:51,760-Speed 9368.35 samples/sec Loss 7.2049 LearningRate 0.0534 Epoch: 5 Global Step: 89780 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:52,801-Speed 9837.41 samples/sec Loss 7.3182 LearningRate 0.0534 Epoch: 5 Global Step: 89790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:53,878-Speed 9513.74 samples/sec Loss 7.2601 LearningRate 0.0534 Epoch: 5 Global Step: 89800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:54,975-Speed 9338.35 samples/sec Loss 7.2675 LearningRate 0.0534 Epoch: 5 Global Step: 89810 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:08:56,042-Speed 9604.08 samples/sec Loss 7.3601 LearningRate 0.0534 Epoch: 5 Global Step: 89820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:57,092-Speed 9766.11 samples/sec Loss 7.2794 LearningRate 0.0534 Epoch: 5 Global Step: 89830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:58,188-Speed 9342.97 samples/sec Loss 
7.2747 LearningRate 0.0534 Epoch: 5 Global Step: 89840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:08:59,240-Speed 9735.86 samples/sec Loss 7.2578 LearningRate 0.0534 Epoch: 5 Global Step: 89850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:09:00,310-Speed 9579.17 samples/sec Loss 7.3317 LearningRate 0.0534 Epoch: 5 Global Step: 89860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:09:01,377-Speed 9604.94 samples/sec Loss 7.3207 LearningRate 0.0534 Epoch: 5 Global Step: 89870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:09:02,450-Speed 9546.41 samples/sec Loss 7.3232 LearningRate 0.0534 Epoch: 5 Global Step: 89880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:09:03,510-Speed 9673.20 samples/sec Loss 7.3537 LearningRate 0.0534 Epoch: 5 Global Step: 89890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:09:04,564-Speed 9714.65 samples/sec Loss 7.2833 LearningRate 0.0534 Epoch: 5 Global Step: 89900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:09:05,623-Speed 9682.35 samples/sec Loss 7.2152 LearningRate 0.0534 Epoch: 5 Global Step: 89910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:06,710-Speed 9419.11 samples/sec Loss 7.2596 LearningRate 0.0534 Epoch: 5 Global Step: 89920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:07,777-Speed 9603.42 samples/sec Loss 7.3491 LearningRate 0.0534 Epoch: 5 Global Step: 89930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:08,871-Speed 9367.45 samples/sec Loss 7.3233 LearningRate 0.0534 Epoch: 5 Global Step: 89940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:09,957-Speed 9432.50 samples/sec Loss 7.2545 LearningRate 0.0534 Epoch: 5 Global Step: 89950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:11,022-Speed 9621.47 samples/sec Loss 7.2058 LearningRate 0.0534 Epoch: 5 Global Step: 
89960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:12,132-Speed 9228.38 samples/sec Loss 7.1736 LearningRate 0.0534 Epoch: 5 Global Step: 89970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:13,222-Speed 9401.11 samples/sec Loss 7.1605 LearningRate 0.0534 Epoch: 5 Global Step: 89980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:14,315-Speed 9380.71 samples/sec Loss 7.1934 LearningRate 0.0534 Epoch: 5 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:15,341-Speed 9988.35 samples/sec Loss 7.0752 LearningRate 0.0533 Epoch: 5 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:09:37,542-[lfw][90000]XNorm: 11.824306 Training: 2022-04-11 15:09:37,543-[lfw][90000]Accuracy-Flip: 0.99683+-0.00293 Training: 2022-04-11 15:09:37,543-[lfw][90000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:10:02,905-[cfp_fp][90000]XNorm: 9.925898 Training: 2022-04-11 15:10:02,906-[cfp_fp][90000]Accuracy-Flip: 0.95614+-0.01136 Training: 2022-04-11 15:10:02,906-[cfp_fp][90000]Accuracy-Highest: 0.95729 Training: 2022-04-11 15:10:24,923-[agedb_30][90000]XNorm: 11.266069 Training: 2022-04-11 15:10:24,924-[agedb_30][90000]Accuracy-Flip: 0.96033+-0.01137 Training: 2022-04-11 15:10:24,924-[agedb_30][90000]Accuracy-Highest: 0.96317 Training: 2022-04-11 15:10:25,965-Speed 144.99 samples/sec Loss 7.2633 LearningRate 0.0533 Epoch: 5 Global Step: 90010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:27,027-Speed 9648.04 samples/sec Loss 7.2771 LearningRate 0.0533 Epoch: 5 Global Step: 90020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:28,103-Speed 9514.15 samples/sec Loss 7.3387 LearningRate 0.0533 Epoch: 5 Global Step: 90030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:29,217-Speed 9198.54 samples/sec Loss 7.2848 LearningRate 0.0533 Epoch: 5 Global Step: 90040 Fp16 Grad Scale: 131072 Required: 
10 hours Training: 2022-04-11 15:10:30,288-Speed 9568.60 samples/sec Loss 7.2997 LearningRate 0.0533 Epoch: 5 Global Step: 90050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:31,326-Speed 9873.35 samples/sec Loss 7.3003 LearningRate 0.0533 Epoch: 5 Global Step: 90060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:32,412-Speed 9440.58 samples/sec Loss 7.4046 LearningRate 0.0533 Epoch: 5 Global Step: 90070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:33,506-Speed 9366.49 samples/sec Loss 7.1505 LearningRate 0.0533 Epoch: 5 Global Step: 90080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:34,584-Speed 9502.27 samples/sec Loss 7.2262 LearningRate 0.0533 Epoch: 5 Global Step: 90090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:35,668-Speed 9446.56 samples/sec Loss 7.3198 LearningRate 0.0533 Epoch: 5 Global Step: 90100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:36,770-Speed 9298.58 samples/sec Loss 7.1683 LearningRate 0.0533 Epoch: 5 Global Step: 90110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:10:37,910-Speed 8991.45 samples/sec Loss 7.1661 LearningRate 0.0533 Epoch: 5 Global Step: 90120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:10:38,965-Speed 9710.41 samples/sec Loss 7.1398 LearningRate 0.0533 Epoch: 5 Global Step: 90130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:10:40,022-Speed 9689.25 samples/sec Loss 7.1928 LearningRate 0.0533 Epoch: 5 Global Step: 90140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:41,068-Speed 9797.89 samples/sec Loss 7.2951 LearningRate 0.0533 Epoch: 5 Global Step: 90150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:42,124-Speed 9704.84 samples/sec Loss 7.2890 LearningRate 0.0533 Epoch: 5 Global Step: 90160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:10:43,146-Speed 10023.72 samples/sec Loss 7.2098 LearningRate 0.0533 Epoch: 5 Global Step: 90170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:44,206-Speed 9670.75 samples/sec Loss 7.1408 LearningRate 0.0533 Epoch: 5 Global Step: 90180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:45,262-Speed 9695.23 samples/sec Loss 7.2612 LearningRate 0.0533 Epoch: 5 Global Step: 90190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:46,328-Speed 9616.38 samples/sec Loss 7.2398 LearningRate 0.0533 Epoch: 5 Global Step: 90200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:47,405-Speed 9518.12 samples/sec Loss 7.2555 LearningRate 0.0533 Epoch: 5 Global Step: 90210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:48,505-Speed 9313.14 samples/sec Loss 7.2702 LearningRate 0.0533 Epoch: 5 Global Step: 90220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:49,619-Speed 9197.80 samples/sec Loss 7.3979 LearningRate 0.0532 Epoch: 5 Global Step: 90230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:50,701-Speed 9468.65 samples/sec Loss 7.3166 LearningRate 0.0532 Epoch: 5 Global Step: 90240 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:10:51,771-Speed 9579.76 samples/sec Loss 7.2650 LearningRate 0.0532 Epoch: 5 Global Step: 90250 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:10:52,844-Speed 9552.13 samples/sec Loss 7.2679 LearningRate 0.0532 Epoch: 5 Global Step: 90260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:53,938-Speed 9361.52 samples/sec Loss 7.3577 LearningRate 0.0532 Epoch: 5 Global Step: 90270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:54,983-Speed 9801.47 samples/sec Loss 7.4056 LearningRate 0.0532 Epoch: 5 Global Step: 90280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:56,040-Speed 9693.52 samples/sec 
Loss 7.3131 LearningRate 0.0532 Epoch: 5 Global Step: 90290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:57,082-Speed 9829.20 samples/sec Loss 7.2398 LearningRate 0.0532 Epoch: 5 Global Step: 90300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:58,162-Speed 9491.69 samples/sec Loss 7.3544 LearningRate 0.0532 Epoch: 5 Global Step: 90310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:10:59,228-Speed 9611.89 samples/sec Loss 7.3471 LearningRate 0.0532 Epoch: 5 Global Step: 90320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:00,319-Speed 9391.76 samples/sec Loss 7.2652 LearningRate 0.0532 Epoch: 5 Global Step: 90330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:01,453-Speed 9036.23 samples/sec Loss 7.2111 LearningRate 0.0532 Epoch: 5 Global Step: 90340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:02,528-Speed 9530.50 samples/sec Loss 7.1966 LearningRate 0.0532 Epoch: 5 Global Step: 90350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:03,599-Speed 9574.97 samples/sec Loss 7.3015 LearningRate 0.0532 Epoch: 5 Global Step: 90360 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:04,681-Speed 9468.53 samples/sec Loss 7.2499 LearningRate 0.0532 Epoch: 5 Global Step: 90370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:05,783-Speed 9297.51 samples/sec Loss 7.3534 LearningRate 0.0532 Epoch: 5 Global Step: 90380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:06,846-Speed 9648.55 samples/sec Loss 7.3242 LearningRate 0.0532 Epoch: 5 Global Step: 90390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:07,899-Speed 9724.89 samples/sec Loss 7.2690 LearningRate 0.0532 Epoch: 5 Global Step: 90400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:08,959-Speed 9665.27 samples/sec Loss 7.2614 LearningRate 0.0532 Epoch: 5 
Global Step: 90410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:09,995-Speed 9888.91 samples/sec Loss 7.2707 LearningRate 0.0532 Epoch: 5 Global Step: 90420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:11,058-Speed 9638.94 samples/sec Loss 7.2717 LearningRate 0.0532 Epoch: 5 Global Step: 90430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:12,129-Speed 9569.18 samples/sec Loss 7.2718 LearningRate 0.0532 Epoch: 5 Global Step: 90440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:13,208-Speed 9499.68 samples/sec Loss 7.3581 LearningRate 0.0532 Epoch: 5 Global Step: 90450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:14,296-Speed 9415.86 samples/sec Loss 7.2643 LearningRate 0.0531 Epoch: 5 Global Step: 90460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:15,386-Speed 9398.68 samples/sec Loss 7.2733 LearningRate 0.0531 Epoch: 5 Global Step: 90470 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:16,443-Speed 9693.50 samples/sec Loss 7.3848 LearningRate 0.0531 Epoch: 5 Global Step: 90480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:17,555-Speed 9213.13 samples/sec Loss 7.2662 LearningRate 0.0531 Epoch: 5 Global Step: 90490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:18,661-Speed 9268.84 samples/sec Loss 7.2872 LearningRate 0.0531 Epoch: 5 Global Step: 90500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:19,778-Speed 9173.77 samples/sec Loss 7.2741 LearningRate 0.0531 Epoch: 5 Global Step: 90510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:20,896-Speed 9165.91 samples/sec Loss 7.3066 LearningRate 0.0531 Epoch: 5 Global Step: 90520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:21,989-Speed 9373.77 samples/sec Loss 7.2907 LearningRate 0.0531 Epoch: 5 Global Step: 90530 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 15:11:23,092-Speed 9285.95 samples/sec Loss 7.3326 LearningRate 0.0531 Epoch: 5 Global Step: 90540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:24,168-Speed 9523.38 samples/sec Loss 7.3802 LearningRate 0.0531 Epoch: 5 Global Step: 90550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:25,297-Speed 9073.78 samples/sec Loss 7.3531 LearningRate 0.0531 Epoch: 5 Global Step: 90560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:26,374-Speed 9522.20 samples/sec Loss 7.2246 LearningRate 0.0531 Epoch: 5 Global Step: 90570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:27,421-Speed 9778.27 samples/sec Loss 7.1992 LearningRate 0.0531 Epoch: 5 Global Step: 90580 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:28,474-Speed 9734.40 samples/sec Loss 7.1514 LearningRate 0.0531 Epoch: 5 Global Step: 90590 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:29,567-Speed 9370.45 samples/sec Loss 7.2038 LearningRate 0.0531 Epoch: 5 Global Step: 90600 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:30,670-Speed 9288.24 samples/sec Loss 7.2748 LearningRate 0.0531 Epoch: 5 Global Step: 90610 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:31,781-Speed 9223.87 samples/sec Loss 7.3119 LearningRate 0.0531 Epoch: 5 Global Step: 90620 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:32,878-Speed 9343.23 samples/sec Loss 7.2323 LearningRate 0.0531 Epoch: 5 Global Step: 90630 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:33,946-Speed 9586.22 samples/sec Loss 7.2638 LearningRate 0.0531 Epoch: 5 Global Step: 90640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:35,031-Speed 9448.86 samples/sec Loss 7.3030 LearningRate 0.0531 Epoch: 5 Global Step: 90650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:11:36,117-Speed 9430.77 samples/sec Loss 7.2508 LearningRate 0.0531 Epoch: 5 Global Step: 90660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:37,183-Speed 9617.21 samples/sec Loss 7.3668 LearningRate 0.0531 Epoch: 5 Global Step: 90670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:38,274-Speed 9388.39 samples/sec Loss 7.3343 LearningRate 0.0531 Epoch: 5 Global Step: 90680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:39,356-Speed 9469.42 samples/sec Loss 7.2343 LearningRate 0.0530 Epoch: 5 Global Step: 90690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:40,434-Speed 9510.77 samples/sec Loss 7.3088 LearningRate 0.0530 Epoch: 5 Global Step: 90700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:41,521-Speed 9424.50 samples/sec Loss 7.2661 LearningRate 0.0530 Epoch: 5 Global Step: 90710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:42,620-Speed 9320.04 samples/sec Loss 7.2227 LearningRate 0.0530 Epoch: 5 Global Step: 90720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:43,695-Speed 9532.94 samples/sec Loss 7.3142 LearningRate 0.0530 Epoch: 5 Global Step: 90730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:44,777-Speed 9476.49 samples/sec Loss 7.2932 LearningRate 0.0530 Epoch: 5 Global Step: 90740 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:45,844-Speed 9598.30 samples/sec Loss 7.2476 LearningRate 0.0530 Epoch: 5 Global Step: 90750 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:46,969-Speed 9110.54 samples/sec Loss 7.2018 LearningRate 0.0530 Epoch: 5 Global Step: 90760 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:11:48,074-Speed 9266.35 samples/sec Loss 7.2021 LearningRate 0.0530 Epoch: 5 Global Step: 90770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:49,179-Speed 9277.58 samples/sec Loss 
7.2992 LearningRate 0.0530 Epoch: 5 Global Step: 90780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:50,238-Speed 9672.27 samples/sec Loss 7.2137 LearningRate 0.0530 Epoch: 5 Global Step: 90790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:51,273-Speed 9900.21 samples/sec Loss 7.3857 LearningRate 0.0530 Epoch: 5 Global Step: 90800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:52,398-Speed 9112.98 samples/sec Loss 7.1940 LearningRate 0.0530 Epoch: 5 Global Step: 90810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:53,460-Speed 9647.70 samples/sec Loss 7.2841 LearningRate 0.0530 Epoch: 5 Global Step: 90820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:54,561-Speed 9301.49 samples/sec Loss 7.1402 LearningRate 0.0530 Epoch: 5 Global Step: 90830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:55,685-Speed 9112.68 samples/sec Loss 7.2377 LearningRate 0.0530 Epoch: 5 Global Step: 90840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:56,787-Speed 9302.74 samples/sec Loss 7.2046 LearningRate 0.0530 Epoch: 5 Global Step: 90850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:57,874-Speed 9431.36 samples/sec Loss 7.2756 LearningRate 0.0530 Epoch: 5 Global Step: 90860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:11:58,974-Speed 9316.88 samples/sec Loss 7.2893 LearningRate 0.0530 Epoch: 5 Global Step: 90870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:00,095-Speed 9135.42 samples/sec Loss 7.2982 LearningRate 0.0530 Epoch: 5 Global Step: 90880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:01,135-Speed 9858.74 samples/sec Loss 7.3057 LearningRate 0.0530 Epoch: 5 Global Step: 90890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:02,205-Speed 9575.29 samples/sec Loss 7.1487 LearningRate 0.0530 Epoch: 5 Global 
Step: 90900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:03,336-Speed 9059.59 samples/sec Loss 7.1653 LearningRate 0.0530 Epoch: 5 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:04,449-Speed 9207.01 samples/sec Loss 7.2241 LearningRate 0.0529 Epoch: 5 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:05,520-Speed 9566.13 samples/sec Loss 7.1790 LearningRate 0.0529 Epoch: 5 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:06,542-Speed 10025.40 samples/sec Loss 7.3191 LearningRate 0.0529 Epoch: 5 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:07,632-Speed 9400.20 samples/sec Loss 7.2962 LearningRate 0.0529 Epoch: 5 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:08,685-Speed 9732.68 samples/sec Loss 7.2646 LearningRate 0.0529 Epoch: 5 Global Step: 90960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:09,742-Speed 9695.24 samples/sec Loss 7.3647 LearningRate 0.0529 Epoch: 5 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:10,796-Speed 9721.44 samples/sec Loss 7.1794 LearningRate 0.0529 Epoch: 5 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:11,893-Speed 9336.86 samples/sec Loss 7.3358 LearningRate 0.0529 Epoch: 5 Global Step: 90990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:12,972-Speed 9495.95 samples/sec Loss 7.2790 LearningRate 0.0529 Epoch: 5 Global Step: 91000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:14,016-Speed 9815.54 samples/sec Loss 7.2462 LearningRate 0.0529 Epoch: 5 Global Step: 91010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:15,058-Speed 9829.92 samples/sec Loss 7.2503 LearningRate 0.0529 Epoch: 5 Global Step: 91020 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 15:12:16,134-Speed 9523.46 samples/sec Loss 7.2165 LearningRate 0.0529 Epoch: 5 Global Step: 91030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:17,193-Speed 9673.39 samples/sec Loss 7.2316 LearningRate 0.0529 Epoch: 5 Global Step: 91040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:18,235-Speed 9837.17 samples/sec Loss 7.2712 LearningRate 0.0529 Epoch: 5 Global Step: 91050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:19,317-Speed 9470.08 samples/sec Loss 7.2707 LearningRate 0.0529 Epoch: 5 Global Step: 91060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:20,427-Speed 9228.92 samples/sec Loss 7.2563 LearningRate 0.0529 Epoch: 5 Global Step: 91070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:21,479-Speed 9743.05 samples/sec Loss 7.3602 LearningRate 0.0529 Epoch: 5 Global Step: 91080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:22,546-Speed 9605.09 samples/sec Loss 7.1826 LearningRate 0.0529 Epoch: 5 Global Step: 91090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:23,591-Speed 9803.54 samples/sec Loss 7.2620 LearningRate 0.0529 Epoch: 5 Global Step: 91100 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:12:24,675-Speed 9450.49 samples/sec Loss 7.3664 LearningRate 0.0529 Epoch: 5 Global Step: 91110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:12:25,720-Speed 9808.62 samples/sec Loss 7.2046 LearningRate 0.0529 Epoch: 5 Global Step: 91120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:26,780-Speed 9666.65 samples/sec Loss 7.2754 LearningRate 0.0529 Epoch: 5 Global Step: 91130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:27,861-Speed 9480.67 samples/sec Loss 7.2674 LearningRate 0.0528 Epoch: 5 Global Step: 91140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:12:28,964-Speed 9288.15 samples/sec Loss 7.1683 LearningRate 0.0528 Epoch: 5 Global Step: 91150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:30,079-Speed 9192.32 samples/sec Loss 7.2620 LearningRate 0.0528 Epoch: 5 Global Step: 91160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:31,115-Speed 9881.02 samples/sec Loss 7.2032 LearningRate 0.0528 Epoch: 5 Global Step: 91170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:32,179-Speed 9633.44 samples/sec Loss 7.3672 LearningRate 0.0528 Epoch: 5 Global Step: 91180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:33,281-Speed 9294.76 samples/sec Loss 7.1983 LearningRate 0.0528 Epoch: 5 Global Step: 91190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:34,371-Speed 9404.67 samples/sec Loss 7.2775 LearningRate 0.0528 Epoch: 5 Global Step: 91200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:35,439-Speed 9587.91 samples/sec Loss 7.2119 LearningRate 0.0528 Epoch: 5 Global Step: 91210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:36,477-Speed 9867.84 samples/sec Loss 7.2229 LearningRate 0.0528 Epoch: 5 Global Step: 91220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:37,569-Speed 9390.02 samples/sec Loss 7.3142 LearningRate 0.0528 Epoch: 5 Global Step: 91230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:38,676-Speed 9251.91 samples/sec Loss 7.1981 LearningRate 0.0528 Epoch: 5 Global Step: 91240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:39,720-Speed 9821.93 samples/sec Loss 7.3254 LearningRate 0.0528 Epoch: 5 Global Step: 91250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:40,765-Speed 9802.52 samples/sec Loss 7.3001 LearningRate 0.0528 Epoch: 5 Global Step: 91260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:41,838-Speed 9550.53 samples/sec Loss 
7.2309 LearningRate 0.0528 Epoch: 5 Global Step: 91270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:42,898-Speed 9668.76 samples/sec Loss 7.1929 LearningRate 0.0528 Epoch: 5 Global Step: 91280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:43,947-Speed 9766.11 samples/sec Loss 7.2254 LearningRate 0.0528 Epoch: 5 Global Step: 91290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:45,046-Speed 9319.73 samples/sec Loss 7.3564 LearningRate 0.0528 Epoch: 5 Global Step: 91300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:46,139-Speed 9373.03 samples/sec Loss 7.3663 LearningRate 0.0528 Epoch: 5 Global Step: 91310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:47,222-Speed 9459.40 samples/sec Loss 7.3218 LearningRate 0.0528 Epoch: 5 Global Step: 91320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:12:48,323-Speed 9310.40 samples/sec Loss 7.3162 LearningRate 0.0528 Epoch: 5 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:49,394-Speed 9571.24 samples/sec Loss 7.1747 LearningRate 0.0528 Epoch: 5 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:50,460-Speed 9607.94 samples/sec Loss 7.3337 LearningRate 0.0528 Epoch: 5 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:51,542-Speed 9472.13 samples/sec Loss 7.2526 LearningRate 0.0528 Epoch: 5 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:52,605-Speed 9639.69 samples/sec Loss 7.2713 LearningRate 0.0527 Epoch: 5 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:53,662-Speed 9687.72 samples/sec Loss 7.1254 LearningRate 0.0527 Epoch: 5 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:54,737-Speed 9536.01 samples/sec Loss 7.2435 LearningRate 0.0527 Epoch: 5 Global Step: 
91390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:55,815-Speed 9509.20 samples/sec Loss 7.3421 LearningRate 0.0527 Epoch: 5 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:56,904-Speed 9402.30 samples/sec Loss 7.2123 LearningRate 0.0527 Epoch: 5 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:57,976-Speed 9557.47 samples/sec Loss 7.3446 LearningRate 0.0527 Epoch: 5 Global Step: 91420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:12:59,034-Speed 9689.65 samples/sec Loss 7.3795 LearningRate 0.0527 Epoch: 5 Global Step: 91430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:00,077-Speed 9824.78 samples/sec Loss 7.2388 LearningRate 0.0527 Epoch: 5 Global Step: 91440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:01,113-Speed 9889.26 samples/sec Loss 7.3252 LearningRate 0.0527 Epoch: 5 Global Step: 91450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:02,211-Speed 9331.21 samples/sec Loss 7.3081 LearningRate 0.0527 Epoch: 5 Global Step: 91460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:03,292-Speed 9481.31 samples/sec Loss 7.3657 LearningRate 0.0527 Epoch: 5 Global Step: 91470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:04,357-Speed 9614.65 samples/sec Loss 7.2449 LearningRate 0.0527 Epoch: 5 Global Step: 91480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:05,413-Speed 9708.28 samples/sec Loss 7.3098 LearningRate 0.0527 Epoch: 5 Global Step: 91490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:06,483-Speed 9574.49 samples/sec Loss 7.4067 LearningRate 0.0527 Epoch: 5 Global Step: 91500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:07,574-Speed 9390.68 samples/sec Loss 7.3310 LearningRate 0.0527 Epoch: 5 Global Step: 91510 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 15:13:08,643-Speed 9584.34 samples/sec Loss 7.4304 LearningRate 0.0527 Epoch: 5 Global Step: 91520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:09,689-Speed 9798.83 samples/sec Loss 7.4062 LearningRate 0.0527 Epoch: 5 Global Step: 91530 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:13:10,755-Speed 9610.89 samples/sec Loss 7.2591 LearningRate 0.0527 Epoch: 5 Global Step: 91540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:11,805-Speed 9758.90 samples/sec Loss 7.2415 LearningRate 0.0527 Epoch: 5 Global Step: 91550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:12,872-Speed 9594.75 samples/sec Loss 7.2952 LearningRate 0.0527 Epoch: 5 Global Step: 91560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:13,955-Speed 9468.03 samples/sec Loss 7.2804 LearningRate 0.0527 Epoch: 5 Global Step: 91570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:15,035-Speed 9483.54 samples/sec Loss 7.2771 LearningRate 0.0527 Epoch: 5 Global Step: 91580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:16,086-Speed 9746.37 samples/sec Loss 7.2460 LearningRate 0.0527 Epoch: 5 Global Step: 91590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:17,173-Speed 9433.14 samples/sec Loss 7.2716 LearningRate 0.0526 Epoch: 5 Global Step: 91600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:18,279-Speed 9260.66 samples/sec Loss 7.3761 LearningRate 0.0526 Epoch: 5 Global Step: 91610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:19,379-Speed 9315.73 samples/sec Loss 7.3919 LearningRate 0.0526 Epoch: 5 Global Step: 91620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:20,444-Speed 9627.04 samples/sec Loss 7.3235 LearningRate 0.0526 Epoch: 5 Global Step: 91630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:13:21,511-Speed 9600.11 samples/sec Loss 7.2882 LearningRate 0.0526 Epoch: 5 Global Step: 91640 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:13:22,583-Speed 9553.89 samples/sec Loss 7.2226 LearningRate 0.0526 Epoch: 5 Global Step: 91650 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:13:23,645-Speed 9647.12 samples/sec Loss 7.3404 LearningRate 0.0526 Epoch: 5 Global Step: 91660 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:13:24,711-Speed 9614.42 samples/sec Loss 7.3163 LearningRate 0.0526 Epoch: 5 Global Step: 91670 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:13:25,769-Speed 9677.82 samples/sec Loss 7.2647 LearningRate 0.0526 Epoch: 5 Global Step: 91680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:26,815-Speed 9806.08 samples/sec Loss 7.2928 LearningRate 0.0526 Epoch: 5 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:27,868-Speed 9730.12 samples/sec Loss 7.2937 LearningRate 0.0526 Epoch: 5 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:28,953-Speed 9438.74 samples/sec Loss 7.3811 LearningRate 0.0526 Epoch: 5 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:30,055-Speed 9298.82 samples/sec Loss 7.2759 LearningRate 0.0526 Epoch: 5 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:31,148-Speed 9374.80 samples/sec Loss 7.2121 LearningRate 0.0526 Epoch: 5 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:32,222-Speed 9539.14 samples/sec Loss 7.2860 LearningRate 0.0526 Epoch: 5 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:33,301-Speed 9494.00 samples/sec Loss 7.3403 LearningRate 0.0526 Epoch: 5 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:34,397-Speed 9350.86 samples/sec Loss 7.2865 
LearningRate 0.0526 Epoch: 5 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:35,496-Speed 9319.94 samples/sec Loss 7.1998 LearningRate 0.0526 Epoch: 5 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:36,596-Speed 9316.88 samples/sec Loss 7.2617 LearningRate 0.0526 Epoch: 5 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:13:37,670-Speed 9538.23 samples/sec Loss 7.2545 LearningRate 0.0526 Epoch: 5 Global Step: 91790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:38,738-Speed 9589.41 samples/sec Loss 7.1446 LearningRate 0.0526 Epoch: 5 Global Step: 91800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:39,791-Speed 9741.87 samples/sec Loss 7.2807 LearningRate 0.0526 Epoch: 5 Global Step: 91810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:40,870-Speed 9495.13 samples/sec Loss 7.2407 LearningRate 0.0526 Epoch: 5 Global Step: 91820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:41,927-Speed 9690.02 samples/sec Loss 7.2696 LearningRate 0.0525 Epoch: 5 Global Step: 91830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:42,988-Speed 9651.80 samples/sec Loss 7.1557 LearningRate 0.0525 Epoch: 5 Global Step: 91840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:44,064-Speed 9528.22 samples/sec Loss 7.2475 LearningRate 0.0525 Epoch: 5 Global Step: 91850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:45,163-Speed 9318.95 samples/sec Loss 7.2133 LearningRate 0.0525 Epoch: 5 Global Step: 91860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:46,262-Speed 9326.99 samples/sec Loss 7.2308 LearningRate 0.0525 Epoch: 5 Global Step: 91870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:47,360-Speed 9324.08 samples/sec Loss 7.2054 LearningRate 0.0525 Epoch: 5 Global Step: 
91880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:48,470-Speed 9237.11 samples/sec Loss 7.2057 LearningRate 0.0525 Epoch: 5 Global Step: 91890 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:13:49,555-Speed 9449.02 samples/sec Loss 7.2521 LearningRate 0.0525 Epoch: 5 Global Step: 91900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:50,628-Speed 9546.55 samples/sec Loss 7.2564 LearningRate 0.0525 Epoch: 5 Global Step: 91910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:51,714-Speed 9434.53 samples/sec Loss 7.2924 LearningRate 0.0525 Epoch: 5 Global Step: 91920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:52,754-Speed 9851.88 samples/sec Loss 7.2245 LearningRate 0.0525 Epoch: 5 Global Step: 91930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:53,830-Speed 9517.82 samples/sec Loss 7.3215 LearningRate 0.0525 Epoch: 5 Global Step: 91940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:54,934-Speed 9285.34 samples/sec Loss 7.2836 LearningRate 0.0525 Epoch: 5 Global Step: 91950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:56,018-Speed 9449.87 samples/sec Loss 7.2976 LearningRate 0.0525 Epoch: 5 Global Step: 91960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:57,055-Speed 9885.77 samples/sec Loss 7.2922 LearningRate 0.0525 Epoch: 5 Global Step: 91970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:58,136-Speed 9475.51 samples/sec Loss 7.2366 LearningRate 0.0525 Epoch: 5 Global Step: 91980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:13:59,206-Speed 9576.84 samples/sec Loss 7.2706 LearningRate 0.0525 Epoch: 5 Global Step: 91990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:14:00,265-Speed 9675.20 samples/sec Loss 7.2436 LearningRate 0.0525 Epoch: 5 Global Step: 92000 Fp16 Grad Scale: 262144 Required: 10 
hours Training: 2022-04-11 15:14:22,278-[lfw][92000]XNorm: 11.842051 Training: 2022-04-11 15:14:22,279-[lfw][92000]Accuracy-Flip: 0.99583+-0.00281 Training: 2022-04-11 15:14:22,279-[lfw][92000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:14:47,773-[cfp_fp][92000]XNorm: 9.961729 Training: 2022-04-11 15:14:47,774-[cfp_fp][92000]Accuracy-Flip: 0.95043+-0.01409 Training: 2022-04-11 15:14:47,774-[cfp_fp][92000]Accuracy-Highest: 0.95729 Training: 2022-04-11 15:15:09,776-[agedb_30][92000]XNorm: 11.278383 Training: 2022-04-11 15:15:09,776-[agedb_30][92000]Accuracy-Flip: 0.96033+-0.01113 Training: 2022-04-11 15:15:09,777-[agedb_30][92000]Accuracy-Highest: 0.96317 Training: 2022-04-11 15:15:10,862-Speed 145.05 samples/sec Loss 7.1602 LearningRate 0.0525 Epoch: 5 Global Step: 92010 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:15:11,925-Speed 9633.72 samples/sec Loss 7.3851 LearningRate 0.0525 Epoch: 5 Global Step: 92020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:13,038-Speed 9205.39 samples/sec Loss 7.2909 LearningRate 0.0525 Epoch: 5 Global Step: 92030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:14,149-Speed 9230.31 samples/sec Loss 7.2382 LearningRate 0.0525 Epoch: 5 Global Step: 92040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:15,209-Speed 9663.82 samples/sec Loss 7.4205 LearningRate 0.0525 Epoch: 5 Global Step: 92050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:16,316-Speed 9256.11 samples/sec Loss 7.2505 LearningRate 0.0524 Epoch: 5 Global Step: 92060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:17,407-Speed 9387.39 samples/sec Loss 7.2345 LearningRate 0.0524 Epoch: 5 Global Step: 92070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:18,514-Speed 9253.20 samples/sec Loss 7.4649 LearningRate 0.0524 Epoch: 5 Global Step: 92080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:15:19,607-Speed 9381.17 samples/sec Loss 7.2982 LearningRate 0.0524 Epoch: 5 Global Step: 92090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:20,664-Speed 9695.30 samples/sec Loss 7.3126 LearningRate 0.0524 Epoch: 5 Global Step: 92100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:21,771-Speed 9253.33 samples/sec Loss 7.2583 LearningRate 0.0524 Epoch: 5 Global Step: 92110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:22,858-Speed 9423.12 samples/sec Loss 7.1699 LearningRate 0.0524 Epoch: 5 Global Step: 92120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:23,954-Speed 9353.02 samples/sec Loss 7.3005 LearningRate 0.0524 Epoch: 5 Global Step: 92130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:25,042-Speed 9411.35 samples/sec Loss 7.3493 LearningRate 0.0524 Epoch: 5 Global Step: 92140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:26,137-Speed 9362.24 samples/sec Loss 7.2281 LearningRate 0.0524 Epoch: 5 Global Step: 92150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:27,228-Speed 9394.71 samples/sec Loss 7.3323 LearningRate 0.0524 Epoch: 5 Global Step: 92160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:28,291-Speed 9632.34 samples/sec Loss 7.2898 LearningRate 0.0524 Epoch: 5 Global Step: 92170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:29,360-Speed 9587.75 samples/sec Loss 7.2910 LearningRate 0.0524 Epoch: 5 Global Step: 92180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:30,407-Speed 9779.97 samples/sec Loss 7.2557 LearningRate 0.0524 Epoch: 5 Global Step: 92190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:31,471-Speed 9629.71 samples/sec Loss 7.3398 LearningRate 0.0524 Epoch: 5 Global Step: 92200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:32,545-Speed 9541.05 samples/sec Loss 
7.3233 LearningRate 0.0524 Epoch: 5 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:33,591-Speed 9797.43 samples/sec Loss 7.2620 LearningRate 0.0524 Epoch: 5 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:34,699-Speed 9249.94 samples/sec Loss 7.2472 LearningRate 0.0524 Epoch: 5 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:35,780-Speed 9475.58 samples/sec Loss 7.3407 LearningRate 0.0524 Epoch: 5 Global Step: 92240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:36,879-Speed 9324.11 samples/sec Loss 7.2577 LearningRate 0.0524 Epoch: 5 Global Step: 92250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:37,979-Speed 9315.70 samples/sec Loss 7.3771 LearningRate 0.0524 Epoch: 5 Global Step: 92260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:39,881-Speed 5385.41 samples/sec Loss 7.2870 LearningRate 0.0524 Epoch: 5 Global Step: 92270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:40,961-Speed 9487.47 samples/sec Loss 7.2712 LearningRate 0.0524 Epoch: 5 Global Step: 92280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:42,051-Speed 9399.50 samples/sec Loss 7.2085 LearningRate 0.0524 Epoch: 5 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:43,122-Speed 9566.96 samples/sec Loss 7.1194 LearningRate 0.0523 Epoch: 5 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:44,243-Speed 9137.28 samples/sec Loss 7.3888 LearningRate 0.0523 Epoch: 5 Global Step: 92310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:45,318-Speed 9531.70 samples/sec Loss 7.2790 LearningRate 0.0523 Epoch: 5 Global Step: 92320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:46,406-Speed 9421.22 samples/sec Loss 7.2282 LearningRate 0.0523 Epoch: 5 Global Step: 
92330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:47,487-Speed 9477.14 samples/sec Loss 7.3558 LearningRate 0.0523 Epoch: 5 Global Step: 92340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:48,577-Speed 9401.10 samples/sec Loss 7.2638 LearningRate 0.0523 Epoch: 5 Global Step: 92350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:49,714-Speed 9014.37 samples/sec Loss 7.2286 LearningRate 0.0523 Epoch: 5 Global Step: 92360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:50,754-Speed 9850.90 samples/sec Loss 7.1530 LearningRate 0.0523 Epoch: 5 Global Step: 92370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:15:51,803-Speed 9768.43 samples/sec Loss 7.2143 LearningRate 0.0523 Epoch: 5 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:52,857-Speed 9719.22 samples/sec Loss 7.3315 LearningRate 0.0523 Epoch: 5 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:53,916-Speed 9678.44 samples/sec Loss 7.3690 LearningRate 0.0523 Epoch: 5 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:54,949-Speed 9914.85 samples/sec Loss 7.3667 LearningRate 0.0523 Epoch: 5 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:56,033-Speed 9453.91 samples/sec Loss 7.2011 LearningRate 0.0523 Epoch: 5 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:57,100-Speed 9604.78 samples/sec Loss 7.3266 LearningRate 0.0523 Epoch: 5 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:58,217-Speed 9168.44 samples/sec Loss 7.2599 LearningRate 0.0523 Epoch: 5 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:15:59,329-Speed 9217.04 samples/sec Loss 7.3281 LearningRate 0.0523 Epoch: 5 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 15:16:00,403-Speed 9540.94 samples/sec Loss 7.2782 LearningRate 0.0523 Epoch: 5 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:16:01,483-Speed 9480.85 samples/sec Loss 7.2907 LearningRate 0.0523 Epoch: 5 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:16:02,521-Speed 9873.47 samples/sec Loss 7.3642 LearningRate 0.0523 Epoch: 5 Global Step: 92480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:03,607-Speed 9518.16 samples/sec Loss 7.2981 LearningRate 0.0523 Epoch: 5 Global Step: 92490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:04,680-Speed 9550.85 samples/sec Loss 7.3158 LearningRate 0.0523 Epoch: 5 Global Step: 92500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:05,744-Speed 9633.96 samples/sec Loss 7.2059 LearningRate 0.0523 Epoch: 5 Global Step: 92510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:06,792-Speed 9775.64 samples/sec Loss 7.3300 LearningRate 0.0523 Epoch: 5 Global Step: 92520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:07,882-Speed 9394.89 samples/sec Loss 7.1649 LearningRate 0.0522 Epoch: 5 Global Step: 92530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:08,962-Speed 9490.55 samples/sec Loss 7.2219 LearningRate 0.0522 Epoch: 5 Global Step: 92540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:10,046-Speed 9457.99 samples/sec Loss 7.2939 LearningRate 0.0522 Epoch: 5 Global Step: 92550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:11,102-Speed 9695.86 samples/sec Loss 7.2551 LearningRate 0.0522 Epoch: 5 Global Step: 92560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:12,175-Speed 9554.65 samples/sec Loss 7.1909 LearningRate 0.0522 Epoch: 5 Global Step: 92570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:13,254-Speed 
9491.04 samples/sec Loss 7.3286 LearningRate 0.0522 Epoch: 5 Global Step: 92580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:14,354-Speed 9312.26 samples/sec Loss 7.2250 LearningRate 0.0522 Epoch: 5 Global Step: 92590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:15,428-Speed 9541.49 samples/sec Loss 7.2417 LearningRate 0.0522 Epoch: 5 Global Step: 92600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:16,516-Speed 9418.37 samples/sec Loss 7.2162 LearningRate 0.0522 Epoch: 5 Global Step: 92610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:17,609-Speed 9375.73 samples/sec Loss 7.2449 LearningRate 0.0522 Epoch: 5 Global Step: 92620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:18,670-Speed 9656.87 samples/sec Loss 7.2585 LearningRate 0.0522 Epoch: 5 Global Step: 92630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:19,716-Speed 9798.53 samples/sec Loss 7.1992 LearningRate 0.0522 Epoch: 5 Global Step: 92640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:20,797-Speed 9475.30 samples/sec Loss 7.2771 LearningRate 0.0522 Epoch: 5 Global Step: 92650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:21,887-Speed 9398.94 samples/sec Loss 7.2119 LearningRate 0.0522 Epoch: 5 Global Step: 92660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:22,983-Speed 9352.29 samples/sec Loss 7.2077 LearningRate 0.0522 Epoch: 5 Global Step: 92670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:24,042-Speed 9678.53 samples/sec Loss 7.2803 LearningRate 0.0522 Epoch: 5 Global Step: 92680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:25,109-Speed 9610.51 samples/sec Loss 7.3733 LearningRate 0.0522 Epoch: 5 Global Step: 92690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:26,177-Speed 9596.38 samples/sec Loss 7.2267 
LearningRate 0.0522 Epoch: 5 Global Step: 92700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:27,234-Speed 9689.64 samples/sec Loss 7.2206 LearningRate 0.0522 Epoch: 5 Global Step: 92710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:28,293-Speed 9676.36 samples/sec Loss 7.1890 LearningRate 0.0522 Epoch: 5 Global Step: 92720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:29,362-Speed 9589.96 samples/sec Loss 7.2429 LearningRate 0.0522 Epoch: 5 Global Step: 92730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:30,448-Speed 9426.26 samples/sec Loss 7.3070 LearningRate 0.0522 Epoch: 5 Global Step: 92740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:31,532-Speed 9460.14 samples/sec Loss 7.2512 LearningRate 0.0522 Epoch: 5 Global Step: 92750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:32,584-Speed 9737.97 samples/sec Loss 7.2255 LearningRate 0.0521 Epoch: 5 Global Step: 92760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:33,636-Speed 9736.83 samples/sec Loss 7.2425 LearningRate 0.0521 Epoch: 5 Global Step: 92770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:34,748-Speed 9218.91 samples/sec Loss 7.2754 LearningRate 0.0521 Epoch: 5 Global Step: 92780 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:16:35,856-Speed 9240.61 samples/sec Loss 7.2881 LearningRate 0.0521 Epoch: 5 Global Step: 92790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:16:36,963-Speed 9255.00 samples/sec Loss 7.3647 LearningRate 0.0521 Epoch: 5 Global Step: 92800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:16:38,070-Speed 9255.90 samples/sec Loss 7.2745 LearningRate 0.0521 Epoch: 5 Global Step: 92810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:39,128-Speed 9691.74 samples/sec Loss 7.1993 LearningRate 0.0521 Epoch: 5 Global Step: 
92820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:40,195-Speed 9598.51 samples/sec Loss 7.2362 LearningRate 0.0521 Epoch: 5 Global Step: 92830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:41,253-Speed 9686.53 samples/sec Loss 7.4307 LearningRate 0.0521 Epoch: 5 Global Step: 92840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:42,312-Speed 9670.67 samples/sec Loss 7.2708 LearningRate 0.0521 Epoch: 5 Global Step: 92850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:43,336-Speed 10013.50 samples/sec Loss 7.1941 LearningRate 0.0521 Epoch: 5 Global Step: 92860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:44,375-Speed 9863.46 samples/sec Loss 7.1851 LearningRate 0.0521 Epoch: 5 Global Step: 92870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:45,458-Speed 9460.76 samples/sec Loss 7.1148 LearningRate 0.0521 Epoch: 5 Global Step: 92880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:46,569-Speed 9215.04 samples/sec Loss 7.2015 LearningRate 0.0521 Epoch: 5 Global Step: 92890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:47,641-Speed 9561.05 samples/sec Loss 7.2358 LearningRate 0.0521 Epoch: 5 Global Step: 92900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:48,701-Speed 9663.62 samples/sec Loss 7.2099 LearningRate 0.0521 Epoch: 5 Global Step: 92910 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:16:49,796-Speed 9355.32 samples/sec Loss 7.2681 LearningRate 0.0521 Epoch: 5 Global Step: 92920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:50,904-Speed 9245.90 samples/sec Loss 7.2919 LearningRate 0.0521 Epoch: 5 Global Step: 92930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:51,978-Speed 9545.35 samples/sec Loss 7.1944 LearningRate 0.0521 Epoch: 5 Global Step: 92940 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 15:16:53,066-Speed 9410.94 samples/sec Loss 7.3017 LearningRate 0.0521 Epoch: 5 Global Step: 92950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:54,138-Speed 9565.28 samples/sec Loss 7.3617 LearningRate 0.0521 Epoch: 5 Global Step: 92960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:55,214-Speed 9524.67 samples/sec Loss 7.2564 LearningRate 0.0521 Epoch: 5 Global Step: 92970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:56,315-Speed 9299.10 samples/sec Loss 7.2779 LearningRate 0.0521 Epoch: 5 Global Step: 92980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:57,427-Speed 9213.12 samples/sec Loss 7.0942 LearningRate 0.0520 Epoch: 5 Global Step: 92990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:58,493-Speed 9618.16 samples/sec Loss 7.2680 LearningRate 0.0520 Epoch: 5 Global Step: 93000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:16:59,556-Speed 9634.24 samples/sec Loss 7.1932 LearningRate 0.0520 Epoch: 5 Global Step: 93010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:00,604-Speed 9778.03 samples/sec Loss 7.1949 LearningRate 0.0520 Epoch: 5 Global Step: 93020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:01,651-Speed 9802.02 samples/sec Loss 7.3069 LearningRate 0.0520 Epoch: 5 Global Step: 93030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:02,688-Speed 9883.56 samples/sec Loss 7.4239 LearningRate 0.0520 Epoch: 5 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:03,729-Speed 9848.54 samples/sec Loss 7.2749 LearningRate 0.0520 Epoch: 5 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:04,819-Speed 9394.47 samples/sec Loss 7.1541 LearningRate 0.0520 Epoch: 5 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:05,881-Speed 
9652.95 samples/sec Loss 7.1868 LearningRate 0.0520 Epoch: 5 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:07,005-Speed 9116.80 samples/sec Loss 7.1253 LearningRate 0.0520 Epoch: 5 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:08,115-Speed 9222.86 samples/sec Loss 7.1741 LearningRate 0.0520 Epoch: 5 Global Step: 93090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:09,206-Speed 9395.58 samples/sec Loss 7.3283 LearningRate 0.0520 Epoch: 5 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:10,287-Speed 9476.02 samples/sec Loss 7.3402 LearningRate 0.0520 Epoch: 5 Global Step: 93110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:11,355-Speed 9590.11 samples/sec Loss 7.1908 LearningRate 0.0520 Epoch: 5 Global Step: 93120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:17:12,394-Speed 9866.14 samples/sec Loss 7.3524 LearningRate 0.0520 Epoch: 5 Global Step: 93130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:13,449-Speed 9709.37 samples/sec Loss 7.3364 LearningRate 0.0520 Epoch: 5 Global Step: 93140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:14,540-Speed 9399.03 samples/sec Loss 7.2243 LearningRate 0.0520 Epoch: 5 Global Step: 93150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:15,589-Speed 9768.92 samples/sec Loss 7.2968 LearningRate 0.0520 Epoch: 5 Global Step: 93160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:16,660-Speed 9566.95 samples/sec Loss 7.2604 LearningRate 0.0520 Epoch: 5 Global Step: 93170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:17,731-Speed 9564.47 samples/sec Loss 7.3094 LearningRate 0.0520 Epoch: 5 Global Step: 93180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:18,817-Speed 9437.99 samples/sec Loss 7.3440 LearningRate 
0.0520 Epoch: 5 Global Step: 93190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:19,908-Speed 9390.88 samples/sec Loss 7.2467 LearningRate 0.0520 Epoch: 5 Global Step: 93200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:20,981-Speed 9546.00 samples/sec Loss 7.2548 LearningRate 0.0520 Epoch: 5 Global Step: 93210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:22,048-Speed 9601.95 samples/sec Loss 7.2363 LearningRate 0.0519 Epoch: 5 Global Step: 93220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:23,091-Speed 9825.14 samples/sec Loss 7.3356 LearningRate 0.0519 Epoch: 5 Global Step: 93230 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:17:24,195-Speed 9275.26 samples/sec Loss 7.2642 LearningRate 0.0519 Epoch: 5 Global Step: 93240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:25,306-Speed 9240.00 samples/sec Loss 7.2922 LearningRate 0.0519 Epoch: 5 Global Step: 93250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:26,359-Speed 9720.62 samples/sec Loss 7.1427 LearningRate 0.0519 Epoch: 5 Global Step: 93260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:27,447-Speed 9419.54 samples/sec Loss 7.3127 LearningRate 0.0519 Epoch: 5 Global Step: 93270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:28,517-Speed 9578.11 samples/sec Loss 7.2845 LearningRate 0.0519 Epoch: 5 Global Step: 93280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:29,572-Speed 9713.48 samples/sec Loss 7.3453 LearningRate 0.0519 Epoch: 5 Global Step: 93290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:30,668-Speed 9345.52 samples/sec Loss 7.2499 LearningRate 0.0519 Epoch: 5 Global Step: 93300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:31,729-Speed 9658.82 samples/sec Loss 7.1967 LearningRate 0.0519 Epoch: 5 Global Step: 93310 Fp16 
Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:32,809-Speed 9487.98 samples/sec Loss 7.3151 LearningRate 0.0519 Epoch: 5 Global Step: 93320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:33,918-Speed 9239.87 samples/sec Loss 7.3138 LearningRate 0.0519 Epoch: 5 Global Step: 93330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:35,007-Speed 9409.69 samples/sec Loss 7.2115 LearningRate 0.0519 Epoch: 5 Global Step: 93340 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:17:36,061-Speed 9721.18 samples/sec Loss 7.2936 LearningRate 0.0519 Epoch: 5 Global Step: 93350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:37,166-Speed 9277.01 samples/sec Loss 7.3365 LearningRate 0.0519 Epoch: 5 Global Step: 93360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:38,206-Speed 9852.20 samples/sec Loss 7.1924 LearningRate 0.0519 Epoch: 5 Global Step: 93370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:39,277-Speed 9568.94 samples/sec Loss 7.3251 LearningRate 0.0519 Epoch: 5 Global Step: 93380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:40,407-Speed 9063.10 samples/sec Loss 7.3191 LearningRate 0.0519 Epoch: 5 Global Step: 93390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:41,493-Speed 9438.64 samples/sec Loss 7.3192 LearningRate 0.0519 Epoch: 5 Global Step: 93400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:42,583-Speed 9395.85 samples/sec Loss 7.2549 LearningRate 0.0519 Epoch: 5 Global Step: 93410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:43,650-Speed 9604.60 samples/sec Loss 7.2787 LearningRate 0.0519 Epoch: 5 Global Step: 93420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:44,690-Speed 9850.47 samples/sec Loss 7.2539 LearningRate 0.0519 Epoch: 5 Global Step: 93430 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 15:17:45,761-Speed 9569.79 samples/sec Loss 7.2845 LearningRate 0.0519 Epoch: 5 Global Step: 93440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:46,786-Speed 9992.50 samples/sec Loss 7.2538 LearningRate 0.0518 Epoch: 5 Global Step: 93450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:47,908-Speed 9132.72 samples/sec Loss 7.2737 LearningRate 0.0518 Epoch: 5 Global Step: 93460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:48,975-Speed 9606.25 samples/sec Loss 7.2105 LearningRate 0.0518 Epoch: 5 Global Step: 93470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:50,097-Speed 9136.62 samples/sec Loss 7.3232 LearningRate 0.0518 Epoch: 5 Global Step: 93480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:51,168-Speed 9560.28 samples/sec Loss 7.2363 LearningRate 0.0518 Epoch: 5 Global Step: 93490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:52,284-Speed 9187.42 samples/sec Loss 7.3520 LearningRate 0.0518 Epoch: 5 Global Step: 93500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:53,395-Speed 9217.09 samples/sec Loss 7.3723 LearningRate 0.0518 Epoch: 5 Global Step: 93510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:54,448-Speed 9733.05 samples/sec Loss 7.2584 LearningRate 0.0518 Epoch: 5 Global Step: 93520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:55,506-Speed 9694.70 samples/sec Loss 7.1789 LearningRate 0.0518 Epoch: 5 Global Step: 93530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:56,555-Speed 9761.88 samples/sec Loss 7.2553 LearningRate 0.0518 Epoch: 5 Global Step: 93540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:57,621-Speed 9618.80 samples/sec Loss 7.2116 LearningRate 0.0518 Epoch: 5 Global Step: 93550 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:17:58,699-Speed 
9501.13 samples/sec Loss 7.2760 LearningRate 0.0518 Epoch: 5 Global Step: 93560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:17:59,772-Speed 9546.15 samples/sec Loss 7.3506 LearningRate 0.0518 Epoch: 5 Global Step: 93570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:00,840-Speed 9592.06 samples/sec Loss 7.2152 LearningRate 0.0518 Epoch: 5 Global Step: 93580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:01,905-Speed 9624.35 samples/sec Loss 7.3301 LearningRate 0.0518 Epoch: 5 Global Step: 93590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:02,990-Speed 9450.93 samples/sec Loss 7.2760 LearningRate 0.0518 Epoch: 5 Global Step: 93600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:04,033-Speed 9820.31 samples/sec Loss 7.2737 LearningRate 0.0518 Epoch: 5 Global Step: 93610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:05,076-Speed 9823.62 samples/sec Loss 7.1812 LearningRate 0.0518 Epoch: 5 Global Step: 93620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:06,152-Speed 9517.47 samples/sec Loss 7.4139 LearningRate 0.0518 Epoch: 5 Global Step: 93630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:07,225-Speed 9547.19 samples/sec Loss 7.2643 LearningRate 0.0518 Epoch: 5 Global Step: 93640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:08,255-Speed 9954.53 samples/sec Loss 7.2473 LearningRate 0.0518 Epoch: 5 Global Step: 93650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:09,341-Speed 9433.99 samples/sec Loss 7.0295 LearningRate 0.0518 Epoch: 5 Global Step: 93660 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:18:10,434-Speed 9370.37 samples/sec Loss 7.2255 LearningRate 0.0518 Epoch: 5 Global Step: 93670 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:18:11,531-Speed 9339.28 samples/sec Loss 7.2820 
LearningRate 0.0517 Epoch: 5 Global Step: 93680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:18:12,605-Speed 9541.49 samples/sec Loss 7.2097 LearningRate 0.0517 Epoch: 5 Global Step: 93690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:13,643-Speed 9876.00 samples/sec Loss 7.1641 LearningRate 0.0517 Epoch: 5 Global Step: 93700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:14,698-Speed 9715.92 samples/sec Loss 7.3156 LearningRate 0.0517 Epoch: 5 Global Step: 93710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:15,725-Speed 9981.00 samples/sec Loss 7.3010 LearningRate 0.0517 Epoch: 5 Global Step: 93720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:16,792-Speed 9601.43 samples/sec Loss 7.2330 LearningRate 0.0517 Epoch: 5 Global Step: 93730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:17,916-Speed 9115.53 samples/sec Loss 7.2863 LearningRate 0.0517 Epoch: 5 Global Step: 93740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:19,010-Speed 9366.61 samples/sec Loss 7.3060 LearningRate 0.0517 Epoch: 5 Global Step: 93750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:20,110-Speed 9308.64 samples/sec Loss 7.2117 LearningRate 0.0517 Epoch: 5 Global Step: 93760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:21,184-Speed 9542.86 samples/sec Loss 7.3899 LearningRate 0.0517 Epoch: 5 Global Step: 93770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:22,271-Speed 9423.31 samples/sec Loss 7.2149 LearningRate 0.0517 Epoch: 5 Global Step: 93780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:23,337-Speed 9615.21 samples/sec Loss 7.2892 LearningRate 0.0517 Epoch: 5 Global Step: 93790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:18:24,416-Speed 9488.33 samples/sec Loss 7.3211 LearningRate 0.0517 Epoch: 5 Global Step: 
93800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:18:25,529-Speed 9209.54 samples/sec Loss 7.2420 LearningRate 0.0517 Epoch: 5 Global Step: 93810 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:18:26,602-Speed 9550.59 samples/sec Loss 7.2322 LearningRate 0.0517 Epoch: 5 Global Step: 93820 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:18:27,645-Speed 9825.13 samples/sec Loss 7.2794 LearningRate 0.0517 Epoch: 5 Global Step: 93830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:28,704-Speed 9676.30 samples/sec Loss 7.2325 LearningRate 0.0517 Epoch: 5 Global Step: 93840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:29,829-Speed 9107.12 samples/sec Loss 7.2570 LearningRate 0.0517 Epoch: 5 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:30,899-Speed 9574.17 samples/sec Loss 7.2495 LearningRate 0.0517 Epoch: 5 Global Step: 93860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:31,983-Speed 9452.69 samples/sec Loss 7.1998 LearningRate 0.0517 Epoch: 5 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:33,044-Speed 9665.47 samples/sec Loss 7.3332 LearningRate 0.0517 Epoch: 5 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:34,128-Speed 9450.58 samples/sec Loss 7.2981 LearningRate 0.0517 Epoch: 5 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:35,200-Speed 9554.88 samples/sec Loss 7.2709 LearningRate 0.0517 Epoch: 5 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:36,264-Speed 9629.07 samples/sec Loss 7.3341 LearningRate 0.0517 Epoch: 5 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:18:37,393-Speed 9080.94 samples/sec Loss 7.3058 LearningRate 0.0516 Epoch: 5 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 15:18:38,437-Speed 9808.58 samples/sec Loss 7.3302 LearningRate 0.0516 Epoch: 5 Global Step: 93930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:39,476-Speed 9864.15 samples/sec Loss 7.2548 LearningRate 0.0516 Epoch: 5 Global Step: 93940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:40,570-Speed 9364.15 samples/sec Loss 7.2119 LearningRate 0.0516 Epoch: 5 Global Step: 93950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:41,605-Speed 9900.52 samples/sec Loss 7.3203 LearningRate 0.0516 Epoch: 5 Global Step: 93960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:42,637-Speed 9929.31 samples/sec Loss 7.3086 LearningRate 0.0516 Epoch: 5 Global Step: 93970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:43,666-Speed 9957.14 samples/sec Loss 7.2817 LearningRate 0.0516 Epoch: 5 Global Step: 93980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:44,741-Speed 9530.02 samples/sec Loss 7.3340 LearningRate 0.0516 Epoch: 5 Global Step: 93990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:18:45,804-Speed 9638.35 samples/sec Loss 7.1080 LearningRate 0.0516 Epoch: 5 Global Step: 94000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:19:07,695-[lfw][94000]XNorm: 11.538984 Training: 2022-04-11 15:19:07,696-[lfw][94000]Accuracy-Flip: 0.99617+-0.00248 Training: 2022-04-11 15:19:07,696-[lfw][94000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:19:32,991-[cfp_fp][94000]XNorm: 9.872888 Training: 2022-04-11 15:19:32,991-[cfp_fp][94000]Accuracy-Flip: 0.95314+-0.01144 Training: 2022-04-11 15:19:32,992-[cfp_fp][94000]Accuracy-Highest: 0.95729 Training: 2022-04-11 15:19:54,812-[agedb_30][94000]XNorm: 11.215298 Training: 2022-04-11 15:19:54,813-[agedb_30][94000]Accuracy-Flip: 0.96233+-0.00810 Training: 2022-04-11 15:19:54,814-[agedb_30][94000]Accuracy-Highest: 0.96317 Training: 2022-04-11 
15:19:55,872-Speed 146.15 samples/sec Loss 7.2117 LearningRate 0.0516 Epoch: 5 Global Step: 94010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:19:56,921-Speed 9771.10 samples/sec Loss 7.3214 LearningRate 0.0516 Epoch: 5 Global Step: 94020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:19:57,977-Speed 9700.86 samples/sec Loss 7.3417 LearningRate 0.0516 Epoch: 5 Global Step: 94030 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:19:59,068-Speed 9390.78 samples/sec Loss 7.2362 LearningRate 0.0516 Epoch: 5 Global Step: 94040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:00,103-Speed 9902.39 samples/sec Loss 7.1644 LearningRate 0.0516 Epoch: 5 Global Step: 94050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:01,180-Speed 9509.39 samples/sec Loss 7.1496 LearningRate 0.0516 Epoch: 5 Global Step: 94060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:02,275-Speed 9359.37 samples/sec Loss 7.2382 LearningRate 0.0516 Epoch: 5 Global Step: 94070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:03,357-Speed 9472.75 samples/sec Loss 7.2335 LearningRate 0.0516 Epoch: 5 Global Step: 94080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:04,417-Speed 9658.28 samples/sec Loss 7.2954 LearningRate 0.0516 Epoch: 5 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:05,476-Speed 9678.90 samples/sec Loss 7.3134 LearningRate 0.0516 Epoch: 5 Global Step: 94100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:06,545-Speed 9582.45 samples/sec Loss 7.3488 LearningRate 0.0516 Epoch: 5 Global Step: 94110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:07,603-Speed 9686.34 samples/sec Loss 7.2306 LearningRate 0.0516 Epoch: 5 Global Step: 94120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:08,667-Speed 9628.94 samples/sec Loss 
7.2696 LearningRate 0.0516 Epoch: 5 Global Step: 94130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:09,719-Speed 9737.17 samples/sec Loss 7.2409 LearningRate 0.0516 Epoch: 5 Global Step: 94140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:20:10,824-Speed 9277.21 samples/sec Loss 7.1750 LearningRate 0.0515 Epoch: 5 Global Step: 94150 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:20:11,897-Speed 9542.99 samples/sec Loss 7.2882 LearningRate 0.0515 Epoch: 5 Global Step: 94160 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:20:12,980-Speed 9457.88 samples/sec Loss 7.2684 LearningRate 0.0515 Epoch: 5 Global Step: 94170 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:20:14,077-Speed 9345.11 samples/sec Loss 7.3992 LearningRate 0.0515 Epoch: 5 Global Step: 94180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:15,151-Speed 9546.95 samples/sec Loss 7.3612 LearningRate 0.0515 Epoch: 5 Global Step: 94190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:16,208-Speed 9693.34 samples/sec Loss 7.3759 LearningRate 0.0515 Epoch: 5 Global Step: 94200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:17,301-Speed 9368.73 samples/sec Loss 7.2933 LearningRate 0.0515 Epoch: 5 Global Step: 94210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:18,391-Speed 9402.93 samples/sec Loss 7.2655 LearningRate 0.0515 Epoch: 5 Global Step: 94220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:19,494-Speed 9290.57 samples/sec Loss 7.1740 LearningRate 0.0515 Epoch: 5 Global Step: 94230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:20,585-Speed 9388.90 samples/sec Loss 7.3317 LearningRate 0.0515 Epoch: 5 Global Step: 94240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:21,660-Speed 9529.46 samples/sec Loss 7.2499 LearningRate 0.0515 Epoch: 5 Global 
Step: 94250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:22,738-Speed 9508.95 samples/sec Loss 7.2038 LearningRate 0.0515 Epoch: 5 Global Step: 94260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:23,832-Speed 9363.31 samples/sec Loss 7.3085 LearningRate 0.0515 Epoch: 5 Global Step: 94270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:24,906-Speed 9540.89 samples/sec Loss 7.3423 LearningRate 0.0515 Epoch: 5 Global Step: 94280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:26,000-Speed 9366.77 samples/sec Loss 7.2689 LearningRate 0.0515 Epoch: 5 Global Step: 94290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:27,066-Speed 9616.61 samples/sec Loss 7.2511 LearningRate 0.0515 Epoch: 5 Global Step: 94300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:28,134-Speed 9598.32 samples/sec Loss 7.1060 LearningRate 0.0515 Epoch: 5 Global Step: 94310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:29,207-Speed 9552.05 samples/sec Loss 7.2853 LearningRate 0.0515 Epoch: 5 Global Step: 94320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:30,326-Speed 9155.18 samples/sec Loss 7.2832 LearningRate 0.0515 Epoch: 5 Global Step: 94330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:31,436-Speed 9229.25 samples/sec Loss 7.2086 LearningRate 0.0515 Epoch: 5 Global Step: 94340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:32,559-Speed 9123.13 samples/sec Loss 7.3774 LearningRate 0.0515 Epoch: 5 Global Step: 94350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:33,693-Speed 9033.82 samples/sec Loss 7.1820 LearningRate 0.0515 Epoch: 5 Global Step: 94360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:34,796-Speed 9293.34 samples/sec Loss 7.2552 LearningRate 0.0515 Epoch: 5 Global Step: 94370 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-04-11 15:20:35,866-Speed 9570.97 samples/sec Loss 7.3527 LearningRate 0.0514 Epoch: 5 Global Step: 94380 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:20:36,975-Speed 9241.56 samples/sec Loss 7.3019 LearningRate 0.0514 Epoch: 5 Global Step: 94390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:38,052-Speed 9513.80 samples/sec Loss 7.2157 LearningRate 0.0514 Epoch: 5 Global Step: 94400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:39,108-Speed 9703.53 samples/sec Loss 7.3023 LearningRate 0.0514 Epoch: 5 Global Step: 94410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:40,166-Speed 9687.92 samples/sec Loss 7.2761 LearningRate 0.0514 Epoch: 5 Global Step: 94420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:41,292-Speed 9099.78 samples/sec Loss 7.3058 LearningRate 0.0514 Epoch: 5 Global Step: 94430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:42,374-Speed 9465.49 samples/sec Loss 7.2746 LearningRate 0.0514 Epoch: 5 Global Step: 94440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:43,410-Speed 9887.51 samples/sec Loss 7.2957 LearningRate 0.0514 Epoch: 5 Global Step: 94450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:44,478-Speed 9595.77 samples/sec Loss 7.2323 LearningRate 0.0514 Epoch: 5 Global Step: 94460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:45,612-Speed 9033.43 samples/sec Loss 7.2612 LearningRate 0.0514 Epoch: 5 Global Step: 94470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:46,721-Speed 9240.46 samples/sec Loss 7.1993 LearningRate 0.0514 Epoch: 5 Global Step: 94480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:47,825-Speed 9282.49 samples/sec Loss 7.3439 LearningRate 0.0514 Epoch: 5 Global Step: 94490 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 
15:20:48,943-Speed 9161.13 samples/sec Loss 7.1951 LearningRate 0.0514 Epoch: 5 Global Step: 94500 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:20:50,043-Speed 9320.34 samples/sec Loss 7.1476 LearningRate 0.0514 Epoch: 5 Global Step: 94510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:51,119-Speed 9520.22 samples/sec Loss 7.1563 LearningRate 0.0514 Epoch: 5 Global Step: 94520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:52,195-Speed 9518.75 samples/sec Loss 7.2916 LearningRate 0.0514 Epoch: 5 Global Step: 94530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:53,284-Speed 9406.28 samples/sec Loss 7.3658 LearningRate 0.0514 Epoch: 5 Global Step: 94540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:54,394-Speed 9234.89 samples/sec Loss 7.3338 LearningRate 0.0514 Epoch: 5 Global Step: 94550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:55,479-Speed 9450.19 samples/sec Loss 7.2311 LearningRate 0.0514 Epoch: 5 Global Step: 94560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:56,547-Speed 9590.75 samples/sec Loss 7.3451 LearningRate 0.0514 Epoch: 5 Global Step: 94570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:57,653-Speed 9265.32 samples/sec Loss 7.2642 LearningRate 0.0514 Epoch: 5 Global Step: 94580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:58,731-Speed 9498.89 samples/sec Loss 7.2415 LearningRate 0.0514 Epoch: 5 Global Step: 94590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:20:59,831-Speed 9322.02 samples/sec Loss 7.2722 LearningRate 0.0514 Epoch: 5 Global Step: 94600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:00,889-Speed 9684.50 samples/sec Loss 7.2643 LearningRate 0.0513 Epoch: 5 Global Step: 94610 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:21:01,990-Speed 9305.71 samples/sec Loss 
7.1472 LearningRate 0.0513 Epoch: 5 Global Step: 94620 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:21:03,077-Speed 9425.90 samples/sec Loss 7.2660 LearningRate 0.0513 Epoch: 5 Global Step: 94630 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:21:04,154-Speed 9516.08 samples/sec Loss 7.2548 LearningRate 0.0513 Epoch: 5 Global Step: 94640 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:21:05,218-Speed 9624.51 samples/sec Loss 7.3149 LearningRate 0.0513 Epoch: 5 Global Step: 94650 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:21:06,293-Speed 9531.35 samples/sec Loss 7.2639 LearningRate 0.0513 Epoch: 5 Global Step: 94660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:07,384-Speed 9395.13 samples/sec Loss 7.2155 LearningRate 0.0513 Epoch: 5 Global Step: 94670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:08,503-Speed 9149.06 samples/sec Loss 7.2347 LearningRate 0.0513 Epoch: 5 Global Step: 94680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:09,556-Speed 9732.60 samples/sec Loss 7.2148 LearningRate 0.0513 Epoch: 5 Global Step: 94690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:10,606-Speed 9755.72 samples/sec Loss 7.3323 LearningRate 0.0513 Epoch: 5 Global Step: 94700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:11,648-Speed 9836.78 samples/sec Loss 7.2262 LearningRate 0.0513 Epoch: 5 Global Step: 94710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:12,724-Speed 9523.34 samples/sec Loss 7.1859 LearningRate 0.0513 Epoch: 5 Global Step: 94720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:13,860-Speed 9020.17 samples/sec Loss 7.1787 LearningRate 0.0513 Epoch: 5 Global Step: 94730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:14,956-Speed 9352.84 samples/sec Loss 7.2890 LearningRate 0.0513 Epoch: 5 Global 
Step: 94740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:16,033-Speed 9514.92 samples/sec Loss 7.1948 LearningRate 0.0513 Epoch: 5 Global Step: 94750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:17,086-Speed 9730.02 samples/sec Loss 7.2450 LearningRate 0.0513 Epoch: 5 Global Step: 94760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:18,162-Speed 9520.07 samples/sec Loss 7.2357 LearningRate 0.0513 Epoch: 5 Global Step: 94770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:19,235-Speed 9550.79 samples/sec Loss 7.2985 LearningRate 0.0513 Epoch: 5 Global Step: 94780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:20,309-Speed 9540.56 samples/sec Loss 7.1532 LearningRate 0.0513 Epoch: 5 Global Step: 94790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:21,410-Speed 9298.76 samples/sec Loss 7.2447 LearningRate 0.0513 Epoch: 5 Global Step: 94800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:22,450-Speed 9859.47 samples/sec Loss 7.1616 LearningRate 0.0513 Epoch: 5 Global Step: 94810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:23,571-Speed 9135.87 samples/sec Loss 7.2865 LearningRate 0.0513 Epoch: 5 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:24,669-Speed 9331.52 samples/sec Loss 7.2699 LearningRate 0.0513 Epoch: 5 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:25,731-Speed 9656.61 samples/sec Loss 7.2241 LearningRate 0.0513 Epoch: 5 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:26,842-Speed 9217.62 samples/sec Loss 7.3052 LearningRate 0.0512 Epoch: 5 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:27,911-Speed 9585.31 samples/sec Loss 7.1922 LearningRate 0.0512 Epoch: 5 Global Step: 94860 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 15:21:28,980-Speed 9587.74 samples/sec Loss 7.1621 LearningRate 0.0512 Epoch: 5 Global Step: 94870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:30,076-Speed 9345.64 samples/sec Loss 7.3112 LearningRate 0.0512 Epoch: 5 Global Step: 94880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:31,150-Speed 9544.74 samples/sec Loss 7.1854 LearningRate 0.0512 Epoch: 5 Global Step: 94890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:32,222-Speed 9561.54 samples/sec Loss 7.1720 LearningRate 0.0512 Epoch: 5 Global Step: 94900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:33,292-Speed 9577.90 samples/sec Loss 7.3005 LearningRate 0.0512 Epoch: 5 Global Step: 94910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:34,358-Speed 9612.17 samples/sec Loss 7.3070 LearningRate 0.0512 Epoch: 5 Global Step: 94920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:35,462-Speed 9277.16 samples/sec Loss 7.0983 LearningRate 0.0512 Epoch: 5 Global Step: 94930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:36,578-Speed 9184.50 samples/sec Loss 7.1559 LearningRate 0.0512 Epoch: 5 Global Step: 94940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:37,661-Speed 9458.66 samples/sec Loss 7.3197 LearningRate 0.0512 Epoch: 5 Global Step: 94950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:38,719-Speed 9681.56 samples/sec Loss 7.2492 LearningRate 0.0512 Epoch: 5 Global Step: 94960 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:21:39,765-Speed 9794.92 samples/sec Loss 7.1292 LearningRate 0.0512 Epoch: 5 Global Step: 94970 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:21:42,719-Speed 3467.93 samples/sec Loss 7.1801 LearningRate 0.0512 Epoch: 5 Global Step: 94980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:21:44,677-Speed 5230.68 samples/sec Loss 7.1309 LearningRate 0.0512 Epoch: 5 Global Step: 94990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:45,715-Speed 9875.57 samples/sec Loss 7.2145 LearningRate 0.0512 Epoch: 5 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:46,822-Speed 9252.04 samples/sec Loss 7.2219 LearningRate 0.0512 Epoch: 5 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:47,891-Speed 9585.62 samples/sec Loss 7.2389 LearningRate 0.0512 Epoch: 5 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:49,025-Speed 9036.77 samples/sec Loss 7.2147 LearningRate 0.0512 Epoch: 5 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:50,119-Speed 9362.66 samples/sec Loss 7.2867 LearningRate 0.0512 Epoch: 5 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:51,222-Speed 9289.38 samples/sec Loss 7.2651 LearningRate 0.0512 Epoch: 5 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:52,272-Speed 9764.61 samples/sec Loss 7.2537 LearningRate 0.0512 Epoch: 5 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:53,366-Speed 9361.73 samples/sec Loss 7.3272 LearningRate 0.0512 Epoch: 5 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:54,433-Speed 9598.39 samples/sec Loss 7.2592 LearningRate 0.0511 Epoch: 5 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:55,548-Speed 9192.58 samples/sec Loss 7.2688 LearningRate 0.0511 Epoch: 5 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:21:56,632-Speed 9453.87 samples/sec Loss 7.3311 LearningRate 0.0511 Epoch: 5 Global Step: 95100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:57,709-Speed 9511.89 samples/sec Loss 7.2136 
LearningRate 0.0511 Epoch: 5 Global Step: 95110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:58,811-Speed 9299.18 samples/sec Loss 7.2886 LearningRate 0.0511 Epoch: 5 Global Step: 95120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:21:59,909-Speed 9330.42 samples/sec Loss 7.3114 LearningRate 0.0511 Epoch: 5 Global Step: 95130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:01,017-Speed 9250.46 samples/sec Loss 7.1660 LearningRate 0.0511 Epoch: 5 Global Step: 95140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:02,395-Speed 7434.65 samples/sec Loss 7.2405 LearningRate 0.0511 Epoch: 5 Global Step: 95150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:03,489-Speed 9365.19 samples/sec Loss 7.1711 LearningRate 0.0511 Epoch: 5 Global Step: 95160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:04,569-Speed 9489.43 samples/sec Loss 7.1515 LearningRate 0.0511 Epoch: 5 Global Step: 95170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:05,641-Speed 9550.65 samples/sec Loss 7.1315 LearningRate 0.0511 Epoch: 5 Global Step: 95180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:06,774-Speed 9048.53 samples/sec Loss 7.2060 LearningRate 0.0511 Epoch: 5 Global Step: 95190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:07,829-Speed 9710.80 samples/sec Loss 7.2559 LearningRate 0.0511 Epoch: 5 Global Step: 95200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:08,926-Speed 9339.82 samples/sec Loss 7.2107 LearningRate 0.0511 Epoch: 5 Global Step: 95210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:09,993-Speed 9598.58 samples/sec Loss 7.2119 LearningRate 0.0511 Epoch: 5 Global Step: 95220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:11,068-Speed 9536.21 samples/sec Loss 7.2244 LearningRate 0.0511 Epoch: 5 Global Step: 
95230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:12,189-Speed 9139.23 samples/sec Loss 7.2400 LearningRate 0.0511 Epoch: 5 Global Step: 95240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:13,293-Speed 9276.68 samples/sec Loss 7.1775 LearningRate 0.0511 Epoch: 5 Global Step: 95250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:14,357-Speed 9634.67 samples/sec Loss 7.1845 LearningRate 0.0511 Epoch: 5 Global Step: 95260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:15,429-Speed 9554.01 samples/sec Loss 7.2781 LearningRate 0.0511 Epoch: 5 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:16,506-Speed 9514.23 samples/sec Loss 7.2358 LearningRate 0.0511 Epoch: 5 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:17,579-Speed 9546.77 samples/sec Loss 7.2445 LearningRate 0.0511 Epoch: 5 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:18,677-Speed 9332.97 samples/sec Loss 7.2149 LearningRate 0.0511 Epoch: 5 Global Step: 95300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:19,753-Speed 9521.58 samples/sec Loss 7.2909 LearningRate 0.0510 Epoch: 5 Global Step: 95310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:20,822-Speed 9585.37 samples/sec Loss 7.2423 LearningRate 0.0510 Epoch: 5 Global Step: 95320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:21,914-Speed 9383.89 samples/sec Loss 7.1773 LearningRate 0.0510 Epoch: 5 Global Step: 95330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:23,024-Speed 9228.92 samples/sec Loss 7.1499 LearningRate 0.0510 Epoch: 5 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:24,136-Speed 9211.22 samples/sec Loss 7.1335 LearningRate 0.0510 Epoch: 5 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 15:22:25,217-Speed 9484.29 samples/sec Loss 7.3404 LearningRate 0.0510 Epoch: 5 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:22:26,333-Speed 9178.73 samples/sec Loss 7.1902 LearningRate 0.0510 Epoch: 5 Global Step: 95370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:27,424-Speed 9392.02 samples/sec Loss 7.2168 LearningRate 0.0510 Epoch: 5 Global Step: 95380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:28,510-Speed 9430.76 samples/sec Loss 7.2904 LearningRate 0.0510 Epoch: 5 Global Step: 95390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:29,579-Speed 9588.50 samples/sec Loss 7.1936 LearningRate 0.0510 Epoch: 5 Global Step: 95400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:30,683-Speed 9281.94 samples/sec Loss 7.0944 LearningRate 0.0510 Epoch: 5 Global Step: 95410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:31,780-Speed 9343.25 samples/sec Loss 7.1618 LearningRate 0.0510 Epoch: 5 Global Step: 95420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:32,817-Speed 9882.40 samples/sec Loss 7.1852 LearningRate 0.0510 Epoch: 5 Global Step: 95430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:33,891-Speed 9538.90 samples/sec Loss 7.2356 LearningRate 0.0510 Epoch: 5 Global Step: 95440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:34,993-Speed 9296.46 samples/sec Loss 7.2145 LearningRate 0.0510 Epoch: 5 Global Step: 95450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:36,106-Speed 9202.97 samples/sec Loss 7.1091 LearningRate 0.0510 Epoch: 5 Global Step: 95460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:37,187-Speed 9481.73 samples/sec Loss 7.3430 LearningRate 0.0510 Epoch: 5 Global Step: 95470 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:22:38,303-Speed 
9178.20 samples/sec Loss 7.3053 LearningRate 0.0510 Epoch: 5 Global Step: 95480 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:22:39,393-Speed 9402.90 samples/sec Loss 7.2268 LearningRate 0.0510 Epoch: 5 Global Step: 95490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:40,542-Speed 8921.25 samples/sec Loss 7.2200 LearningRate 0.0510 Epoch: 5 Global Step: 95500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:41,650-Speed 9240.26 samples/sec Loss 7.1882 LearningRate 0.0510 Epoch: 5 Global Step: 95510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:42,770-Speed 9151.24 samples/sec Loss 7.2520 LearningRate 0.0510 Epoch: 5 Global Step: 95520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:43,880-Speed 9230.65 samples/sec Loss 7.3749 LearningRate 0.0510 Epoch: 5 Global Step: 95530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:44,956-Speed 9519.72 samples/sec Loss 7.1719 LearningRate 0.0510 Epoch: 5 Global Step: 95540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:46,047-Speed 9393.84 samples/sec Loss 7.2464 LearningRate 0.0509 Epoch: 5 Global Step: 95550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:47,120-Speed 9545.81 samples/sec Loss 7.2253 LearningRate 0.0509 Epoch: 5 Global Step: 95560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:48,205-Speed 9449.49 samples/sec Loss 7.1994 LearningRate 0.0509 Epoch: 5 Global Step: 95570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:49,295-Speed 9396.29 samples/sec Loss 7.1146 LearningRate 0.0509 Epoch: 5 Global Step: 95580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:50,351-Speed 9707.05 samples/sec Loss 7.3213 LearningRate 0.0509 Epoch: 5 Global Step: 95590 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:22:51,410-Speed 9676.41 samples/sec Loss 7.2016 
LearningRate 0.0509 Epoch: 5 Global Step: 95600 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:22:52,500-Speed 9402.46 samples/sec Loss 7.1851 LearningRate 0.0509 Epoch: 5 Global Step: 95610 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:22:53,581-Speed 9471.02 samples/sec Loss 7.3399 LearningRate 0.0509 Epoch: 5 Global Step: 95620 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:22:54,660-Speed 9496.81 samples/sec Loss 7.3452 LearningRate 0.0509 Epoch: 5 Global Step: 95630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:55,768-Speed 9255.61 samples/sec Loss 7.2640 LearningRate 0.0509 Epoch: 5 Global Step: 95640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:56,838-Speed 9578.58 samples/sec Loss 7.3817 LearningRate 0.0509 Epoch: 5 Global Step: 95650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:57,917-Speed 9490.44 samples/sec Loss 7.2700 LearningRate 0.0509 Epoch: 5 Global Step: 95660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:22:58,995-Speed 9501.18 samples/sec Loss 7.1833 LearningRate 0.0509 Epoch: 5 Global Step: 95670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:00,089-Speed 9369.17 samples/sec Loss 7.1516 LearningRate 0.0509 Epoch: 5 Global Step: 95680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:01,139-Speed 9754.26 samples/sec Loss 7.2402 LearningRate 0.0509 Epoch: 5 Global Step: 95690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:02,244-Speed 9278.68 samples/sec Loss 7.2611 LearningRate 0.0509 Epoch: 5 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:03,295-Speed 9751.33 samples/sec Loss 7.2000 LearningRate 0.0509 Epoch: 5 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:04,358-Speed 9635.59 samples/sec Loss 7.2946 LearningRate 0.0509 Epoch: 5 Global Step: 
95720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:05,456-Speed 9330.66 samples/sec Loss 7.1466 LearningRate 0.0509 Epoch: 5 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:06,550-Speed 9366.16 samples/sec Loss 7.1815 LearningRate 0.0509 Epoch: 5 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:08,568-Speed 5075.06 samples/sec Loss 7.2086 LearningRate 0.0509 Epoch: 5 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:09,627-Speed 9677.49 samples/sec Loss 7.2612 LearningRate 0.0509 Epoch: 5 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:11,523-Speed 5403.30 samples/sec Loss 7.1271 LearningRate 0.0509 Epoch: 5 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:13,400-Speed 5456.82 samples/sec Loss 7.2394 LearningRate 0.0508 Epoch: 5 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:14,453-Speed 9737.68 samples/sec Loss 7.2291 LearningRate 0.0508 Epoch: 5 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 15:23:15,508-Speed 9704.42 samples/sec Loss 7.2288 LearningRate 0.0508 Epoch: 5 Global Step: 95800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:16,564-Speed 9709.95 samples/sec Loss 7.1911 LearningRate 0.0508 Epoch: 5 Global Step: 95810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:17,660-Speed 9340.21 samples/sec Loss 7.1800 LearningRate 0.0508 Epoch: 5 Global Step: 95820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:18,762-Speed 9299.31 samples/sec Loss 7.1810 LearningRate 0.0508 Epoch: 5 Global Step: 95830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:19,849-Speed 9429.27 samples/sec Loss 7.1374 LearningRate 0.0508 Epoch: 5 Global Step: 95840 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 15:23:20,934-Speed 9444.67 samples/sec Loss 7.2284 LearningRate 0.0508 Epoch: 5 Global Step: 95850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:22,011-Speed 9507.38 samples/sec Loss 7.1055 LearningRate 0.0508 Epoch: 5 Global Step: 95860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:23,112-Speed 9309.04 samples/sec Loss 7.2248 LearningRate 0.0508 Epoch: 5 Global Step: 95870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:24,189-Speed 9505.79 samples/sec Loss 7.2472 LearningRate 0.0508 Epoch: 5 Global Step: 95880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:25,272-Speed 9467.69 samples/sec Loss 7.3475 LearningRate 0.0508 Epoch: 5 Global Step: 95890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:26,355-Speed 9461.13 samples/sec Loss 7.2769 LearningRate 0.0508 Epoch: 5 Global Step: 95900 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:23:27,420-Speed 9616.10 samples/sec Loss 7.2730 LearningRate 0.0508 Epoch: 5 Global Step: 95910 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:23:28,496-Speed 9528.61 samples/sec Loss 7.1783 LearningRate 0.0508 Epoch: 5 Global Step: 95920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:29,533-Speed 9878.30 samples/sec Loss 7.1702 LearningRate 0.0508 Epoch: 5 Global Step: 95930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:30,615-Speed 9468.97 samples/sec Loss 7.2998 LearningRate 0.0508 Epoch: 5 Global Step: 95940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:31,740-Speed 9109.15 samples/sec Loss 7.1866 LearningRate 0.0508 Epoch: 5 Global Step: 95950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:32,820-Speed 9488.35 samples/sec Loss 7.1852 LearningRate 0.0508 Epoch: 5 Global Step: 95960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:33,880-Speed 
9663.48 samples/sec Loss 7.2241 LearningRate 0.0508 Epoch: 5 Global Step: 95970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:34,925-Speed 9804.56 samples/sec Loss 7.1063 LearningRate 0.0508 Epoch: 5 Global Step: 95980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:35,965-Speed 9852.13 samples/sec Loss 7.2969 LearningRate 0.0508 Epoch: 5 Global Step: 95990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:37,046-Speed 9483.49 samples/sec Loss 7.1934 LearningRate 0.0508 Epoch: 5 Global Step: 96000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:23:59,103-[lfw][96000]XNorm: 11.380481 Training: 2022-04-11 15:23:59,103-[lfw][96000]Accuracy-Flip: 0.99517+-0.00241 Training: 2022-04-11 15:23:59,104-[lfw][96000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:24:24,641-[cfp_fp][96000]XNorm: 9.756551 Training: 2022-04-11 15:24:24,642-[cfp_fp][96000]Accuracy-Flip: 0.95471+-0.01250 Training: 2022-04-11 15:24:24,642-[cfp_fp][96000]Accuracy-Highest: 0.95729 Training: 2022-04-11 15:24:46,695-[agedb_30][96000]XNorm: 11.009636 Training: 2022-04-11 15:24:46,695-[agedb_30][96000]Accuracy-Flip: 0.95867+-0.00862 Training: 2022-04-11 15:24:46,696-[agedb_30][96000]Accuracy-Highest: 0.96317 Training: 2022-04-11 15:24:47,788-Speed 144.75 samples/sec Loss 7.1743 LearningRate 0.0507 Epoch: 5 Global Step: 96010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:48,851-Speed 9641.02 samples/sec Loss 7.3405 LearningRate 0.0507 Epoch: 5 Global Step: 96020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:49,989-Speed 9006.71 samples/sec Loss 7.1791 LearningRate 0.0507 Epoch: 5 Global Step: 96030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:51,103-Speed 9193.31 samples/sec Loss 7.1436 LearningRate 0.0507 Epoch: 5 Global Step: 96040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:52,198-Speed 9362.88 samples/sec Loss 7.1933 
LearningRate 0.0507 Epoch: 5 Global Step: 96050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:53,243-Speed 9801.07 samples/sec Loss 7.2383 LearningRate 0.0507 Epoch: 5 Global Step: 96060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:54,319-Speed 9518.21 samples/sec Loss 7.3854 LearningRate 0.0507 Epoch: 5 Global Step: 96070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:55,407-Speed 9416.94 samples/sec Loss 7.3345 LearningRate 0.0507 Epoch: 5 Global Step: 96080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:56,485-Speed 9511.59 samples/sec Loss 7.2706 LearningRate 0.0507 Epoch: 5 Global Step: 96090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:57,562-Speed 9518.22 samples/sec Loss 7.1717 LearningRate 0.0507 Epoch: 5 Global Step: 96100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:58,620-Speed 9683.48 samples/sec Loss 7.2561 LearningRate 0.0507 Epoch: 5 Global Step: 96110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:24:59,656-Speed 9882.05 samples/sec Loss 7.1685 LearningRate 0.0507 Epoch: 5 Global Step: 96120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:00,799-Speed 8967.34 samples/sec Loss 7.2044 LearningRate 0.0507 Epoch: 5 Global Step: 96130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:01,905-Speed 9258.36 samples/sec Loss 7.2961 LearningRate 0.0507 Epoch: 5 Global Step: 96140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:02,998-Speed 9380.22 samples/sec Loss 7.1834 LearningRate 0.0507 Epoch: 5 Global Step: 96150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:04,084-Speed 9430.13 samples/sec Loss 7.2434 LearningRate 0.0507 Epoch: 5 Global Step: 96160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:05,207-Speed 9125.98 samples/sec Loss 7.1095 LearningRate 0.0507 Epoch: 5 Global Step: 
96170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:06,290-Speed 9466.03 samples/sec Loss 7.2423 LearningRate 0.0507 Epoch: 5 Global Step: 96180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:07,401-Speed 9217.30 samples/sec Loss 7.3537 LearningRate 0.0507 Epoch: 5 Global Step: 96190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:08,457-Speed 9703.81 samples/sec Loss 7.1609 LearningRate 0.0507 Epoch: 5 Global Step: 96200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:09,534-Speed 9512.64 samples/sec Loss 7.2767 LearningRate 0.0507 Epoch: 5 Global Step: 96210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:10,640-Speed 9264.40 samples/sec Loss 7.2812 LearningRate 0.0507 Epoch: 5 Global Step: 96220 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:25:11,726-Speed 9434.83 samples/sec Loss 7.2207 LearningRate 0.0507 Epoch: 5 Global Step: 96230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:12,820-Speed 9366.68 samples/sec Loss 7.2261 LearningRate 0.0507 Epoch: 5 Global Step: 96240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:13,913-Speed 9369.59 samples/sec Loss 7.1392 LearningRate 0.0506 Epoch: 5 Global Step: 96250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:15,006-Speed 9382.72 samples/sec Loss 7.2908 LearningRate 0.0506 Epoch: 5 Global Step: 96260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:16,112-Speed 9259.34 samples/sec Loss 7.2625 LearningRate 0.0506 Epoch: 5 Global Step: 96270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:17,237-Speed 9109.07 samples/sec Loss 7.0717 LearningRate 0.0506 Epoch: 5 Global Step: 96280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:18,309-Speed 9558.24 samples/sec Loss 7.1139 LearningRate 0.0506 Epoch: 5 Global Step: 96290 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 15:25:19,393-Speed 9456.01 samples/sec Loss 7.2635 LearningRate 0.0506 Epoch: 5 Global Step: 96300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:20,455-Speed 9649.30 samples/sec Loss 7.2037 LearningRate 0.0506 Epoch: 5 Global Step: 96310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:21,529-Speed 9533.82 samples/sec Loss 7.1489 LearningRate 0.0506 Epoch: 5 Global Step: 96320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:22,652-Speed 9124.73 samples/sec Loss 7.3309 LearningRate 0.0506 Epoch: 5 Global Step: 96330 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:25:23,718-Speed 9614.89 samples/sec Loss 7.2123 LearningRate 0.0506 Epoch: 5 Global Step: 96340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:24,798-Speed 9483.15 samples/sec Loss 7.2087 LearningRate 0.0506 Epoch: 5 Global Step: 96350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:25,892-Speed 9369.47 samples/sec Loss 7.2535 LearningRate 0.0506 Epoch: 5 Global Step: 96360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:26,988-Speed 9347.47 samples/sec Loss 7.2190 LearningRate 0.0506 Epoch: 5 Global Step: 96370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:28,111-Speed 9120.61 samples/sec Loss 7.2488 LearningRate 0.0506 Epoch: 5 Global Step: 96380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:29,209-Speed 9336.39 samples/sec Loss 7.1824 LearningRate 0.0506 Epoch: 5 Global Step: 96390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:30,372-Speed 8807.69 samples/sec Loss 7.2009 LearningRate 0.0506 Epoch: 5 Global Step: 96400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:31,485-Speed 9200.80 samples/sec Loss 7.1208 LearningRate 0.0506 Epoch: 5 Global Step: 96410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
15:25:32,599-Speed 9200.38 samples/sec Loss 7.1604 LearningRate 0.0506 Epoch: 5 Global Step: 96420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:33,699-Speed 9315.45 samples/sec Loss 7.2575 LearningRate 0.0506 Epoch: 5 Global Step: 96430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:34,790-Speed 9393.89 samples/sec Loss 7.1527 LearningRate 0.0506 Epoch: 5 Global Step: 96440 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:25:35,822-Speed 9923.56 samples/sec Loss 7.1546 LearningRate 0.0506 Epoch: 5 Global Step: 96450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:36,931-Speed 9239.94 samples/sec Loss 7.1514 LearningRate 0.0506 Epoch: 5 Global Step: 96460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:38,009-Speed 9503.75 samples/sec Loss 7.2613 LearningRate 0.0506 Epoch: 5 Global Step: 96470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:39,068-Speed 9678.71 samples/sec Loss 7.1566 LearningRate 0.0505 Epoch: 5 Global Step: 96480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:40,127-Speed 9678.41 samples/sec Loss 7.1983 LearningRate 0.0505 Epoch: 5 Global Step: 96490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:41,180-Speed 9723.76 samples/sec Loss 7.2273 LearningRate 0.0505 Epoch: 5 Global Step: 96500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:42,277-Speed 9343.29 samples/sec Loss 7.3361 LearningRate 0.0505 Epoch: 5 Global Step: 96510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:43,346-Speed 9581.36 samples/sec Loss 7.2559 LearningRate 0.0505 Epoch: 5 Global Step: 96520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:44,384-Speed 9873.16 samples/sec Loss 7.1762 LearningRate 0.0505 Epoch: 5 Global Step: 96530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:45,418-Speed 9908.84 samples/sec Loss 
7.1950 LearningRate 0.0505 Epoch: 5 Global Step: 96540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:46,476-Speed 9689.05 samples/sec Loss 7.2399 LearningRate 0.0505 Epoch: 5 Global Step: 96550 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:25:47,551-Speed 9531.16 samples/sec Loss 7.3249 LearningRate 0.0505 Epoch: 5 Global Step: 96560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:48,632-Speed 9475.06 samples/sec Loss 7.1950 LearningRate 0.0505 Epoch: 5 Global Step: 96570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:49,739-Speed 9256.28 samples/sec Loss 7.2380 LearningRate 0.0505 Epoch: 5 Global Step: 96580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:50,839-Speed 9314.89 samples/sec Loss 7.1161 LearningRate 0.0505 Epoch: 5 Global Step: 96590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:51,922-Speed 9455.71 samples/sec Loss 7.3035 LearningRate 0.0505 Epoch: 5 Global Step: 96600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:53,011-Speed 9411.32 samples/sec Loss 7.1060 LearningRate 0.0505 Epoch: 5 Global Step: 96610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:54,109-Speed 9328.77 samples/sec Loss 7.3096 LearningRate 0.0505 Epoch: 5 Global Step: 96620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:55,213-Speed 9280.99 samples/sec Loss 7.2105 LearningRate 0.0505 Epoch: 5 Global Step: 96630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:56,265-Speed 9751.33 samples/sec Loss 7.2210 LearningRate 0.0505 Epoch: 5 Global Step: 96640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:57,340-Speed 9526.09 samples/sec Loss 7.3183 LearningRate 0.0505 Epoch: 5 Global Step: 96650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:58,393-Speed 9730.36 samples/sec Loss 7.2727 LearningRate 0.0505 Epoch: 5 Global 
Step: 96660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:25:59,476-Speed 9459.81 samples/sec Loss 7.1588 LearningRate 0.0505 Epoch: 5 Global Step: 96670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:00,559-Speed 9467.24 samples/sec Loss 7.2935 LearningRate 0.0505 Epoch: 5 Global Step: 96680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:01,611-Speed 9736.23 samples/sec Loss 7.2827 LearningRate 0.0505 Epoch: 5 Global Step: 96690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:02,703-Speed 9387.60 samples/sec Loss 7.2052 LearningRate 0.0505 Epoch: 5 Global Step: 96700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:03,754-Speed 9741.10 samples/sec Loss 7.2308 LearningRate 0.0505 Epoch: 5 Global Step: 96710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:04,838-Speed 9455.79 samples/sec Loss 7.2463 LearningRate 0.0504 Epoch: 5 Global Step: 96720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:05,913-Speed 9527.10 samples/sec Loss 7.1357 LearningRate 0.0504 Epoch: 5 Global Step: 96730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:06,974-Speed 9658.53 samples/sec Loss 7.1473 LearningRate 0.0504 Epoch: 5 Global Step: 96740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:08,047-Speed 9550.25 samples/sec Loss 7.2076 LearningRate 0.0504 Epoch: 5 Global Step: 96750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:09,169-Speed 9134.70 samples/sec Loss 7.2625 LearningRate 0.0504 Epoch: 5 Global Step: 96760 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:10,273-Speed 9275.96 samples/sec Loss 7.2599 LearningRate 0.0504 Epoch: 5 Global Step: 96770 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:11,378-Speed 9267.52 samples/sec Loss 7.1840 LearningRate 0.0504 Epoch: 5 Global Step: 96780 Fp16 Grad Scale: 262144 
Required: 10 hours Training: 2022-04-11 15:26:12,456-Speed 9511.24 samples/sec Loss 7.1098 LearningRate 0.0504 Epoch: 5 Global Step: 96790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:13,517-Speed 9656.17 samples/sec Loss 7.2385 LearningRate 0.0504 Epoch: 5 Global Step: 96800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:14,584-Speed 9598.84 samples/sec Loss 7.1214 LearningRate 0.0504 Epoch: 5 Global Step: 96810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:15,637-Speed 9739.97 samples/sec Loss 7.2192 LearningRate 0.0504 Epoch: 5 Global Step: 96820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:16,666-Speed 9960.29 samples/sec Loss 7.1819 LearningRate 0.0504 Epoch: 5 Global Step: 96830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:17,721-Speed 9705.40 samples/sec Loss 7.1518 LearningRate 0.0504 Epoch: 5 Global Step: 96840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:18,824-Speed 9291.15 samples/sec Loss 7.1400 LearningRate 0.0504 Epoch: 5 Global Step: 96850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:19,908-Speed 9455.08 samples/sec Loss 7.2898 LearningRate 0.0504 Epoch: 5 Global Step: 96860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:20,962-Speed 9714.05 samples/sec Loss 7.3221 LearningRate 0.0504 Epoch: 5 Global Step: 96870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:22,062-Speed 9316.64 samples/sec Loss 7.2841 LearningRate 0.0504 Epoch: 5 Global Step: 96880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:23,137-Speed 9527.58 samples/sec Loss 7.2167 LearningRate 0.0504 Epoch: 5 Global Step: 96890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:24,225-Speed 9417.55 samples/sec Loss 7.3017 LearningRate 0.0504 Epoch: 5 Global Step: 96900 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 
15:26:25,315-Speed 9398.26 samples/sec Loss 7.1732 LearningRate 0.0504 Epoch: 5 Global Step: 96910 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:26,387-Speed 9560.79 samples/sec Loss 7.1389 LearningRate 0.0504 Epoch: 5 Global Step: 96920 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:27,460-Speed 9554.94 samples/sec Loss 7.2703 LearningRate 0.0504 Epoch: 5 Global Step: 96930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:28,534-Speed 9531.20 samples/sec Loss 7.1259 LearningRate 0.0504 Epoch: 5 Global Step: 96940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:29,622-Speed 9417.98 samples/sec Loss 7.1453 LearningRate 0.0503 Epoch: 5 Global Step: 96950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:30,693-Speed 9572.85 samples/sec Loss 7.0821 LearningRate 0.0503 Epoch: 5 Global Step: 96960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:31,736-Speed 9815.26 samples/sec Loss 7.2795 LearningRate 0.0503 Epoch: 5 Global Step: 96970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:32,832-Speed 9364.28 samples/sec Loss 7.0830 LearningRate 0.0503 Epoch: 5 Global Step: 96980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:33,964-Speed 9051.14 samples/sec Loss 7.2427 LearningRate 0.0503 Epoch: 5 Global Step: 96990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:35,053-Speed 9409.15 samples/sec Loss 7.2598 LearningRate 0.0503 Epoch: 5 Global Step: 97000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:36,125-Speed 9559.53 samples/sec Loss 7.3036 LearningRate 0.0503 Epoch: 5 Global Step: 97010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:37,183-Speed 9679.43 samples/sec Loss 7.1941 LearningRate 0.0503 Epoch: 5 Global Step: 97020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:38,246-Speed 9643.01 samples/sec Loss 
7.2606 LearningRate 0.0503 Epoch: 5 Global Step: 97030 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:39,317-Speed 9565.50 samples/sec Loss 7.2891 LearningRate 0.0503 Epoch: 5 Global Step: 97040 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:40,434-Speed 9173.10 samples/sec Loss 7.2167 LearningRate 0.0503 Epoch: 5 Global Step: 97050 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:41,530-Speed 9345.77 samples/sec Loss 7.1342 LearningRate 0.0503 Epoch: 5 Global Step: 97060 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:42,618-Speed 9416.82 samples/sec Loss 7.2010 LearningRate 0.0503 Epoch: 5 Global Step: 97070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:43,671-Speed 9726.93 samples/sec Loss 7.1337 LearningRate 0.0503 Epoch: 5 Global Step: 97080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:44,749-Speed 9504.26 samples/sec Loss 7.1794 LearningRate 0.0503 Epoch: 5 Global Step: 97090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:45,804-Speed 9719.39 samples/sec Loss 7.1227 LearningRate 0.0503 Epoch: 5 Global Step: 97100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:46,930-Speed 9097.58 samples/sec Loss 7.1903 LearningRate 0.0503 Epoch: 5 Global Step: 97110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:48,004-Speed 9539.87 samples/sec Loss 7.2422 LearningRate 0.0503 Epoch: 5 Global Step: 97120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:49,070-Speed 9613.59 samples/sec Loss 7.2460 LearningRate 0.0503 Epoch: 5 Global Step: 97130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:50,140-Speed 9574.68 samples/sec Loss 7.3515 LearningRate 0.0503 Epoch: 5 Global Step: 97140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:51,230-Speed 9399.25 samples/sec Loss 7.2400 LearningRate 0.0503 Epoch: 5 Global 
Step: 97150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:52,283-Speed 9726.29 samples/sec Loss 7.2929 LearningRate 0.0503 Epoch: 5 Global Step: 97160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 15:26:53,371-Speed 9419.42 samples/sec Loss 7.3127 LearningRate 0.0503 Epoch: 5 Global Step: 97170 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-04-11 15:26:54,455-Speed 9450.82 samples/sec Loss 7.2349 LearningRate 0.0503 Epoch: 5 Global Step: 97180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:26:55,519-Speed 9627.37 samples/sec Loss 7.2452 LearningRate 0.0502 Epoch: 5 Global Step: 97190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:26:56,638-Speed 9165.71 samples/sec Loss 7.2908 LearningRate 0.0502 Epoch: 5 Global Step: 97200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:26:57,680-Speed 9833.63 samples/sec Loss 7.1538 LearningRate 0.0502 Epoch: 5 Global Step: 97210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:26:58,750-Speed 9576.39 samples/sec Loss 7.2549 LearningRate 0.0502 Epoch: 5 Global Step: 97220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:26:59,841-Speed 9390.12 samples/sec Loss 7.3053 LearningRate 0.0502 Epoch: 5 Global Step: 97230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:00,926-Speed 9438.77 samples/sec Loss 7.1610 LearningRate 0.0502 Epoch: 5 Global Step: 97240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:02,055-Speed 9073.58 samples/sec Loss 7.1817 LearningRate 0.0502 Epoch: 5 Global Step: 97250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:03,188-Speed 9049.17 samples/sec Loss 7.2178 LearningRate 0.0502 Epoch: 5 Global Step: 97260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:04,240-Speed 9733.84 samples/sec Loss 7.2154 LearningRate 0.0502 Epoch: 5 Global Step: 97270 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 15:27:05,336-Speed 9354.65 samples/sec Loss 7.0629 LearningRate 0.0502 Epoch: 5 Global Step: 97280 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:27:06,415-Speed 9488.09 samples/sec Loss 7.2974 LearningRate 0.0502 Epoch: 5 Global Step: 97290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:07,544-Speed 9078.56 samples/sec Loss 7.1511 LearningRate 0.0502 Epoch: 5 Global Step: 97300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:08,602-Speed 9683.15 samples/sec Loss 7.1657 LearningRate 0.0502 Epoch: 5 Global Step: 97310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:09,718-Speed 9182.98 samples/sec Loss 7.1312 LearningRate 0.0502 Epoch: 5 Global Step: 97320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:10,761-Speed 9822.72 samples/sec Loss 7.2228 LearningRate 0.0502 Epoch: 5 Global Step: 97330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:11,869-Speed 9247.35 samples/sec Loss 7.1717 LearningRate 0.0502 Epoch: 5 Global Step: 97340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:13,000-Speed 9054.13 samples/sec Loss 7.2045 LearningRate 0.0502 Epoch: 5 Global Step: 97350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:14,114-Speed 9197.62 samples/sec Loss 7.0325 LearningRate 0.0502 Epoch: 5 Global Step: 97360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:15,220-Speed 9273.28 samples/sec Loss 7.2191 LearningRate 0.0502 Epoch: 5 Global Step: 97370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:16,331-Speed 9217.29 samples/sec Loss 7.1898 LearningRate 0.0502 Epoch: 5 Global Step: 97380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:17,420-Speed 9412.01 samples/sec Loss 7.2279 LearningRate 0.0502 Epoch: 5 Global Step: 97390 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:27:18,514-Speed 9364.35 
samples/sec Loss 7.1886 LearningRate 0.0502 Epoch: 5 Global Step: 97400 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:27:19,603-Speed 9411.66 samples/sec Loss 7.1346 LearningRate 0.0502 Epoch: 5 Global Step: 97410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:20,655-Speed 9732.47 samples/sec Loss 7.1692 LearningRate 0.0501 Epoch: 5 Global Step: 97420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:21,752-Speed 9345.84 samples/sec Loss 7.1837 LearningRate 0.0501 Epoch: 5 Global Step: 97430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:22,827-Speed 9523.76 samples/sec Loss 7.1700 LearningRate 0.0501 Epoch: 5 Global Step: 97440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:23,915-Speed 9423.36 samples/sec Loss 7.2556 LearningRate 0.0501 Epoch: 5 Global Step: 97450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:25,010-Speed 9356.78 samples/sec Loss 7.1507 LearningRate 0.0501 Epoch: 5 Global Step: 97460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:26,097-Speed 9423.06 samples/sec Loss 7.0979 LearningRate 0.0501 Epoch: 5 Global Step: 97470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:27,212-Speed 9191.72 samples/sec Loss 7.2846 LearningRate 0.0501 Epoch: 5 Global Step: 97480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:28,288-Speed 9525.02 samples/sec Loss 7.2369 LearningRate 0.0501 Epoch: 5 Global Step: 97490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:29,372-Speed 9450.36 samples/sec Loss 7.1664 LearningRate 0.0501 Epoch: 5 Global Step: 97500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:30,444-Speed 9552.12 samples/sec Loss 7.1054 LearningRate 0.0501 Epoch: 5 Global Step: 97510 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:27:31,481-Speed 9881.16 samples/sec Loss 7.1505 LearningRate 0.0501 Epoch: 5 
Global Step: 97520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:32,539-Speed 9687.59 samples/sec Loss 7.1081 LearningRate 0.0501 Epoch: 5 Global Step: 97530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:33,625-Speed 9433.57 samples/sec Loss 7.1958 LearningRate 0.0501 Epoch: 5 Global Step: 97540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:34,753-Speed 9082.09 samples/sec Loss 7.1827 LearningRate 0.0501 Epoch: 5 Global Step: 97550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:35,818-Speed 9628.21 samples/sec Loss 7.2700 LearningRate 0.0501 Epoch: 5 Global Step: 97560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:36,898-Speed 9484.29 samples/sec Loss 7.1665 LearningRate 0.0501 Epoch: 5 Global Step: 97570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:37,989-Speed 9393.25 samples/sec Loss 7.1497 LearningRate 0.0501 Epoch: 5 Global Step: 97580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:39,073-Speed 9452.77 samples/sec Loss 7.1935 LearningRate 0.0501 Epoch: 5 Global Step: 97590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:40,144-Speed 9571.14 samples/sec Loss 7.2459 LearningRate 0.0501 Epoch: 5 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:41,208-Speed 9626.65 samples/sec Loss 7.1491 LearningRate 0.0501 Epoch: 5 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:42,259-Speed 9747.05 samples/sec Loss 7.1793 LearningRate 0.0501 Epoch: 5 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:43,315-Speed 9703.18 samples/sec Loss 7.2410 LearningRate 0.0501 Epoch: 5 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:44,363-Speed 9769.93 samples/sec Loss 7.2333 LearningRate 0.0501 Epoch: 5 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-04-11 15:27:45,445-Speed 9474.67 samples/sec Loss 7.2103 LearningRate 0.0501 Epoch: 5 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:46,506-Speed 9663.56 samples/sec Loss 7.1485 LearningRate 0.0500 Epoch: 5 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:47,603-Speed 9337.26 samples/sec Loss 7.1413 LearningRate 0.0500 Epoch: 5 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:48,728-Speed 9106.23 samples/sec Loss 7.2206 LearningRate 0.0500 Epoch: 5 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:49,807-Speed 9493.98 samples/sec Loss 7.0966 LearningRate 0.0500 Epoch: 5 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:27:50,843-Speed 9892.84 samples/sec Loss 7.1590 LearningRate 0.0500 Epoch: 5 Global Step: 97700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:51,907-Speed 9631.47 samples/sec Loss 7.1917 LearningRate 0.0500 Epoch: 5 Global Step: 97710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:52,949-Speed 9829.19 samples/sec Loss 7.2038 LearningRate 0.0500 Epoch: 5 Global Step: 97720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:54,069-Speed 9142.54 samples/sec Loss 7.2131 LearningRate 0.0500 Epoch: 5 Global Step: 97730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:55,151-Speed 9473.87 samples/sec Loss 7.2299 LearningRate 0.0500 Epoch: 5 Global Step: 97740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:56,258-Speed 9264.75 samples/sec Loss 7.0421 LearningRate 0.0500 Epoch: 5 Global Step: 97750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:57,404-Speed 8938.95 samples/sec Loss 7.2244 LearningRate 0.0500 Epoch: 5 Global Step: 97760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:58,460-Speed 9705.84 samples/sec 
Loss 7.2462 LearningRate 0.0500 Epoch: 5 Global Step: 97770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:27:59,503-Speed 9825.19 samples/sec Loss 7.2485 LearningRate 0.0500 Epoch: 5 Global Step: 97780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:00,596-Speed 9369.90 samples/sec Loss 7.2243 LearningRate 0.0500 Epoch: 5 Global Step: 97790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:01,697-Speed 9305.62 samples/sec Loss 7.2120 LearningRate 0.0500 Epoch: 5 Global Step: 97800 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:28:02,785-Speed 9422.74 samples/sec Loss 7.2168 LearningRate 0.0500 Epoch: 5 Global Step: 97810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:03,862-Speed 9511.39 samples/sec Loss 7.2019 LearningRate 0.0500 Epoch: 5 Global Step: 97820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:04,928-Speed 9608.70 samples/sec Loss 7.2587 LearningRate 0.0500 Epoch: 5 Global Step: 97830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:06,024-Speed 9353.63 samples/sec Loss 7.2685 LearningRate 0.0500 Epoch: 5 Global Step: 97840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:07,129-Speed 9269.80 samples/sec Loss 7.1431 LearningRate 0.0500 Epoch: 5 Global Step: 97850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:08,230-Speed 9304.51 samples/sec Loss 7.2586 LearningRate 0.0500 Epoch: 5 Global Step: 97860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:09,325-Speed 9355.11 samples/sec Loss 7.2250 LearningRate 0.0500 Epoch: 5 Global Step: 97870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:10,444-Speed 9158.49 samples/sec Loss 7.1324 LearningRate 0.0500 Epoch: 5 Global Step: 97880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:11,577-Speed 9045.04 samples/sec Loss 7.0383 LearningRate 0.0500 Epoch: 5 Global Step: 
97890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:12,657-Speed 9483.82 samples/sec Loss 7.2495 LearningRate 0.0499 Epoch: 5 Global Step: 97900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:13,742-Speed 9440.17 samples/sec Loss 7.1768 LearningRate 0.0499 Epoch: 5 Global Step: 97910 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:28:14,834-Speed 9390.16 samples/sec Loss 7.1275 LearningRate 0.0499 Epoch: 5 Global Step: 97920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:28:15,940-Speed 9267.73 samples/sec Loss 7.1562 LearningRate 0.0499 Epoch: 5 Global Step: 97930 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:28:17,027-Speed 9424.93 samples/sec Loss 7.2049 LearningRate 0.0499 Epoch: 5 Global Step: 97940 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:28:18,093-Speed 9603.39 samples/sec Loss 7.0857 LearningRate 0.0499 Epoch: 5 Global Step: 97950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:19,187-Speed 9366.47 samples/sec Loss 7.1705 LearningRate 0.0499 Epoch: 5 Global Step: 97960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:20,287-Speed 9312.76 samples/sec Loss 7.1475 LearningRate 0.0499 Epoch: 5 Global Step: 97970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:21,365-Speed 9504.06 samples/sec Loss 7.1852 LearningRate 0.0499 Epoch: 5 Global Step: 97980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:22,474-Speed 9242.13 samples/sec Loss 7.2440 LearningRate 0.0499 Epoch: 5 Global Step: 97990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:23,585-Speed 9223.17 samples/sec Loss 7.1621 LearningRate 0.0499 Epoch: 5 Global Step: 98000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:28:45,655-[lfw][98000]XNorm: 11.242619 Training: 2022-04-11 15:28:45,656-[lfw][98000]Accuracy-Flip: 0.99450+-0.00299 Training: 2022-04-11 
15:28:45,657-[lfw][98000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:29:11,162-[cfp_fp][98000]XNorm: 9.602968 Training: 2022-04-11 15:29:11,163-[cfp_fp][98000]Accuracy-Flip: 0.95414+-0.01179 Training: 2022-04-11 15:29:11,164-[cfp_fp][98000]Accuracy-Highest: 0.95729 Training: 2022-04-11 15:29:33,200-[agedb_30][98000]XNorm: 10.926428 Training: 2022-04-11 15:29:33,201-[agedb_30][98000]Accuracy-Flip: 0.95817+-0.00935 Training: 2022-04-11 15:29:33,201-[agedb_30][98000]Accuracy-Highest: 0.96317 Training: 2022-04-11 15:29:34,286-Speed 144.84 samples/sec Loss 7.0843 LearningRate 0.0499 Epoch: 5 Global Step: 98010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:35,359-Speed 9548.15 samples/sec Loss 7.1387 LearningRate 0.0499 Epoch: 5 Global Step: 98020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:36,422-Speed 9637.52 samples/sec Loss 7.1830 LearningRate 0.0499 Epoch: 5 Global Step: 98030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:37,510-Speed 9417.36 samples/sec Loss 7.1920 LearningRate 0.0499 Epoch: 5 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:38,612-Speed 9301.58 samples/sec Loss 7.1992 LearningRate 0.0499 Epoch: 5 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:39,695-Speed 9458.50 samples/sec Loss 7.0547 LearningRate 0.0499 Epoch: 5 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:40,788-Speed 9377.14 samples/sec Loss 7.1584 LearningRate 0.0499 Epoch: 5 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:41,848-Speed 9669.64 samples/sec Loss 7.2171 LearningRate 0.0499 Epoch: 5 Global Step: 98080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:42,944-Speed 9344.31 samples/sec Loss 7.2410 LearningRate 0.0499 Epoch: 5 Global Step: 98090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:44,034-Speed 9399.45 
samples/sec Loss 7.1006 LearningRate 0.0499 Epoch: 5 Global Step: 98100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:29:45,119-Speed 9443.36 samples/sec Loss 7.0862 LearningRate 0.0499 Epoch: 5 Global Step: 98110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:46,197-Speed 9505.40 samples/sec Loss 7.2215 LearningRate 0.0499 Epoch: 5 Global Step: 98120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:47,308-Speed 9225.59 samples/sec Loss 7.1109 LearningRate 0.0498 Epoch: 5 Global Step: 98130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:48,399-Speed 9387.93 samples/sec Loss 7.2681 LearningRate 0.0498 Epoch: 5 Global Step: 98140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:49,487-Speed 9418.27 samples/sec Loss 7.1271 LearningRate 0.0498 Epoch: 5 Global Step: 98150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:50,575-Speed 9411.66 samples/sec Loss 7.1265 LearningRate 0.0498 Epoch: 5 Global Step: 98160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:51,656-Speed 9477.92 samples/sec Loss 7.0474 LearningRate 0.0498 Epoch: 5 Global Step: 98170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:52,689-Speed 9918.90 samples/sec Loss 7.2549 LearningRate 0.0498 Epoch: 5 Global Step: 98180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:53,770-Speed 9482.20 samples/sec Loss 7.1418 LearningRate 0.0498 Epoch: 5 Global Step: 98190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:54,817-Speed 9791.18 samples/sec Loss 7.1389 LearningRate 0.0498 Epoch: 5 Global Step: 98200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:55,913-Speed 9348.35 samples/sec Loss 7.1150 LearningRate 0.0498 Epoch: 5 Global Step: 98210 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:29:56,981-Speed 9589.99 samples/sec Loss 7.2225 LearningRate 0.0498 Epoch: 5 
Global Step: 98220 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:29:58,015-Speed 9909.68 samples/sec Loss 7.1891 LearningRate 0.0498 Epoch: 5 Global Step: 98230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:29:59,123-Speed 9244.56 samples/sec Loss 7.0658 LearningRate 0.0498 Epoch: 5 Global Step: 98240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:00,262-Speed 8995.43 samples/sec Loss 7.2784 LearningRate 0.0498 Epoch: 5 Global Step: 98250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:01,308-Speed 9798.02 samples/sec Loss 7.1840 LearningRate 0.0498 Epoch: 5 Global Step: 98260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:02,390-Speed 9473.85 samples/sec Loss 7.2719 LearningRate 0.0498 Epoch: 5 Global Step: 98270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:03,480-Speed 9399.18 samples/sec Loss 7.1324 LearningRate 0.0498 Epoch: 5 Global Step: 98280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:04,576-Speed 9350.21 samples/sec Loss 7.1741 LearningRate 0.0498 Epoch: 5 Global Step: 98290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:05,708-Speed 9049.77 samples/sec Loss 7.2224 LearningRate 0.0498 Epoch: 5 Global Step: 98300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:06,800-Speed 9388.54 samples/sec Loss 7.1624 LearningRate 0.0498 Epoch: 5 Global Step: 98310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:07,876-Speed 9520.08 samples/sec Loss 7.1452 LearningRate 0.0498 Epoch: 5 Global Step: 98320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:08,952-Speed 9527.03 samples/sec Loss 7.1856 LearningRate 0.0498 Epoch: 5 Global Step: 98330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:30:09,989-Speed 9876.36 samples/sec Loss 7.1211 LearningRate 0.0498 Epoch: 5 Global Step: 98340 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 15:30:11,046-Speed 9691.30 samples/sec Loss 7.2812 LearningRate 0.0498 Epoch: 5 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:12,170-Speed 9114.05 samples/sec Loss 7.1984 LearningRate 0.0498 Epoch: 5 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:13,239-Speed 9586.21 samples/sec Loss 7.1640 LearningRate 0.0497 Epoch: 5 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:14,312-Speed 9549.29 samples/sec Loss 7.1900 LearningRate 0.0497 Epoch: 5 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:15,409-Speed 9341.19 samples/sec Loss 7.2164 LearningRate 0.0497 Epoch: 5 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:16,480-Speed 9566.15 samples/sec Loss 7.0084 LearningRate 0.0497 Epoch: 5 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:17,552-Speed 9554.24 samples/sec Loss 7.2203 LearningRate 0.0497 Epoch: 5 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:18,592-Speed 9850.39 samples/sec Loss 7.0973 LearningRate 0.0497 Epoch: 5 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:19,654-Speed 9647.03 samples/sec Loss 7.1246 LearningRate 0.0497 Epoch: 5 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:20,749-Speed 9362.25 samples/sec Loss 7.1131 LearningRate 0.0497 Epoch: 5 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:21,847-Speed 9334.44 samples/sec Loss 7.1684 LearningRate 0.0497 Epoch: 5 Global Step: 98450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:22,926-Speed 9495.30 samples/sec Loss 7.1186 LearningRate 0.0497 Epoch: 5 Global Step: 98460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:23,968-Speed 9831.94 samples/sec 
Loss 7.1953 LearningRate 0.0497 Epoch: 5 Global Step: 98470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:25,075-Speed 9250.95 samples/sec Loss 7.0961 LearningRate 0.0497 Epoch: 5 Global Step: 98480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:26,156-Speed 9483.54 samples/sec Loss 7.2199 LearningRate 0.0497 Epoch: 5 Global Step: 98490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:27,223-Speed 9598.37 samples/sec Loss 7.1680 LearningRate 0.0497 Epoch: 5 Global Step: 98500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:28,282-Speed 9676.38 samples/sec Loss 7.2294 LearningRate 0.0497 Epoch: 5 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:29,364-Speed 9463.50 samples/sec Loss 7.1796 LearningRate 0.0497 Epoch: 5 Global Step: 98520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:30,453-Speed 9408.62 samples/sec Loss 7.1495 LearningRate 0.0497 Epoch: 5 Global Step: 98530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:31,584-Speed 9061.78 samples/sec Loss 7.2211 LearningRate 0.0497 Epoch: 5 Global Step: 98540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:32,627-Speed 9825.92 samples/sec Loss 7.1797 LearningRate 0.0497 Epoch: 5 Global Step: 98550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:30:33,744-Speed 9171.51 samples/sec Loss 7.1974 LearningRate 0.0497 Epoch: 5 Global Step: 98560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:34,847-Speed 9289.19 samples/sec Loss 7.1880 LearningRate 0.0497 Epoch: 5 Global Step: 98570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:35,929-Speed 9469.73 samples/sec Loss 7.2616 LearningRate 0.0497 Epoch: 5 Global Step: 98580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:37,013-Speed 9458.57 samples/sec Loss 7.2973 LearningRate 0.0497 Epoch: 5 Global Step: 98590 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:38,104-Speed 9394.94 samples/sec Loss 7.1204 LearningRate 0.0497 Epoch: 5 Global Step: 98600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:39,176-Speed 9555.03 samples/sec Loss 7.2634 LearningRate 0.0496 Epoch: 5 Global Step: 98610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:40,229-Speed 9729.43 samples/sec Loss 7.1121 LearningRate 0.0496 Epoch: 5 Global Step: 98620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:41,294-Speed 9620.11 samples/sec Loss 7.3123 LearningRate 0.0496 Epoch: 5 Global Step: 98630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:42,397-Speed 9291.36 samples/sec Loss 7.1080 LearningRate 0.0496 Epoch: 5 Global Step: 98640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:43,477-Speed 9483.48 samples/sec Loss 6.9837 LearningRate 0.0496 Epoch: 5 Global Step: 98650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:44,605-Speed 9089.42 samples/sec Loss 7.1595 LearningRate 0.0496 Epoch: 5 Global Step: 98660 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:30:45,696-Speed 9386.22 samples/sec Loss 7.2427 LearningRate 0.0496 Epoch: 5 Global Step: 98670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:46,786-Speed 9398.42 samples/sec Loss 7.3039 LearningRate 0.0496 Epoch: 5 Global Step: 98680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:47,867-Speed 9483.44 samples/sec Loss 7.1276 LearningRate 0.0496 Epoch: 5 Global Step: 98690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:48,973-Speed 9257.26 samples/sec Loss 7.2526 LearningRate 0.0496 Epoch: 5 Global Step: 98700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:50,032-Speed 9676.72 samples/sec Loss 7.0086 LearningRate 0.0496 Epoch: 5 Global Step: 98710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:30:51,121-Speed 9413.38 samples/sec Loss 7.2582 LearningRate 0.0496 Epoch: 5 Global Step: 98720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:52,261-Speed 8987.53 samples/sec Loss 7.1471 LearningRate 0.0496 Epoch: 5 Global Step: 98730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:53,396-Speed 9024.04 samples/sec Loss 7.1007 LearningRate 0.0496 Epoch: 5 Global Step: 98740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:54,482-Speed 9432.79 samples/sec Loss 7.1801 LearningRate 0.0496 Epoch: 5 Global Step: 98750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:55,568-Speed 9440.13 samples/sec Loss 7.1681 LearningRate 0.0496 Epoch: 5 Global Step: 98760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:56,661-Speed 9370.62 samples/sec Loss 7.1479 LearningRate 0.0496 Epoch: 5 Global Step: 98770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:30:57,730-Speed 9586.39 samples/sec Loss 7.1730 LearningRate 0.0496 Epoch: 5 Global Step: 98780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:58,825-Speed 9360.03 samples/sec Loss 7.0826 LearningRate 0.0496 Epoch: 5 Global Step: 98790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:30:59,896-Speed 9566.07 samples/sec Loss 7.2082 LearningRate 0.0496 Epoch: 5 Global Step: 98800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:00,954-Speed 9678.75 samples/sec Loss 7.1871 LearningRate 0.0496 Epoch: 5 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:02,038-Speed 9461.92 samples/sec Loss 7.2206 LearningRate 0.0496 Epoch: 5 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:03,107-Speed 9581.26 samples/sec Loss 7.1762 LearningRate 0.0496 Epoch: 5 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:04,188-Speed 9480.36 samples/sec Loss 
7.1960 LearningRate 0.0495 Epoch: 5 Global Step: 98840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:05,255-Speed 9598.43 samples/sec Loss 7.1867 LearningRate 0.0495 Epoch: 5 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:06,353-Speed 9330.22 samples/sec Loss 7.1797 LearningRate 0.0495 Epoch: 5 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:07,438-Speed 9444.65 samples/sec Loss 7.1689 LearningRate 0.0495 Epoch: 5 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:08,548-Speed 9234.76 samples/sec Loss 7.2660 LearningRate 0.0495 Epoch: 5 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:09,624-Speed 9514.48 samples/sec Loss 7.2012 LearningRate 0.0495 Epoch: 5 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:31:10,728-Speed 9288.10 samples/sec Loss 7.1435 LearningRate 0.0495 Epoch: 5 Global Step: 98900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:11,828-Speed 9308.03 samples/sec Loss 7.1368 LearningRate 0.0495 Epoch: 5 Global Step: 98910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:12,926-Speed 9333.13 samples/sec Loss 7.1583 LearningRate 0.0495 Epoch: 5 Global Step: 98920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:14,061-Speed 9024.59 samples/sec Loss 7.1144 LearningRate 0.0495 Epoch: 5 Global Step: 98930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:15,150-Speed 9409.37 samples/sec Loss 7.1878 LearningRate 0.0495 Epoch: 5 Global Step: 98940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:16,222-Speed 9558.77 samples/sec Loss 7.2765 LearningRate 0.0495 Epoch: 5 Global Step: 98950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:17,278-Speed 9708.82 samples/sec Loss 7.2308 LearningRate 0.0495 Epoch: 5 Global Step: 98960 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:18,394-Speed 9177.31 samples/sec Loss 7.0797 LearningRate 0.0495 Epoch: 5 Global Step: 98970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:19,519-Speed 9108.16 samples/sec Loss 7.3203 LearningRate 0.0495 Epoch: 5 Global Step: 98980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:20,613-Speed 9369.86 samples/sec Loss 7.2466 LearningRate 0.0495 Epoch: 5 Global Step: 98990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:21,667-Speed 9715.97 samples/sec Loss 7.0735 LearningRate 0.0495 Epoch: 5 Global Step: 99000 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:31:22,761-Speed 9364.14 samples/sec Loss 7.2686 LearningRate 0.0495 Epoch: 5 Global Step: 99010 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:31:23,841-Speed 9487.53 samples/sec Loss 7.2727 LearningRate 0.0495 Epoch: 5 Global Step: 99020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:24,958-Speed 9170.58 samples/sec Loss 7.0370 LearningRate 0.0495 Epoch: 5 Global Step: 99030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:26,063-Speed 9278.24 samples/sec Loss 7.1270 LearningRate 0.0495 Epoch: 5 Global Step: 99040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:27,162-Speed 9317.93 samples/sec Loss 7.2064 LearningRate 0.0495 Epoch: 5 Global Step: 99050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:28,254-Speed 9388.45 samples/sec Loss 7.1879 LearningRate 0.0495 Epoch: 5 Global Step: 99060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:29,325-Speed 9565.90 samples/sec Loss 7.2728 LearningRate 0.0495 Epoch: 5 Global Step: 99070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:30,419-Speed 9363.36 samples/sec Loss 7.1670 LearningRate 0.0494 Epoch: 5 Global Step: 99080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:31:31,514-Speed 9360.56 samples/sec Loss 7.1482 LearningRate 0.0494 Epoch: 5 Global Step: 99090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:32,588-Speed 9534.71 samples/sec Loss 7.1468 LearningRate 0.0494 Epoch: 5 Global Step: 99100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:33,647-Speed 9673.93 samples/sec Loss 7.1440 LearningRate 0.0494 Epoch: 5 Global Step: 99110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:34,741-Speed 9370.78 samples/sec Loss 7.0612 LearningRate 0.0494 Epoch: 5 Global Step: 99120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:35,839-Speed 9337.24 samples/sec Loss 7.0986 LearningRate 0.0494 Epoch: 5 Global Step: 99130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:36,945-Speed 9261.22 samples/sec Loss 7.1776 LearningRate 0.0494 Epoch: 5 Global Step: 99140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:38,063-Speed 9167.43 samples/sec Loss 7.1946 LearningRate 0.0494 Epoch: 5 Global Step: 99150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:39,154-Speed 9390.18 samples/sec Loss 7.0341 LearningRate 0.0494 Epoch: 5 Global Step: 99160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:40,248-Speed 9364.56 samples/sec Loss 7.2615 LearningRate 0.0494 Epoch: 5 Global Step: 99170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:41,388-Speed 8984.36 samples/sec Loss 7.2406 LearningRate 0.0494 Epoch: 5 Global Step: 99180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:42,479-Speed 9395.61 samples/sec Loss 7.0872 LearningRate 0.0494 Epoch: 5 Global Step: 99190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:43,558-Speed 9490.23 samples/sec Loss 7.1940 LearningRate 0.0494 Epoch: 5 Global Step: 99200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:44,653-Speed 9362.68 samples/sec Loss 
7.1506 LearningRate 0.0494 Epoch: 5 Global Step: 99210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:45,724-Speed 9568.06 samples/sec Loss 7.1784 LearningRate 0.0494 Epoch: 5 Global Step: 99220 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:31:46,790-Speed 9604.04 samples/sec Loss 7.1711 LearningRate 0.0494 Epoch: 5 Global Step: 99230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:47,896-Speed 9263.19 samples/sec Loss 7.0766 LearningRate 0.0494 Epoch: 5 Global Step: 99240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:48,996-Speed 9317.75 samples/sec Loss 7.1122 LearningRate 0.0494 Epoch: 5 Global Step: 99250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:50,089-Speed 9381.19 samples/sec Loss 7.2151 LearningRate 0.0494 Epoch: 5 Global Step: 99260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:51,196-Speed 9255.86 samples/sec Loss 7.1016 LearningRate 0.0494 Epoch: 5 Global Step: 99270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:52,301-Speed 9267.54 samples/sec Loss 7.1409 LearningRate 0.0494 Epoch: 5 Global Step: 99280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:53,469-Speed 8774.67 samples/sec Loss 7.2589 LearningRate 0.0494 Epoch: 5 Global Step: 99290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:54,553-Speed 9452.29 samples/sec Loss 7.1528 LearningRate 0.0494 Epoch: 5 Global Step: 99300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:55,628-Speed 9531.47 samples/sec Loss 7.0630 LearningRate 0.0494 Epoch: 5 Global Step: 99310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:56,710-Speed 9470.23 samples/sec Loss 7.1356 LearningRate 0.0493 Epoch: 5 Global Step: 99320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:31:57,797-Speed 9421.19 samples/sec Loss 7.2461 LearningRate 0.0493 Epoch: 5 Global Step: 99330 
Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:31:58,925-Speed 9084.08 samples/sec Loss 7.1477 LearningRate 0.0493 Epoch: 5 Global Step: 99340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:32:00,022-Speed 9337.17 samples/sec Loss 7.1434 LearningRate 0.0493 Epoch: 5 Global Step: 99350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:01,118-Speed 9353.60 samples/sec Loss 7.1724 LearningRate 0.0493 Epoch: 5 Global Step: 99360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:02,201-Speed 9465.97 samples/sec Loss 7.1509 LearningRate 0.0493 Epoch: 5 Global Step: 99370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:03,294-Speed 9376.52 samples/sec Loss 7.2702 LearningRate 0.0493 Epoch: 5 Global Step: 99380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:04,422-Speed 9078.32 samples/sec Loss 7.0814 LearningRate 0.0493 Epoch: 5 Global Step: 99390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:05,523-Speed 9310.70 samples/sec Loss 7.1887 LearningRate 0.0493 Epoch: 5 Global Step: 99400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:06,625-Speed 9289.74 samples/sec Loss 7.1149 LearningRate 0.0493 Epoch: 5 Global Step: 99410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:07,727-Speed 9303.58 samples/sec Loss 7.0609 LearningRate 0.0493 Epoch: 5 Global Step: 99420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:08,792-Speed 9623.57 samples/sec Loss 7.1522 LearningRate 0.0493 Epoch: 5 Global Step: 99430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:09,873-Speed 9473.65 samples/sec Loss 7.1623 LearningRate 0.0493 Epoch: 5 Global Step: 99440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:10,954-Speed 9478.89 samples/sec Loss 7.2333 LearningRate 0.0493 Epoch: 5 Global Step: 99450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:32:12,060-Speed 9266.19 samples/sec Loss 7.1453 LearningRate 0.0493 Epoch: 5 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:13,141-Speed 9482.61 samples/sec Loss 7.1095 LearningRate 0.0493 Epoch: 5 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:14,194-Speed 9725.75 samples/sec Loss 7.2565 LearningRate 0.0493 Epoch: 5 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:15,265-Speed 9567.49 samples/sec Loss 7.0223 LearningRate 0.0493 Epoch: 5 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:16,369-Speed 9278.14 samples/sec Loss 7.2566 LearningRate 0.0493 Epoch: 5 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:17,462-Speed 9375.10 samples/sec Loss 7.3551 LearningRate 0.0493 Epoch: 5 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:18,560-Speed 9335.20 samples/sec Loss 7.2019 LearningRate 0.0493 Epoch: 5 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:19,655-Speed 9352.34 samples/sec Loss 7.1748 LearningRate 0.0493 Epoch: 5 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:20,716-Speed 9654.91 samples/sec Loss 7.1640 LearningRate 0.0493 Epoch: 5 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:21,752-Speed 9900.08 samples/sec Loss 7.1392 LearningRate 0.0493 Epoch: 5 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:22,823-Speed 9559.43 samples/sec Loss 7.1114 LearningRate 0.0492 Epoch: 5 Global Step: 99560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:23,858-Speed 9897.67 samples/sec Loss 7.1829 LearningRate 0.0492 Epoch: 5 Global Step: 99570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:24,975-Speed 9175.96 samples/sec Loss 7.0953 
LearningRate 0.0492 Epoch: 5 Global Step: 99580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:26,063-Speed 9415.22 samples/sec Loss 7.0941 LearningRate 0.0492 Epoch: 5 Global Step: 99590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:27,139-Speed 9525.17 samples/sec Loss 7.0166 LearningRate 0.0492 Epoch: 5 Global Step: 99600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:28,166-Speed 9970.45 samples/sec Loss 7.1227 LearningRate 0.0492 Epoch: 5 Global Step: 99610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:29,202-Speed 9890.28 samples/sec Loss 7.0066 LearningRate 0.0492 Epoch: 5 Global Step: 99620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:30,247-Speed 9805.04 samples/sec Loss 7.2607 LearningRate 0.0492 Epoch: 5 Global Step: 99630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:31,338-Speed 9404.99 samples/sec Loss 7.2163 LearningRate 0.0492 Epoch: 5 Global Step: 99640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:32,436-Speed 9329.98 samples/sec Loss 7.1134 LearningRate 0.0492 Epoch: 5 Global Step: 99650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:33,483-Speed 9779.00 samples/sec Loss 7.0900 LearningRate 0.0492 Epoch: 5 Global Step: 99660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:34,579-Speed 9352.78 samples/sec Loss 7.2089 LearningRate 0.0492 Epoch: 5 Global Step: 99670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:35,723-Speed 8953.69 samples/sec Loss 7.0691 LearningRate 0.0492 Epoch: 5 Global Step: 99680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:36,792-Speed 9582.95 samples/sec Loss 7.1981 LearningRate 0.0492 Epoch: 5 Global Step: 99690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:37,846-Speed 9723.50 samples/sec Loss 7.2326 LearningRate 0.0492 Epoch: 5 Global Step: 99700 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:38,895-Speed 9768.98 samples/sec Loss 7.2606 LearningRate 0.0492 Epoch: 5 Global Step: 99710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:39,988-Speed 9381.42 samples/sec Loss 7.2275 LearningRate 0.0492 Epoch: 5 Global Step: 99720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:41,081-Speed 9374.17 samples/sec Loss 7.2022 LearningRate 0.0492 Epoch: 5 Global Step: 99730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:42,192-Speed 9215.26 samples/sec Loss 7.0780 LearningRate 0.0492 Epoch: 5 Global Step: 99740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:43,240-Speed 9780.08 samples/sec Loss 7.1420 LearningRate 0.0492 Epoch: 5 Global Step: 99750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:44,268-Speed 9965.40 samples/sec Loss 7.1065 LearningRate 0.0492 Epoch: 5 Global Step: 99760 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:32:45,388-Speed 9147.08 samples/sec Loss 7.1549 LearningRate 0.0492 Epoch: 5 Global Step: 99770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:32:46,457-Speed 9589.24 samples/sec Loss 7.1441 LearningRate 0.0492 Epoch: 5 Global Step: 99780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:47,569-Speed 9207.95 samples/sec Loss 7.0667 LearningRate 0.0491 Epoch: 5 Global Step: 99790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:48,647-Speed 9507.57 samples/sec Loss 7.1829 LearningRate 0.0491 Epoch: 5 Global Step: 99800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:49,745-Speed 9333.71 samples/sec Loss 7.1955 LearningRate 0.0491 Epoch: 5 Global Step: 99810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:50,809-Speed 9636.33 samples/sec Loss 7.1572 LearningRate 0.0491 Epoch: 5 Global Step: 99820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:32:51,852-Speed 9820.86 samples/sec Loss 7.1531 LearningRate 0.0491 Epoch: 5 Global Step: 99830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:52,892-Speed 9847.27 samples/sec Loss 7.2235 LearningRate 0.0491 Epoch: 5 Global Step: 99840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:53,976-Speed 9458.36 samples/sec Loss 7.1820 LearningRate 0.0491 Epoch: 5 Global Step: 99850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:55,103-Speed 9091.72 samples/sec Loss 7.2633 LearningRate 0.0491 Epoch: 5 Global Step: 99860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:56,147-Speed 9809.59 samples/sec Loss 7.2740 LearningRate 0.0491 Epoch: 5 Global Step: 99870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:32:57,241-Speed 9367.37 samples/sec Loss 7.2660 LearningRate 0.0491 Epoch: 5 Global Step: 99880 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:32:58,331-Speed 9396.98 samples/sec Loss 7.2134 LearningRate 0.0491 Epoch: 5 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:32:59,427-Speed 9347.58 samples/sec Loss 7.0772 LearningRate 0.0491 Epoch: 5 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:00,503-Speed 9527.56 samples/sec Loss 7.1660 LearningRate 0.0491 Epoch: 5 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:01,548-Speed 9803.17 samples/sec Loss 7.1317 LearningRate 0.0491 Epoch: 5 Global Step: 99920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:02,621-Speed 9552.31 samples/sec Loss 7.0761 LearningRate 0.0491 Epoch: 5 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:03,718-Speed 9336.33 samples/sec Loss 7.0238 LearningRate 0.0491 Epoch: 5 Global Step: 99940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:04,775-Speed 9698.29 samples/sec Loss 7.1338 
LearningRate 0.0491 Epoch: 5 Global Step: 99950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:05,871-Speed 9342.08 samples/sec Loss 7.0742 LearningRate 0.0491 Epoch: 5 Global Step: 99960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:06,958-Speed 9425.32 samples/sec Loss 7.1054 LearningRate 0.0491 Epoch: 5 Global Step: 99970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:08,038-Speed 9496.39 samples/sec Loss 7.1670 LearningRate 0.0491 Epoch: 5 Global Step: 99980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:33:09,129-Speed 9387.60 samples/sec Loss 7.1467 LearningRate 0.0491 Epoch: 5 Global Step: 99990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:33:10,193-Speed 9632.83 samples/sec Loss 7.1642 LearningRate 0.0491 Epoch: 5 Global Step: 100000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:33:32,597-[lfw][100000]XNorm: 11.507255 Training: 2022-04-11 15:33:32,598-[lfw][100000]Accuracy-Flip: 0.99550+-0.00289 Training: 2022-04-11 15:33:32,598-[lfw][100000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:33:58,166-[cfp_fp][100000]XNorm: 9.890199 Training: 2022-04-11 15:33:58,167-[cfp_fp][100000]Accuracy-Flip: 0.95857+-0.00852 Training: 2022-04-11 15:33:58,167-[cfp_fp][100000]Accuracy-Highest: 0.95857 Training: 2022-04-11 15:34:19,909-[agedb_30][100000]XNorm: 11.149160 Training: 2022-04-11 15:34:19,909-[agedb_30][100000]Accuracy-Flip: 0.96133+-0.00862 Training: 2022-04-11 15:34:19,910-[agedb_30][100000]Accuracy-Highest: 0.96317 Training: 2022-04-11 15:34:20,979-Speed 144.66 samples/sec Loss 7.1048 LearningRate 0.0491 Epoch: 5 Global Step: 100010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:34:22,042-Speed 9639.67 samples/sec Loss 7.1667 LearningRate 0.0491 Epoch: 5 Global Step: 100020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:34:23,132-Speed 9404.87 samples/sec Loss 7.0967 LearningRate 0.0490 Epoch: 5 
Global Step: 100030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:24,237-Speed 9272.13 samples/sec Loss 7.1738 LearningRate 0.0490 Epoch: 5 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:25,294-Speed 9688.74 samples/sec Loss 7.0243 LearningRate 0.0490 Epoch: 5 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:26,330-Speed 9890.55 samples/sec Loss 7.2009 LearningRate 0.0490 Epoch: 5 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:27,399-Speed 9590.82 samples/sec Loss 7.1478 LearningRate 0.0490 Epoch: 5 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:28,561-Speed 8813.23 samples/sec Loss 7.1320 LearningRate 0.0490 Epoch: 5 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:29,625-Speed 9636.35 samples/sec Loss 7.1917 LearningRate 0.0490 Epoch: 5 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:30,718-Speed 9371.02 samples/sec Loss 7.0783 LearningRate 0.0490 Epoch: 5 Global Step: 100100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:31,783-Speed 9625.16 samples/sec Loss 7.1734 LearningRate 0.0490 Epoch: 5 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:32,883-Speed 9315.57 samples/sec Loss 7.1691 LearningRate 0.0490 Epoch: 5 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:34:33,962-Speed 9490.33 samples/sec Loss 7.1552 LearningRate 0.0490 Epoch: 5 Global Step: 100130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:34:35,371-Speed 7270.06 samples/sec Loss 7.1646 LearningRate 0.0490 Epoch: 5 Global Step: 100140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:02,552-Speed 376.76 samples/sec Loss 6.7753 LearningRate 0.0490 Epoch: 6 Global Step: 100150 Fp16 Grad Scale: 131072 Required: 
9 hours Training: 2022-04-11 15:35:03,980-Speed 7178.25 samples/sec Loss 6.2424 LearningRate 0.0490 Epoch: 6 Global Step: 100160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:05,889-Speed 5364.24 samples/sec Loss 6.3282 LearningRate 0.0490 Epoch: 6 Global Step: 100170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:07,182-Speed 7925.56 samples/sec Loss 6.3607 LearningRate 0.0490 Epoch: 6 Global Step: 100180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:08,292-Speed 9239.98 samples/sec Loss 6.3298 LearningRate 0.0490 Epoch: 6 Global Step: 100190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:09,711-Speed 7220.88 samples/sec Loss 6.3390 LearningRate 0.0490 Epoch: 6 Global Step: 100200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:10,823-Speed 9209.55 samples/sec Loss 6.2855 LearningRate 0.0490 Epoch: 6 Global Step: 100210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:11,878-Speed 9711.06 samples/sec Loss 6.2769 LearningRate 0.0490 Epoch: 6 Global Step: 100220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:12,935-Speed 9693.40 samples/sec Loss 6.2828 LearningRate 0.0490 Epoch: 6 Global Step: 100230 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:14,032-Speed 9345.05 samples/sec Loss 6.3434 LearningRate 0.0490 Epoch: 6 Global Step: 100240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:15,135-Speed 9293.54 samples/sec Loss 6.3957 LearningRate 0.0490 Epoch: 6 Global Step: 100250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:16,244-Speed 9238.27 samples/sec Loss 6.3142 LearningRate 0.0490 Epoch: 6 Global Step: 100260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:17,338-Speed 9364.52 samples/sec Loss 6.3122 LearningRate 0.0489 Epoch: 6 Global Step: 100270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
15:35:18,441-Speed 9289.52 samples/sec Loss 6.3668 LearningRate 0.0489 Epoch: 6 Global Step: 100280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:19,938-Speed 6840.11 samples/sec Loss 6.3780 LearningRate 0.0489 Epoch: 6 Global Step: 100290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:21,262-Speed 7735.77 samples/sec Loss 6.3039 LearningRate 0.0489 Epoch: 6 Global Step: 100300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:22,526-Speed 8106.14 samples/sec Loss 6.4319 LearningRate 0.0489 Epoch: 6 Global Step: 100310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:23,817-Speed 7937.90 samples/sec Loss 6.3998 LearningRate 0.0489 Epoch: 6 Global Step: 100320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:24,900-Speed 9468.76 samples/sec Loss 6.2926 LearningRate 0.0489 Epoch: 6 Global Step: 100330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:25,948-Speed 9777.15 samples/sec Loss 6.3244 LearningRate 0.0489 Epoch: 6 Global Step: 100340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:27,040-Speed 9383.36 samples/sec Loss 6.4247 LearningRate 0.0489 Epoch: 6 Global Step: 100350 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:28,127-Speed 9423.63 samples/sec Loss 6.4300 LearningRate 0.0489 Epoch: 6 Global Step: 100360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:29,236-Speed 9237.22 samples/sec Loss 6.3512 LearningRate 0.0489 Epoch: 6 Global Step: 100370 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:30,352-Speed 9183.83 samples/sec Loss 6.2816 LearningRate 0.0489 Epoch: 6 Global Step: 100380 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:31,389-Speed 9881.33 samples/sec Loss 6.3437 LearningRate 0.0489 Epoch: 6 Global Step: 100390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:32,480-Speed 9393.61 samples/sec Loss 
6.3935 LearningRate 0.0489 Epoch: 6 Global Step: 100400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:33,543-Speed 9637.26 samples/sec Loss 6.2890 LearningRate 0.0489 Epoch: 6 Global Step: 100410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:34,655-Speed 9213.66 samples/sec Loss 6.4398 LearningRate 0.0489 Epoch: 6 Global Step: 100420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:35,712-Speed 9689.33 samples/sec Loss 6.3645 LearningRate 0.0489 Epoch: 6 Global Step: 100430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:36,836-Speed 9122.16 samples/sec Loss 6.4034 LearningRate 0.0489 Epoch: 6 Global Step: 100440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:38,129-Speed 7920.95 samples/sec Loss 6.3778 LearningRate 0.0489 Epoch: 6 Global Step: 100450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:39,254-Speed 9113.49 samples/sec Loss 6.3774 LearningRate 0.0489 Epoch: 6 Global Step: 100460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:40,335-Speed 9478.14 samples/sec Loss 6.4234 LearningRate 0.0489 Epoch: 6 Global Step: 100470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:41,410-Speed 9524.59 samples/sec Loss 6.3922 LearningRate 0.0489 Epoch: 6 Global Step: 100480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:42,488-Speed 9506.76 samples/sec Loss 6.4126 LearningRate 0.0489 Epoch: 6 Global Step: 100490 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:43,538-Speed 9758.38 samples/sec Loss 6.3613 LearningRate 0.0489 Epoch: 6 Global Step: 100500 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:44,573-Speed 9906.36 samples/sec Loss 6.4050 LearningRate 0.0488 Epoch: 6 Global Step: 100510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:45,925-Speed 7579.08 samples/sec Loss 6.4746 LearningRate 0.0488 Epoch: 6 Global 
Step: 100520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:46,996-Speed 9567.18 samples/sec Loss 6.4278 LearningRate 0.0488 Epoch: 6 Global Step: 100530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:48,111-Speed 9187.47 samples/sec Loss 6.3896 LearningRate 0.0488 Epoch: 6 Global Step: 100540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:49,149-Speed 9868.33 samples/sec Loss 6.4544 LearningRate 0.0488 Epoch: 6 Global Step: 100550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:50,212-Speed 9637.53 samples/sec Loss 6.4487 LearningRate 0.0488 Epoch: 6 Global Step: 100560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:51,269-Speed 9694.78 samples/sec Loss 6.4908 LearningRate 0.0488 Epoch: 6 Global Step: 100570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:52,333-Speed 9629.93 samples/sec Loss 6.4951 LearningRate 0.0488 Epoch: 6 Global Step: 100580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:53,372-Speed 9863.17 samples/sec Loss 6.4594 LearningRate 0.0488 Epoch: 6 Global Step: 100590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:54,424-Speed 9742.98 samples/sec Loss 6.4653 LearningRate 0.0488 Epoch: 6 Global Step: 100600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:55,537-Speed 9199.97 samples/sec Loss 6.5609 LearningRate 0.0488 Epoch: 6 Global Step: 100610 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:56,628-Speed 9398.89 samples/sec Loss 6.4365 LearningRate 0.0488 Epoch: 6 Global Step: 100620 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:35:57,715-Speed 9420.43 samples/sec Loss 6.4960 LearningRate 0.0488 Epoch: 6 Global Step: 100630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:35:58,757-Speed 9837.65 samples/sec Loss 6.4455 LearningRate 0.0488 Epoch: 6 Global Step: 100640 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:35:59,806-Speed 9763.98 samples/sec Loss 6.5129 LearningRate 0.0488 Epoch: 6 Global Step: 100650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:00,891-Speed 9439.20 samples/sec Loss 6.5260 LearningRate 0.0488 Epoch: 6 Global Step: 100660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:01,995-Speed 9283.74 samples/sec Loss 6.3969 LearningRate 0.0488 Epoch: 6 Global Step: 100670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:03,091-Speed 9349.33 samples/sec Loss 6.5071 LearningRate 0.0488 Epoch: 6 Global Step: 100680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:04,170-Speed 9497.12 samples/sec Loss 6.5894 LearningRate 0.0488 Epoch: 6 Global Step: 100690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:05,213-Speed 9819.38 samples/sec Loss 6.4779 LearningRate 0.0488 Epoch: 6 Global Step: 100700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:06,251-Speed 9871.37 samples/sec Loss 6.5123 LearningRate 0.0488 Epoch: 6 Global Step: 100710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:07,338-Speed 9431.93 samples/sec Loss 6.4144 LearningRate 0.0488 Epoch: 6 Global Step: 100720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:08,416-Speed 9508.28 samples/sec Loss 6.4426 LearningRate 0.0488 Epoch: 6 Global Step: 100730 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:09,492-Speed 9522.11 samples/sec Loss 6.5083 LearningRate 0.0488 Epoch: 6 Global Step: 100740 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:10,528-Speed 9889.63 samples/sec Loss 6.4722 LearningRate 0.0487 Epoch: 6 Global Step: 100750 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:11,650-Speed 9125.85 samples/sec Loss 6.5401 LearningRate 0.0487 Epoch: 6 Global Step: 100760 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 
15:36:12,762-Speed 9219.28 samples/sec Loss 6.4676 LearningRate 0.0487 Epoch: 6 Global Step: 100770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:13,855-Speed 9368.88 samples/sec Loss 6.4805 LearningRate 0.0487 Epoch: 6 Global Step: 100780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:14,925-Speed 9577.09 samples/sec Loss 6.4548 LearningRate 0.0487 Epoch: 6 Global Step: 100790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:16,024-Speed 9325.31 samples/sec Loss 6.4793 LearningRate 0.0487 Epoch: 6 Global Step: 100800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:17,124-Speed 9312.37 samples/sec Loss 6.5249 LearningRate 0.0487 Epoch: 6 Global Step: 100810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:18,177-Speed 9731.54 samples/sec Loss 6.5479 LearningRate 0.0487 Epoch: 6 Global Step: 100820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:19,278-Speed 9309.13 samples/sec Loss 6.4979 LearningRate 0.0487 Epoch: 6 Global Step: 100830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:20,409-Speed 9056.21 samples/sec Loss 6.4764 LearningRate 0.0487 Epoch: 6 Global Step: 100840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:21,479-Speed 9580.44 samples/sec Loss 6.5081 LearningRate 0.0487 Epoch: 6 Global Step: 100850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:22,538-Speed 9672.36 samples/sec Loss 6.5621 LearningRate 0.0487 Epoch: 6 Global Step: 100860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:23,595-Speed 9688.93 samples/sec Loss 6.4956 LearningRate 0.0487 Epoch: 6 Global Step: 100870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:24,662-Speed 9604.35 samples/sec Loss 6.6318 LearningRate 0.0487 Epoch: 6 Global Step: 100880 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:25,730-Speed 9594.27 samples/sec Loss 
6.5095 LearningRate 0.0487 Epoch: 6 Global Step: 100890 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:26,810-Speed 9489.08 samples/sec Loss 6.5175 LearningRate 0.0487 Epoch: 6 Global Step: 100900 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:27,852-Speed 9833.48 samples/sec Loss 6.4967 LearningRate 0.0487 Epoch: 6 Global Step: 100910 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:28,922-Speed 9580.31 samples/sec Loss 6.4652 LearningRate 0.0487 Epoch: 6 Global Step: 100920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:29,987-Speed 9617.94 samples/sec Loss 6.4706 LearningRate 0.0487 Epoch: 6 Global Step: 100930 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:31,062-Speed 9527.20 samples/sec Loss 6.5497 LearningRate 0.0487 Epoch: 6 Global Step: 100940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:32,133-Speed 9569.79 samples/sec Loss 6.5145 LearningRate 0.0487 Epoch: 6 Global Step: 100950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:33,233-Speed 9315.67 samples/sec Loss 6.5337 LearningRate 0.0487 Epoch: 6 Global Step: 100960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:34,287-Speed 9717.54 samples/sec Loss 6.5009 LearningRate 0.0487 Epoch: 6 Global Step: 100970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:35,332-Speed 9805.03 samples/sec Loss 6.6729 LearningRate 0.0487 Epoch: 6 Global Step: 100980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:36,419-Speed 9425.52 samples/sec Loss 6.4456 LearningRate 0.0486 Epoch: 6 Global Step: 100990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:37,512-Speed 9371.80 samples/sec Loss 6.4487 LearningRate 0.0486 Epoch: 6 Global Step: 101000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:38,580-Speed 9599.16 samples/sec Loss 6.6140 LearningRate 0.0486 Epoch: 6 Global 
Step: 101010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:39,658-Speed 9503.75 samples/sec Loss 6.5603 LearningRate 0.0486 Epoch: 6 Global Step: 101020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:40,695-Speed 9884.83 samples/sec Loss 6.4152 LearningRate 0.0486 Epoch: 6 Global Step: 101030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:41,759-Speed 9623.54 samples/sec Loss 6.5477 LearningRate 0.0486 Epoch: 6 Global Step: 101040 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:42,836-Speed 9517.53 samples/sec Loss 6.5355 LearningRate 0.0486 Epoch: 6 Global Step: 101050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:43,928-Speed 9386.50 samples/sec Loss 6.5196 LearningRate 0.0486 Epoch: 6 Global Step: 101060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:45,037-Speed 9235.47 samples/sec Loss 6.5840 LearningRate 0.0486 Epoch: 6 Global Step: 101070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:46,120-Speed 9465.46 samples/sec Loss 6.4129 LearningRate 0.0486 Epoch: 6 Global Step: 101080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:47,220-Speed 9312.04 samples/sec Loss 6.5196 LearningRate 0.0486 Epoch: 6 Global Step: 101090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:48,363-Speed 8961.98 samples/sec Loss 6.4706 LearningRate 0.0486 Epoch: 6 Global Step: 101100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:49,430-Speed 9600.36 samples/sec Loss 6.5300 LearningRate 0.0486 Epoch: 6 Global Step: 101110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:50,474-Speed 9822.29 samples/sec Loss 6.5907 LearningRate 0.0486 Epoch: 6 Global Step: 101120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:51,516-Speed 9824.58 samples/sec Loss 6.6160 LearningRate 0.0486 Epoch: 6 Global Step: 101130 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:36:52,632-Speed 9181.20 samples/sec Loss 6.5209 LearningRate 0.0486 Epoch: 6 Global Step: 101140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:53,728-Speed 9355.44 samples/sec Loss 6.5055 LearningRate 0.0486 Epoch: 6 Global Step: 101150 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:54,788-Speed 9657.50 samples/sec Loss 6.6357 LearningRate 0.0486 Epoch: 6 Global Step: 101160 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:55,881-Speed 9373.17 samples/sec Loss 6.4682 LearningRate 0.0486 Epoch: 6 Global Step: 101170 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:56,989-Speed 9255.26 samples/sec Loss 6.5654 LearningRate 0.0486 Epoch: 6 Global Step: 101180 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:36:58,047-Speed 9682.61 samples/sec Loss 6.4819 LearningRate 0.0486 Epoch: 6 Global Step: 101190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:36:59,117-Speed 9577.50 samples/sec Loss 6.5545 LearningRate 0.0486 Epoch: 6 Global Step: 101200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:00,225-Speed 9246.77 samples/sec Loss 6.4913 LearningRate 0.0486 Epoch: 6 Global Step: 101210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:01,355-Speed 9068.19 samples/sec Loss 6.6145 LearningRate 0.0486 Epoch: 6 Global Step: 101220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:02,413-Speed 9681.68 samples/sec Loss 6.5433 LearningRate 0.0485 Epoch: 6 Global Step: 101230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:03,485-Speed 9558.40 samples/sec Loss 6.6456 LearningRate 0.0485 Epoch: 6 Global Step: 101240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:04,573-Speed 9423.50 samples/sec Loss 6.6238 LearningRate 0.0485 Epoch: 6 Global Step: 101250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
15:37:05,628-Speed 9711.49 samples/sec Loss 6.5620 LearningRate 0.0485 Epoch: 6 Global Step: 101260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:06,679-Speed 9748.44 samples/sec Loss 6.5409 LearningRate 0.0485 Epoch: 6 Global Step: 101270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:07,763-Speed 9448.88 samples/sec Loss 6.5808 LearningRate 0.0485 Epoch: 6 Global Step: 101280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:08,811-Speed 9782.03 samples/sec Loss 6.5807 LearningRate 0.0485 Epoch: 6 Global Step: 101290 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:09,905-Speed 9359.97 samples/sec Loss 6.5752 LearningRate 0.0485 Epoch: 6 Global Step: 101300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:11,007-Speed 9297.69 samples/sec Loss 6.4794 LearningRate 0.0485 Epoch: 6 Global Step: 101310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:12,131-Speed 9119.99 samples/sec Loss 6.5959 LearningRate 0.0485 Epoch: 6 Global Step: 101320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:13,158-Speed 9972.61 samples/sec Loss 6.5398 LearningRate 0.0485 Epoch: 6 Global Step: 101330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:14,229-Speed 9569.92 samples/sec Loss 6.6226 LearningRate 0.0485 Epoch: 6 Global Step: 101340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:15,322-Speed 9378.78 samples/sec Loss 6.5804 LearningRate 0.0485 Epoch: 6 Global Step: 101350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:16,409-Speed 9425.86 samples/sec Loss 6.7295 LearningRate 0.0485 Epoch: 6 Global Step: 101360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:17,473-Speed 9632.17 samples/sec Loss 6.6135 LearningRate 0.0485 Epoch: 6 Global Step: 101370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:18,521-Speed 9776.18 samples/sec Loss 
6.5503 LearningRate 0.0485 Epoch: 6 Global Step: 101380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:19,593-Speed 9555.06 samples/sec Loss 6.6333 LearningRate 0.0485 Epoch: 6 Global Step: 101390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:20,670-Speed 9509.91 samples/sec Loss 6.6415 LearningRate 0.0485 Epoch: 6 Global Step: 101400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:21,779-Speed 9242.09 samples/sec Loss 6.6337 LearningRate 0.0485 Epoch: 6 Global Step: 101410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:22,850-Speed 9567.84 samples/sec Loss 6.6279 LearningRate 0.0485 Epoch: 6 Global Step: 101420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:23,926-Speed 9518.06 samples/sec Loss 6.4970 LearningRate 0.0485 Epoch: 6 Global Step: 101430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:25,028-Speed 9298.01 samples/sec Loss 6.5496 LearningRate 0.0485 Epoch: 6 Global Step: 101440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:26,122-Speed 9369.87 samples/sec Loss 6.5622 LearningRate 0.0485 Epoch: 6 Global Step: 101450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:27,177-Speed 9714.34 samples/sec Loss 6.6251 LearningRate 0.0485 Epoch: 6 Global Step: 101460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:28,257-Speed 9483.62 samples/sec Loss 6.6904 LearningRate 0.0484 Epoch: 6 Global Step: 101470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:29,339-Speed 9468.91 samples/sec Loss 6.5904 LearningRate 0.0484 Epoch: 6 Global Step: 101480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:30,459-Speed 9145.28 samples/sec Loss 6.6516 LearningRate 0.0484 Epoch: 6 Global Step: 101490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:31,538-Speed 9500.48 samples/sec Loss 6.5304 LearningRate 0.0484 Epoch: 6 Global 
Step: 101500 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:32,607-Speed 9576.99 samples/sec Loss 6.6002 LearningRate 0.0484 Epoch: 6 Global Step: 101510 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:33,678-Speed 9567.17 samples/sec Loss 6.6424 LearningRate 0.0484 Epoch: 6 Global Step: 101520 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:34,765-Speed 9424.99 samples/sec Loss 6.5580 LearningRate 0.0484 Epoch: 6 Global Step: 101530 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:35,821-Speed 9707.09 samples/sec Loss 6.4866 LearningRate 0.0484 Epoch: 6 Global Step: 101540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:36,914-Speed 9380.89 samples/sec Loss 6.6339 LearningRate 0.0484 Epoch: 6 Global Step: 101550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:38,032-Speed 9163.63 samples/sec Loss 6.6213 LearningRate 0.0484 Epoch: 6 Global Step: 101560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:39,112-Speed 9486.21 samples/sec Loss 6.6471 LearningRate 0.0484 Epoch: 6 Global Step: 101570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:40,179-Speed 9601.06 samples/sec Loss 6.5920 LearningRate 0.0484 Epoch: 6 Global Step: 101580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:41,231-Speed 9737.91 samples/sec Loss 6.6761 LearningRate 0.0484 Epoch: 6 Global Step: 101590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:42,307-Speed 9526.54 samples/sec Loss 6.6535 LearningRate 0.0484 Epoch: 6 Global Step: 101600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:43,436-Speed 9076.30 samples/sec Loss 6.6055 LearningRate 0.0484 Epoch: 6 Global Step: 101610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:44,467-Speed 9939.14 samples/sec Loss 6.5502 LearningRate 0.0484 Epoch: 6 Global Step: 101620 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:37:45,539-Speed 9555.76 samples/sec Loss 6.6003 LearningRate 0.0484 Epoch: 6 Global Step: 101630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:46,643-Speed 9279.69 samples/sec Loss 6.6016 LearningRate 0.0484 Epoch: 6 Global Step: 101640 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:47,761-Speed 9167.45 samples/sec Loss 6.6613 LearningRate 0.0484 Epoch: 6 Global Step: 101650 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:48,877-Speed 9175.06 samples/sec Loss 6.5654 LearningRate 0.0484 Epoch: 6 Global Step: 101660 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:37:49,963-Speed 9438.90 samples/sec Loss 6.6255 LearningRate 0.0484 Epoch: 6 Global Step: 101670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:51,061-Speed 9329.70 samples/sec Loss 6.6824 LearningRate 0.0484 Epoch: 6 Global Step: 101680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:52,147-Speed 9434.46 samples/sec Loss 6.5799 LearningRate 0.0484 Epoch: 6 Global Step: 101690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:53,200-Speed 9729.26 samples/sec Loss 6.5968 LearningRate 0.0484 Epoch: 6 Global Step: 101700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:54,274-Speed 9546.25 samples/sec Loss 6.6680 LearningRate 0.0483 Epoch: 6 Global Step: 101710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:55,358-Speed 9449.37 samples/sec Loss 6.6267 LearningRate 0.0483 Epoch: 6 Global Step: 101720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:56,483-Speed 9107.53 samples/sec Loss 6.5633 LearningRate 0.0483 Epoch: 6 Global Step: 101730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:57,570-Speed 9429.24 samples/sec Loss 6.7236 LearningRate 0.0483 Epoch: 6 Global Step: 101740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
15:37:58,616-Speed 9791.06 samples/sec Loss 6.5795 LearningRate 0.0483 Epoch: 6 Global Step: 101750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:37:59,669-Speed 9735.15 samples/sec Loss 6.6946 LearningRate 0.0483 Epoch: 6 Global Step: 101760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:00,793-Speed 9108.83 samples/sec Loss 6.6274 LearningRate 0.0483 Epoch: 6 Global Step: 101770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:38:01,895-Speed 9302.35 samples/sec Loss 6.6020 LearningRate 0.0483 Epoch: 6 Global Step: 101780 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:38:02,972-Speed 9515.26 samples/sec Loss 6.6383 LearningRate 0.0483 Epoch: 6 Global Step: 101790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:04,036-Speed 9624.70 samples/sec Loss 6.6594 LearningRate 0.0483 Epoch: 6 Global Step: 101800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:05,150-Speed 9194.98 samples/sec Loss 6.6773 LearningRate 0.0483 Epoch: 6 Global Step: 101810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:06,251-Speed 9309.84 samples/sec Loss 6.6852 LearningRate 0.0483 Epoch: 6 Global Step: 101820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:07,350-Speed 9325.85 samples/sec Loss 6.5721 LearningRate 0.0483 Epoch: 6 Global Step: 101830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:08,411-Speed 9653.72 samples/sec Loss 6.6321 LearningRate 0.0483 Epoch: 6 Global Step: 101840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:09,496-Speed 9442.28 samples/sec Loss 6.5596 LearningRate 0.0483 Epoch: 6 Global Step: 101850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:10,566-Speed 9576.17 samples/sec Loss 6.5883 LearningRate 0.0483 Epoch: 6 Global Step: 101860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:11,686-Speed 9154.61 samples/sec Loss 
6.5801 LearningRate 0.0483 Epoch: 6 Global Step: 101870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:12,778-Speed 9383.16 samples/sec Loss 6.6351 LearningRate 0.0483 Epoch: 6 Global Step: 101880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:13,854-Speed 9518.82 samples/sec Loss 6.7735 LearningRate 0.0483 Epoch: 6 Global Step: 101890 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:38:14,973-Speed 9160.46 samples/sec Loss 6.6537 LearningRate 0.0483 Epoch: 6 Global Step: 101900 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:38:16,070-Speed 9341.73 samples/sec Loss 6.7815 LearningRate 0.0483 Epoch: 6 Global Step: 101910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:17,166-Speed 9344.60 samples/sec Loss 6.7402 LearningRate 0.0483 Epoch: 6 Global Step: 101920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:18,277-Speed 9226.21 samples/sec Loss 6.7623 LearningRate 0.0483 Epoch: 6 Global Step: 101930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:19,359-Speed 9463.21 samples/sec Loss 6.5822 LearningRate 0.0483 Epoch: 6 Global Step: 101940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:20,466-Speed 9259.29 samples/sec Loss 6.6126 LearningRate 0.0482 Epoch: 6 Global Step: 101950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:21,554-Speed 9415.52 samples/sec Loss 6.7340 LearningRate 0.0482 Epoch: 6 Global Step: 101960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:22,683-Speed 9068.41 samples/sec Loss 6.6381 LearningRate 0.0482 Epoch: 6 Global Step: 101970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:23,782-Speed 9329.41 samples/sec Loss 6.5873 LearningRate 0.0482 Epoch: 6 Global Step: 101980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:24,877-Speed 9353.97 samples/sec Loss 6.6644 LearningRate 0.0482 Epoch: 6 Global 
Step: 101990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:26,000-Speed 9129.17 samples/sec Loss 6.7062 LearningRate 0.0482 Epoch: 6 Global Step: 102000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:38:48,138-[lfw][102000]XNorm: 11.438304 Training: 2022-04-11 15:38:48,139-[lfw][102000]Accuracy-Flip: 0.99550+-0.00269 Training: 2022-04-11 15:38:48,140-[lfw][102000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:39:13,728-[cfp_fp][102000]XNorm: 9.747645 Training: 2022-04-11 15:39:13,729-[cfp_fp][102000]Accuracy-Flip: 0.95600+-0.01104 Training: 2022-04-11 15:39:13,729-[cfp_fp][102000]Accuracy-Highest: 0.95857 Training: 2022-04-11 15:39:35,829-[agedb_30][102000]XNorm: 11.100109 Training: 2022-04-11 15:39:35,829-[agedb_30][102000]Accuracy-Flip: 0.96483+-0.00880 Training: 2022-04-11 15:39:35,829-[agedb_30][102000]Accuracy-Highest: 0.96483 Training: 2022-04-11 15:39:36,888-Speed 144.45 samples/sec Loss 6.6428 LearningRate 0.0482 Epoch: 6 Global Step: 102010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:37,969-Speed 9483.99 samples/sec Loss 6.6569 LearningRate 0.0482 Epoch: 6 Global Step: 102020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:39,068-Speed 9317.22 samples/sec Loss 6.6365 LearningRate 0.0482 Epoch: 6 Global Step: 102030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:40,148-Speed 9489.25 samples/sec Loss 6.7594 LearningRate 0.0482 Epoch: 6 Global Step: 102040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:41,221-Speed 9547.13 samples/sec Loss 6.7113 LearningRate 0.0482 Epoch: 6 Global Step: 102050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:42,285-Speed 9635.72 samples/sec Loss 6.7044 LearningRate 0.0482 Epoch: 6 Global Step: 102060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:43,340-Speed 9711.96 samples/sec Loss 6.6895 LearningRate 0.0482 Epoch: 6 Global Step: 102070 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:44,464-Speed 9115.68 samples/sec Loss 6.6305 LearningRate 0.0482 Epoch: 6 Global Step: 102080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:45,538-Speed 9536.83 samples/sec Loss 6.7176 LearningRate 0.0482 Epoch: 6 Global Step: 102090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:46,624-Speed 9434.92 samples/sec Loss 6.6154 LearningRate 0.0482 Epoch: 6 Global Step: 102100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:47,687-Speed 9638.99 samples/sec Loss 6.5471 LearningRate 0.0482 Epoch: 6 Global Step: 102110 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:39:48,749-Speed 9646.33 samples/sec Loss 6.7426 LearningRate 0.0482 Epoch: 6 Global Step: 102120 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:39:49,822-Speed 9555.87 samples/sec Loss 6.6356 LearningRate 0.0482 Epoch: 6 Global Step: 102130 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:39:50,920-Speed 9332.79 samples/sec Loss 6.6582 LearningRate 0.0482 Epoch: 6 Global Step: 102140 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:39:51,990-Speed 9574.50 samples/sec Loss 6.6904 LearningRate 0.0482 Epoch: 6 Global Step: 102150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:39:53,060-Speed 9576.30 samples/sec Loss 6.7233 LearningRate 0.0482 Epoch: 6 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:39:54,104-Speed 9810.84 samples/sec Loss 6.6794 LearningRate 0.0482 Epoch: 6 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:39:55,167-Speed 9635.86 samples/sec Loss 6.7043 LearningRate 0.0482 Epoch: 6 Global Step: 102180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:39:56,215-Speed 9778.85 samples/sec Loss 6.6325 LearningRate 0.0481 Epoch: 6 Global Step: 102190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 15:39:57,267-Speed 9742.60 samples/sec Loss 6.7565 LearningRate 0.0481 Epoch: 6 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:39:58,317-Speed 9755.29 samples/sec Loss 6.7049 LearningRate 0.0481 Epoch: 6 Global Step: 102210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:39:59,393-Speed 9521.93 samples/sec Loss 6.5913 LearningRate 0.0481 Epoch: 6 Global Step: 102220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:40:00,481-Speed 9416.93 samples/sec Loss 6.7253 LearningRate 0.0481 Epoch: 6 Global Step: 102230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:40:01,561-Speed 9491.69 samples/sec Loss 6.6418 LearningRate 0.0481 Epoch: 6 Global Step: 102240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:40:02,636-Speed 9535.73 samples/sec Loss 6.6971 LearningRate 0.0481 Epoch: 6 Global Step: 102250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:40:03,698-Speed 9643.03 samples/sec Loss 6.4956 LearningRate 0.0481 Epoch: 6 Global Step: 102260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:04,755-Speed 9694.25 samples/sec Loss 6.6694 LearningRate 0.0481 Epoch: 6 Global Step: 102270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:05,840-Speed 9448.73 samples/sec Loss 6.7681 LearningRate 0.0481 Epoch: 6 Global Step: 102280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:06,919-Speed 9492.56 samples/sec Loss 6.6669 LearningRate 0.0481 Epoch: 6 Global Step: 102290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:08,002-Speed 9456.10 samples/sec Loss 6.7102 LearningRate 0.0481 Epoch: 6 Global Step: 102300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:09,050-Speed 9781.15 samples/sec Loss 6.5827 LearningRate 0.0481 Epoch: 6 Global Step: 102310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:10,126-Speed 9516.17 samples/sec 
Loss 6.7632 LearningRate 0.0481 Epoch: 6 Global Step: 102320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:11,226-Speed 9321.47 samples/sec Loss 6.8106 LearningRate 0.0481 Epoch: 6 Global Step: 102330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:12,304-Speed 9510.47 samples/sec Loss 6.7675 LearningRate 0.0481 Epoch: 6 Global Step: 102340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:13,389-Speed 9445.20 samples/sec Loss 6.6808 LearningRate 0.0481 Epoch: 6 Global Step: 102350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:14,492-Speed 9286.04 samples/sec Loss 6.6957 LearningRate 0.0481 Epoch: 6 Global Step: 102360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:15,588-Speed 9357.79 samples/sec Loss 6.6908 LearningRate 0.0481 Epoch: 6 Global Step: 102370 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:16,627-Speed 9861.83 samples/sec Loss 6.7873 LearningRate 0.0481 Epoch: 6 Global Step: 102380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:17,704-Speed 9514.69 samples/sec Loss 6.6692 LearningRate 0.0481 Epoch: 6 Global Step: 102390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:18,785-Speed 9479.87 samples/sec Loss 6.6677 LearningRate 0.0481 Epoch: 6 Global Step: 102400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:19,822-Speed 9874.85 samples/sec Loss 6.6354 LearningRate 0.0481 Epoch: 6 Global Step: 102410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:20,945-Speed 9127.94 samples/sec Loss 6.6537 LearningRate 0.0481 Epoch: 6 Global Step: 102420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:22,022-Speed 9514.38 samples/sec Loss 6.6628 LearningRate 0.0480 Epoch: 6 Global Step: 102430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:23,082-Speed 9669.87 samples/sec Loss 6.8360 LearningRate 0.0480 Epoch: 6 
Global Step: 102440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:24,152-Speed 9576.37 samples/sec Loss 6.6450 LearningRate 0.0480 Epoch: 6 Global Step: 102450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:25,225-Speed 9548.10 samples/sec Loss 6.7586 LearningRate 0.0480 Epoch: 6 Global Step: 102460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:26,300-Speed 9533.90 samples/sec Loss 6.7151 LearningRate 0.0480 Epoch: 6 Global Step: 102470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:27,423-Speed 9122.91 samples/sec Loss 6.7168 LearningRate 0.0480 Epoch: 6 Global Step: 102480 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:28,496-Speed 9543.73 samples/sec Loss 6.6742 LearningRate 0.0480 Epoch: 6 Global Step: 102490 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:29,584-Speed 9434.51 samples/sec Loss 6.6666 LearningRate 0.0480 Epoch: 6 Global Step: 102500 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:30,702-Speed 9162.67 samples/sec Loss 6.8178 LearningRate 0.0480 Epoch: 6 Global Step: 102510 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:31,791-Speed 9412.39 samples/sec Loss 6.6427 LearningRate 0.0480 Epoch: 6 Global Step: 102520 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:32,851-Speed 9663.51 samples/sec Loss 6.7447 LearningRate 0.0480 Epoch: 6 Global Step: 102530 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:33,936-Speed 9442.55 samples/sec Loss 6.7688 LearningRate 0.0480 Epoch: 6 Global Step: 102540 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:35,019-Speed 9459.63 samples/sec Loss 6.7407 LearningRate 0.0480 Epoch: 6 Global Step: 102550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:36,093-Speed 9537.45 samples/sec Loss 6.7314 LearningRate 0.0480 Epoch: 6 Global Step: 102560 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:40:37,199-Speed 9274.68 samples/sec Loss 6.7051 LearningRate 0.0480 Epoch: 6 Global Step: 102570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:38,248-Speed 9763.51 samples/sec Loss 6.7368 LearningRate 0.0480 Epoch: 6 Global Step: 102580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:39,290-Speed 9831.99 samples/sec Loss 6.7351 LearningRate 0.0480 Epoch: 6 Global Step: 102590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:40,418-Speed 9082.20 samples/sec Loss 6.7309 LearningRate 0.0480 Epoch: 6 Global Step: 102600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:41,502-Speed 9459.35 samples/sec Loss 6.7070 LearningRate 0.0480 Epoch: 6 Global Step: 102610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:42,594-Speed 9380.66 samples/sec Loss 6.7494 LearningRate 0.0480 Epoch: 6 Global Step: 102620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:43,674-Speed 9485.04 samples/sec Loss 6.6804 LearningRate 0.0480 Epoch: 6 Global Step: 102630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:44,785-Speed 9222.14 samples/sec Loss 6.6840 LearningRate 0.0480 Epoch: 6 Global Step: 102640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:45,896-Speed 9219.13 samples/sec Loss 6.7474 LearningRate 0.0480 Epoch: 6 Global Step: 102650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:46,971-Speed 9536.53 samples/sec Loss 6.7671 LearningRate 0.0480 Epoch: 6 Global Step: 102660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:47,995-Speed 10014.40 samples/sec Loss 6.6487 LearningRate 0.0479 Epoch: 6 Global Step: 102670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:49,037-Speed 9832.74 samples/sec Loss 6.7948 LearningRate 0.0479 Epoch: 6 Global Step: 102680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
15:40:50,145-Speed 9242.21 samples/sec Loss 6.7775 LearningRate 0.0479 Epoch: 6 Global Step: 102690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:51,250-Speed 9274.79 samples/sec Loss 6.7850 LearningRate 0.0479 Epoch: 6 Global Step: 102700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:52,330-Speed 9486.03 samples/sec Loss 6.7728 LearningRate 0.0479 Epoch: 6 Global Step: 102710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:53,355-Speed 9992.53 samples/sec Loss 6.6145 LearningRate 0.0479 Epoch: 6 Global Step: 102720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:54,405-Speed 9764.01 samples/sec Loss 6.6811 LearningRate 0.0479 Epoch: 6 Global Step: 102730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:55,476-Speed 9562.96 samples/sec Loss 6.7356 LearningRate 0.0479 Epoch: 6 Global Step: 102740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:40:56,549-Speed 9548.59 samples/sec Loss 6.7809 LearningRate 0.0479 Epoch: 6 Global Step: 102750 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:57,624-Speed 9534.62 samples/sec Loss 6.7271 LearningRate 0.0479 Epoch: 6 Global Step: 102760 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:58,712-Speed 9416.66 samples/sec Loss 6.8789 LearningRate 0.0479 Epoch: 6 Global Step: 102770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:40:59,781-Speed 9584.49 samples/sec Loss 6.7271 LearningRate 0.0479 Epoch: 6 Global Step: 102780 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:41:00,849-Speed 9587.81 samples/sec Loss 6.7331 LearningRate 0.0479 Epoch: 6 Global Step: 102790 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:41:01,942-Speed 9378.03 samples/sec Loss 6.6080 LearningRate 0.0479 Epoch: 6 Global Step: 102800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:03,027-Speed 9440.81 samples/sec Loss 
6.6751 LearningRate 0.0479 Epoch: 6 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:04,063-Speed 9892.14 samples/sec Loss 6.7919 LearningRate 0.0479 Epoch: 6 Global Step: 102820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:05,100-Speed 9877.39 samples/sec Loss 6.6499 LearningRate 0.0479 Epoch: 6 Global Step: 102830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:06,141-Speed 9841.03 samples/sec Loss 6.7558 LearningRate 0.0479 Epoch: 6 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:07,205-Speed 9641.39 samples/sec Loss 6.7749 LearningRate 0.0479 Epoch: 6 Global Step: 102850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:08,293-Speed 9413.19 samples/sec Loss 6.6420 LearningRate 0.0479 Epoch: 6 Global Step: 102860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:09,403-Speed 9227.21 samples/sec Loss 6.7632 LearningRate 0.0479 Epoch: 6 Global Step: 102870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:10,471-Speed 9595.54 samples/sec Loss 6.7817 LearningRate 0.0479 Epoch: 6 Global Step: 102880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:11,559-Speed 9417.15 samples/sec Loss 6.7163 LearningRate 0.0479 Epoch: 6 Global Step: 102890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:12,656-Speed 9339.33 samples/sec Loss 6.8031 LearningRate 0.0479 Epoch: 6 Global Step: 102900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:13,748-Speed 9388.37 samples/sec Loss 6.7880 LearningRate 0.0478 Epoch: 6 Global Step: 102910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:14,851-Speed 9288.54 samples/sec Loss 6.6509 LearningRate 0.0478 Epoch: 6 Global Step: 102920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:15,960-Speed 9235.47 samples/sec Loss 6.5729 LearningRate 0.0478 Epoch: 6 Global Step: 
102930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:17,088-Speed 9086.70 samples/sec Loss 6.7874 LearningRate 0.0478 Epoch: 6 Global Step: 102940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:18,151-Speed 9641.84 samples/sec Loss 6.7590 LearningRate 0.0478 Epoch: 6 Global Step: 102950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:19,228-Speed 9506.32 samples/sec Loss 6.7601 LearningRate 0.0478 Epoch: 6 Global Step: 102960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:20,292-Speed 9629.94 samples/sec Loss 6.8149 LearningRate 0.0478 Epoch: 6 Global Step: 102970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:21,428-Speed 9019.43 samples/sec Loss 6.7421 LearningRate 0.0478 Epoch: 6 Global Step: 102980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:22,535-Speed 9260.01 samples/sec Loss 6.7314 LearningRate 0.0478 Epoch: 6 Global Step: 102990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:23,573-Speed 9866.76 samples/sec Loss 6.6753 LearningRate 0.0478 Epoch: 6 Global Step: 103000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:24,685-Speed 9220.42 samples/sec Loss 6.8348 LearningRate 0.0478 Epoch: 6 Global Step: 103010 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:41:25,736-Speed 9758.14 samples/sec Loss 6.7694 LearningRate 0.0478 Epoch: 6 Global Step: 103020 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:41:26,891-Speed 8864.29 samples/sec Loss 6.7925 LearningRate 0.0478 Epoch: 6 Global Step: 103030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:27,981-Speed 9404.03 samples/sec Loss 6.7830 LearningRate 0.0478 Epoch: 6 Global Step: 103040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:29,075-Speed 9363.53 samples/sec Loss 6.8010 LearningRate 0.0478 Epoch: 6 Global Step: 103050 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 15:41:30,147-Speed 9554.32 samples/sec Loss 6.7065 LearningRate 0.0478 Epoch: 6 Global Step: 103060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:31,217-Speed 9576.15 samples/sec Loss 6.7137 LearningRate 0.0478 Epoch: 6 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:32,276-Speed 9678.48 samples/sec Loss 6.7952 LearningRate 0.0478 Epoch: 6 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:33,307-Speed 9940.19 samples/sec Loss 6.7151 LearningRate 0.0478 Epoch: 6 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:34,353-Speed 9795.45 samples/sec Loss 6.6993 LearningRate 0.0478 Epoch: 6 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:35,434-Speed 9471.08 samples/sec Loss 6.7877 LearningRate 0.0478 Epoch: 6 Global Step: 103110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:36,530-Speed 9350.98 samples/sec Loss 6.8464 LearningRate 0.0478 Epoch: 6 Global Step: 103120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:37,679-Speed 8921.40 samples/sec Loss 6.7890 LearningRate 0.0478 Epoch: 6 Global Step: 103130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:38,736-Speed 9688.97 samples/sec Loss 6.7152 LearningRate 0.0478 Epoch: 6 Global Step: 103140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:39,853-Speed 9173.97 samples/sec Loss 6.7549 LearningRate 0.0477 Epoch: 6 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:40,919-Speed 9613.15 samples/sec Loss 6.6749 LearningRate 0.0477 Epoch: 6 Global Step: 103160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:41:42,008-Speed 9410.37 samples/sec Loss 6.7567 LearningRate 0.0477 Epoch: 6 Global Step: 103170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:43,103-Speed 9357.92 
samples/sec Loss 6.8723 LearningRate 0.0477 Epoch: 6 Global Step: 103180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:44,204-Speed 9310.20 samples/sec Loss 6.7300 LearningRate 0.0477 Epoch: 6 Global Step: 103190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:45,297-Speed 9370.38 samples/sec Loss 6.8384 LearningRate 0.0477 Epoch: 6 Global Step: 103200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:46,373-Speed 9524.80 samples/sec Loss 6.7392 LearningRate 0.0477 Epoch: 6 Global Step: 103210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:47,432-Speed 9671.98 samples/sec Loss 6.8214 LearningRate 0.0477 Epoch: 6 Global Step: 103220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:48,474-Speed 9834.06 samples/sec Loss 6.7521 LearningRate 0.0477 Epoch: 6 Global Step: 103230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:49,519-Speed 9811.60 samples/sec Loss 6.7538 LearningRate 0.0477 Epoch: 6 Global Step: 103240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:50,631-Speed 9207.07 samples/sec Loss 6.7744 LearningRate 0.0477 Epoch: 6 Global Step: 103250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:51,726-Speed 9357.61 samples/sec Loss 6.7735 LearningRate 0.0477 Epoch: 6 Global Step: 103260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:52,785-Speed 9675.32 samples/sec Loss 6.8197 LearningRate 0.0477 Epoch: 6 Global Step: 103270 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:41:53,844-Speed 9675.83 samples/sec Loss 6.7398 LearningRate 0.0477 Epoch: 6 Global Step: 103280 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:41:54,930-Speed 9442.01 samples/sec Loss 6.7355 LearningRate 0.0477 Epoch: 6 Global Step: 103290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:56,046-Speed 9175.88 samples/sec Loss 6.7689 LearningRate 0.0477 
Epoch: 6 Global Step: 103300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:57,132-Speed 9438.77 samples/sec Loss 6.8720 LearningRate 0.0477 Epoch: 6 Global Step: 103310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:58,226-Speed 9363.09 samples/sec Loss 6.8047 LearningRate 0.0477 Epoch: 6 Global Step: 103320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:41:59,341-Speed 9191.40 samples/sec Loss 6.7416 LearningRate 0.0477 Epoch: 6 Global Step: 103330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:00,430-Speed 9405.34 samples/sec Loss 6.7240 LearningRate 0.0477 Epoch: 6 Global Step: 103340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:01,502-Speed 9557.40 samples/sec Loss 6.7867 LearningRate 0.0477 Epoch: 6 Global Step: 103350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:02,605-Speed 9294.16 samples/sec Loss 6.7908 LearningRate 0.0477 Epoch: 6 Global Step: 103360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:03,694-Speed 9409.72 samples/sec Loss 6.6927 LearningRate 0.0477 Epoch: 6 Global Step: 103370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:04,768-Speed 9534.17 samples/sec Loss 6.8246 LearningRate 0.0477 Epoch: 6 Global Step: 103380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:05,798-Speed 9947.07 samples/sec Loss 6.8396 LearningRate 0.0476 Epoch: 6 Global Step: 103390 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:06,853-Speed 9711.59 samples/sec Loss 6.7770 LearningRate 0.0476 Epoch: 6 Global Step: 103400 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:07,913-Speed 9674.70 samples/sec Loss 6.7763 LearningRate 0.0476 Epoch: 6 Global Step: 103410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:08,971-Speed 9680.04 samples/sec Loss 6.7624 LearningRate 0.0476 Epoch: 6 Global Step: 103420 Fp16 Grad 
Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:10,035-Speed 9635.15 samples/sec Loss 6.7107 LearningRate 0.0476 Epoch: 6 Global Step: 103430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:11,081-Speed 9789.39 samples/sec Loss 6.7978 LearningRate 0.0476 Epoch: 6 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:12,125-Speed 9813.15 samples/sec Loss 6.8402 LearningRate 0.0476 Epoch: 6 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:13,222-Speed 9343.98 samples/sec Loss 6.8422 LearningRate 0.0476 Epoch: 6 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:14,304-Speed 9467.76 samples/sec Loss 6.7067 LearningRate 0.0476 Epoch: 6 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:15,372-Speed 9597.44 samples/sec Loss 6.7254 LearningRate 0.0476 Epoch: 6 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:16,413-Speed 9832.98 samples/sec Loss 6.6319 LearningRate 0.0476 Epoch: 6 Global Step: 103490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:17,507-Speed 9372.93 samples/sec Loss 6.6798 LearningRate 0.0476 Epoch: 6 Global Step: 103500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:18,540-Speed 9921.41 samples/sec Loss 6.6906 LearningRate 0.0476 Epoch: 6 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:19,598-Speed 9679.46 samples/sec Loss 6.7693 LearningRate 0.0476 Epoch: 6 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:20,695-Speed 9341.49 samples/sec Loss 6.7915 LearningRate 0.0476 Epoch: 6 Global Step: 103530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:42:21,764-Speed 9589.26 samples/sec Loss 6.7768 LearningRate 0.0476 Epoch: 6 Global Step: 103540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
15:42:22,861-Speed 9337.83 samples/sec Loss 6.7863 LearningRate 0.0476 Epoch: 6 Global Step: 103550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:23,950-Speed 9405.63 samples/sec Loss 6.8472 LearningRate 0.0476 Epoch: 6 Global Step: 103560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:25,026-Speed 9529.43 samples/sec Loss 6.9266 LearningRate 0.0476 Epoch: 6 Global Step: 103570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:26,126-Speed 9309.30 samples/sec Loss 6.9393 LearningRate 0.0476 Epoch: 6 Global Step: 103580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:27,210-Speed 9455.68 samples/sec Loss 6.7267 LearningRate 0.0476 Epoch: 6 Global Step: 103590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:28,320-Speed 9230.21 samples/sec Loss 6.8164 LearningRate 0.0476 Epoch: 6 Global Step: 103600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:29,441-Speed 9139.16 samples/sec Loss 6.7571 LearningRate 0.0476 Epoch: 6 Global Step: 103610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:30,530-Speed 9406.93 samples/sec Loss 6.7351 LearningRate 0.0476 Epoch: 6 Global Step: 103620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:31,606-Speed 9522.25 samples/sec Loss 6.8285 LearningRate 0.0475 Epoch: 6 Global Step: 103630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:32,689-Speed 9460.34 samples/sec Loss 6.7834 LearningRate 0.0475 Epoch: 6 Global Step: 103640 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:33,812-Speed 9127.96 samples/sec Loss 6.8245 LearningRate 0.0475 Epoch: 6 Global Step: 103650 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:34,891-Speed 9490.93 samples/sec Loss 6.8576 LearningRate 0.0475 Epoch: 6 Global Step: 103660 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:35,963-Speed 9555.78 samples/sec Loss 
6.8551 LearningRate 0.0475 Epoch: 6 Global Step: 103670 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:37,027-Speed 9636.13 samples/sec Loss 6.8072 LearningRate 0.0475 Epoch: 6 Global Step: 103680 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:38,089-Speed 9650.28 samples/sec Loss 6.8435 LearningRate 0.0475 Epoch: 6 Global Step: 103690 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:39,197-Speed 9246.03 samples/sec Loss 6.7997 LearningRate 0.0475 Epoch: 6 Global Step: 103700 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:40,297-Speed 9316.72 samples/sec Loss 6.7905 LearningRate 0.0475 Epoch: 6 Global Step: 103710 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:41,396-Speed 9326.11 samples/sec Loss 6.8951 LearningRate 0.0475 Epoch: 6 Global Step: 103720 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:42,453-Speed 9687.64 samples/sec Loss 6.8170 LearningRate 0.0475 Epoch: 6 Global Step: 103730 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:43,510-Speed 9696.67 samples/sec Loss 6.7983 LearningRate 0.0475 Epoch: 6 Global Step: 103740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:44,585-Speed 9531.58 samples/sec Loss 6.9286 LearningRate 0.0475 Epoch: 6 Global Step: 103750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:45,714-Speed 9074.97 samples/sec Loss 6.8264 LearningRate 0.0475 Epoch: 6 Global Step: 103760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:46,814-Speed 9317.13 samples/sec Loss 6.8553 LearningRate 0.0475 Epoch: 6 Global Step: 103770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:47,838-Speed 10003.25 samples/sec Loss 6.7980 LearningRate 0.0475 Epoch: 6 Global Step: 103780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:48,924-Speed 9438.99 samples/sec Loss 6.7676 LearningRate 0.0475 Epoch: 6 Global 
Step: 103790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:50,045-Speed 9139.11 samples/sec Loss 6.8025 LearningRate 0.0475 Epoch: 6 Global Step: 103800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:51,107-Speed 9648.75 samples/sec Loss 6.8186 LearningRate 0.0475 Epoch: 6 Global Step: 103810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:52,154-Speed 9777.55 samples/sec Loss 6.8947 LearningRate 0.0475 Epoch: 6 Global Step: 103820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:53,259-Speed 9276.77 samples/sec Loss 6.7046 LearningRate 0.0475 Epoch: 6 Global Step: 103830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:54,342-Speed 9465.20 samples/sec Loss 6.8055 LearningRate 0.0475 Epoch: 6 Global Step: 103840 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:42:55,367-Speed 9988.07 samples/sec Loss 6.7846 LearningRate 0.0475 Epoch: 6 Global Step: 103850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:56,435-Speed 9599.52 samples/sec Loss 6.8189 LearningRate 0.0475 Epoch: 6 Global Step: 103860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:57,517-Speed 9468.35 samples/sec Loss 6.8721 LearningRate 0.0475 Epoch: 6 Global Step: 103870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:58,570-Speed 9735.08 samples/sec Loss 6.9358 LearningRate 0.0474 Epoch: 6 Global Step: 103880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:42:59,623-Speed 9731.03 samples/sec Loss 6.7626 LearningRate 0.0474 Epoch: 6 Global Step: 103890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:00,739-Speed 9175.08 samples/sec Loss 6.8872 LearningRate 0.0474 Epoch: 6 Global Step: 103900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:01,784-Speed 9808.70 samples/sec Loss 6.8177 LearningRate 0.0474 Epoch: 6 Global Step: 103910 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:43:02,885-Speed 9302.18 samples/sec Loss 6.7314 LearningRate 0.0474 Epoch: 6 Global Step: 103920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:04,003-Speed 9166.24 samples/sec Loss 6.8024 LearningRate 0.0474 Epoch: 6 Global Step: 103930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:05,073-Speed 9573.09 samples/sec Loss 6.7523 LearningRate 0.0474 Epoch: 6 Global Step: 103940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:06,169-Speed 9353.78 samples/sec Loss 6.8983 LearningRate 0.0474 Epoch: 6 Global Step: 103950 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:43:07,276-Speed 9252.38 samples/sec Loss 6.8345 LearningRate 0.0474 Epoch: 6 Global Step: 103960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:08,376-Speed 9316.27 samples/sec Loss 6.8763 LearningRate 0.0474 Epoch: 6 Global Step: 103970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:09,482-Speed 9262.19 samples/sec Loss 6.8156 LearningRate 0.0474 Epoch: 6 Global Step: 103980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:10,574-Speed 9383.14 samples/sec Loss 6.9440 LearningRate 0.0474 Epoch: 6 Global Step: 103990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:11,638-Speed 9632.91 samples/sec Loss 6.8301 LearningRate 0.0474 Epoch: 6 Global Step: 104000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:43:33,990-[lfw][104000]XNorm: 11.199217 Training: 2022-04-11 15:43:33,991-[lfw][104000]Accuracy-Flip: 0.99617+-0.00269 Training: 2022-04-11 15:43:33,991-[lfw][104000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:43:59,870-[cfp_fp][104000]XNorm: 9.498876 Training: 2022-04-11 15:43:59,870-[cfp_fp][104000]Accuracy-Flip: 0.95700+-0.00959 Training: 2022-04-11 15:43:59,871-[cfp_fp][104000]Accuracy-Highest: 0.95857 Training: 2022-04-11 15:44:22,133-[agedb_30][104000]XNorm: 10.824177 
Training: 2022-04-11 15:44:22,134-[agedb_30][104000]Accuracy-Flip: 0.96383+-0.00995 Training: 2022-04-11 15:44:22,134-[agedb_30][104000]Accuracy-Highest: 0.96483 Training: 2022-04-11 15:44:23,225-Speed 143.04 samples/sec Loss 6.9240 LearningRate 0.0474 Epoch: 6 Global Step: 104010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:24,310-Speed 9436.07 samples/sec Loss 6.7355 LearningRate 0.0474 Epoch: 6 Global Step: 104020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:25,422-Speed 9217.45 samples/sec Loss 6.7697 LearningRate 0.0474 Epoch: 6 Global Step: 104030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:26,548-Speed 9097.44 samples/sec Loss 6.8112 LearningRate 0.0474 Epoch: 6 Global Step: 104040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:27,600-Speed 9734.65 samples/sec Loss 6.8464 LearningRate 0.0474 Epoch: 6 Global Step: 104050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:28,677-Speed 9517.29 samples/sec Loss 6.8585 LearningRate 0.0474 Epoch: 6 Global Step: 104060 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:44:29,772-Speed 9358.93 samples/sec Loss 6.8592 LearningRate 0.0474 Epoch: 6 Global Step: 104070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:30,830-Speed 9679.86 samples/sec Loss 6.8474 LearningRate 0.0474 Epoch: 6 Global Step: 104080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:31,876-Speed 9795.88 samples/sec Loss 6.7884 LearningRate 0.0474 Epoch: 6 Global Step: 104090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:32,962-Speed 9431.29 samples/sec Loss 6.8124 LearningRate 0.0474 Epoch: 6 Global Step: 104100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:34,042-Speed 9495.95 samples/sec Loss 6.8388 LearningRate 0.0474 Epoch: 6 Global Step: 104110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:35,160-Speed 
9165.22 samples/sec Loss 6.8529 LearningRate 0.0473 Epoch: 6 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:36,250-Speed 9397.46 samples/sec Loss 6.9060 LearningRate 0.0473 Epoch: 6 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:37,355-Speed 9273.61 samples/sec Loss 6.8562 LearningRate 0.0473 Epoch: 6 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:38,403-Speed 9768.72 samples/sec Loss 6.8456 LearningRate 0.0473 Epoch: 6 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:39,469-Speed 9617.02 samples/sec Loss 6.7935 LearningRate 0.0473 Epoch: 6 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:40,598-Speed 9073.91 samples/sec Loss 6.6845 LearningRate 0.0473 Epoch: 6 Global Step: 104170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:41,700-Speed 9294.81 samples/sec Loss 6.8292 LearningRate 0.0473 Epoch: 6 Global Step: 104180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:42,757-Speed 9699.62 samples/sec Loss 6.8528 LearningRate 0.0473 Epoch: 6 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:43,816-Speed 9677.19 samples/sec Loss 6.8091 LearningRate 0.0473 Epoch: 6 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:44,902-Speed 9430.85 samples/sec Loss 6.9329 LearningRate 0.0473 Epoch: 6 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:44:46,026-Speed 9119.56 samples/sec Loss 6.8128 LearningRate 0.0473 Epoch: 6 Global Step: 104220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:47,097-Speed 9560.08 samples/sec Loss 6.9614 LearningRate 0.0473 Epoch: 6 Global Step: 104230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:48,142-Speed 9805.74 samples/sec Loss 6.7955 LearningRate 0.0473 
Epoch: 6 Global Step: 104240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:49,241-Speed 9327.24 samples/sec Loss 6.7545 LearningRate 0.0473 Epoch: 6 Global Step: 104250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:50,330-Speed 9409.92 samples/sec Loss 6.7622 LearningRate 0.0473 Epoch: 6 Global Step: 104260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:51,418-Speed 9412.78 samples/sec Loss 6.8664 LearningRate 0.0473 Epoch: 6 Global Step: 104270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:52,510-Speed 9388.12 samples/sec Loss 6.8287 LearningRate 0.0473 Epoch: 6 Global Step: 104280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:53,624-Speed 9196.76 samples/sec Loss 6.7677 LearningRate 0.0473 Epoch: 6 Global Step: 104290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:54,726-Speed 9297.37 samples/sec Loss 6.7228 LearningRate 0.0473 Epoch: 6 Global Step: 104300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:55,780-Speed 9725.92 samples/sec Loss 6.8678 LearningRate 0.0473 Epoch: 6 Global Step: 104310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:44:56,845-Speed 9616.30 samples/sec Loss 6.8192 LearningRate 0.0473 Epoch: 6 Global Step: 104320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:44:57,999-Speed 8877.51 samples/sec Loss 6.9158 LearningRate 0.0473 Epoch: 6 Global Step: 104330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:44:59,129-Speed 9067.41 samples/sec Loss 6.8182 LearningRate 0.0473 Epoch: 6 Global Step: 104340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:00,211-Speed 9469.45 samples/sec Loss 6.9023 LearningRate 0.0473 Epoch: 6 Global Step: 104350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:01,300-Speed 9408.43 samples/sec Loss 6.9427 LearningRate 0.0472 Epoch: 6 Global Step: 104360 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:02,382-Speed 9477.98 samples/sec Loss 6.8512 LearningRate 0.0472 Epoch: 6 Global Step: 104370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:03,510-Speed 9077.24 samples/sec Loss 6.8824 LearningRate 0.0472 Epoch: 6 Global Step: 104380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:04,589-Speed 9501.42 samples/sec Loss 6.8960 LearningRate 0.0472 Epoch: 6 Global Step: 104390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:05,646-Speed 9694.02 samples/sec Loss 6.8969 LearningRate 0.0472 Epoch: 6 Global Step: 104400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:06,730-Speed 9451.78 samples/sec Loss 6.8697 LearningRate 0.0472 Epoch: 6 Global Step: 104410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:07,825-Speed 9358.70 samples/sec Loss 6.9398 LearningRate 0.0472 Epoch: 6 Global Step: 104420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:08,905-Speed 9484.21 samples/sec Loss 6.8170 LearningRate 0.0472 Epoch: 6 Global Step: 104430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:10,006-Speed 9302.20 samples/sec Loss 7.0257 LearningRate 0.0472 Epoch: 6 Global Step: 104440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:11,092-Speed 9440.62 samples/sec Loss 6.8473 LearningRate 0.0472 Epoch: 6 Global Step: 104450 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:12,215-Speed 9127.44 samples/sec Loss 6.7976 LearningRate 0.0472 Epoch: 6 Global Step: 104460 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:13,342-Speed 9087.97 samples/sec Loss 6.9312 LearningRate 0.0472 Epoch: 6 Global Step: 104470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:14,451-Speed 9237.28 samples/sec Loss 6.9118 LearningRate 0.0472 Epoch: 6 Global Step: 104480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:45:15,550-Speed 9328.91 samples/sec Loss 6.7221 LearningRate 0.0472 Epoch: 6 Global Step: 104490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:16,600-Speed 9757.13 samples/sec Loss 6.6521 LearningRate 0.0472 Epoch: 6 Global Step: 104500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:17,667-Speed 9599.85 samples/sec Loss 6.8312 LearningRate 0.0472 Epoch: 6 Global Step: 104510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:18,731-Speed 9629.65 samples/sec Loss 6.8137 LearningRate 0.0472 Epoch: 6 Global Step: 104520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:19,867-Speed 9021.52 samples/sec Loss 6.9197 LearningRate 0.0472 Epoch: 6 Global Step: 104530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:20,968-Speed 9305.65 samples/sec Loss 6.8351 LearningRate 0.0472 Epoch: 6 Global Step: 104540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:22,036-Speed 9590.55 samples/sec Loss 6.8494 LearningRate 0.0472 Epoch: 6 Global Step: 104550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:23,103-Speed 9606.63 samples/sec Loss 6.7745 LearningRate 0.0472 Epoch: 6 Global Step: 104560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:24,199-Speed 9347.19 samples/sec Loss 6.9114 LearningRate 0.0472 Epoch: 6 Global Step: 104570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:25,289-Speed 9403.97 samples/sec Loss 6.8293 LearningRate 0.0472 Epoch: 6 Global Step: 104580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:26,348-Speed 9673.49 samples/sec Loss 6.7360 LearningRate 0.0472 Epoch: 6 Global Step: 104590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:27,426-Speed 9500.31 samples/sec Loss 6.8777 LearningRate 0.0471 Epoch: 6 Global Step: 104600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:28,467-Speed 9847.40 
samples/sec Loss 6.8639 LearningRate 0.0471 Epoch: 6 Global Step: 104610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:29,543-Speed 9513.98 samples/sec Loss 6.8611 LearningRate 0.0471 Epoch: 6 Global Step: 104620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:30,632-Speed 9417.39 samples/sec Loss 6.8167 LearningRate 0.0471 Epoch: 6 Global Step: 104630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:31,706-Speed 9536.01 samples/sec Loss 6.7649 LearningRate 0.0471 Epoch: 6 Global Step: 104640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:32,828-Speed 9129.88 samples/sec Loss 6.8664 LearningRate 0.0471 Epoch: 6 Global Step: 104650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:33,899-Speed 9568.28 samples/sec Loss 6.8979 LearningRate 0.0471 Epoch: 6 Global Step: 104660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:34,983-Speed 9460.48 samples/sec Loss 6.8146 LearningRate 0.0471 Epoch: 6 Global Step: 104670 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:36,034-Speed 9750.70 samples/sec Loss 6.9084 LearningRate 0.0471 Epoch: 6 Global Step: 104680 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:37,063-Speed 9956.87 samples/sec Loss 6.7948 LearningRate 0.0471 Epoch: 6 Global Step: 104690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:38,120-Speed 9686.59 samples/sec Loss 6.8005 LearningRate 0.0471 Epoch: 6 Global Step: 104700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:39,170-Speed 9764.26 samples/sec Loss 6.8946 LearningRate 0.0471 Epoch: 6 Global Step: 104710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:40,226-Speed 9699.83 samples/sec Loss 6.8343 LearningRate 0.0471 Epoch: 6 Global Step: 104720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:41,321-Speed 9363.41 samples/sec Loss 6.8363 LearningRate 0.0471 
Epoch: 6 Global Step: 104730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:42,404-Speed 9460.63 samples/sec Loss 6.9499 LearningRate 0.0471 Epoch: 6 Global Step: 104740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:43,477-Speed 9549.00 samples/sec Loss 6.7827 LearningRate 0.0471 Epoch: 6 Global Step: 104750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:44,559-Speed 9467.08 samples/sec Loss 6.8785 LearningRate 0.0471 Epoch: 6 Global Step: 104760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:45,633-Speed 9538.57 samples/sec Loss 6.8960 LearningRate 0.0471 Epoch: 6 Global Step: 104770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:46,714-Speed 9481.74 samples/sec Loss 6.9169 LearningRate 0.0471 Epoch: 6 Global Step: 104780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:47,824-Speed 9228.50 samples/sec Loss 6.9895 LearningRate 0.0471 Epoch: 6 Global Step: 104790 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:48,879-Speed 9707.93 samples/sec Loss 6.7549 LearningRate 0.0471 Epoch: 6 Global Step: 104800 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:49,942-Speed 9643.83 samples/sec Loss 6.7524 LearningRate 0.0471 Epoch: 6 Global Step: 104810 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:45:50,997-Speed 9711.54 samples/sec Loss 6.8573 LearningRate 0.0471 Epoch: 6 Global Step: 104820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:52,044-Speed 9783.40 samples/sec Loss 6.8799 LearningRate 0.0471 Epoch: 6 Global Step: 104830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:53,068-Speed 10009.55 samples/sec Loss 6.8975 LearningRate 0.0471 Epoch: 6 Global Step: 104840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:54,123-Speed 9716.78 samples/sec Loss 6.7256 LearningRate 0.0470 Epoch: 6 Global Step: 104850 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:55,216-Speed 9373.58 samples/sec Loss 6.7956 LearningRate 0.0470 Epoch: 6 Global Step: 104860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:56,290-Speed 9543.88 samples/sec Loss 6.9019 LearningRate 0.0470 Epoch: 6 Global Step: 104870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:57,355-Speed 9619.42 samples/sec Loss 6.8381 LearningRate 0.0470 Epoch: 6 Global Step: 104880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:58,418-Speed 9638.96 samples/sec Loss 6.7964 LearningRate 0.0470 Epoch: 6 Global Step: 104890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:45:59,537-Speed 9155.84 samples/sec Loss 6.8796 LearningRate 0.0470 Epoch: 6 Global Step: 104900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:00,619-Speed 9467.59 samples/sec Loss 6.7952 LearningRate 0.0470 Epoch: 6 Global Step: 104910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:01,708-Speed 9405.28 samples/sec Loss 6.9066 LearningRate 0.0470 Epoch: 6 Global Step: 104920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:02,813-Speed 9271.66 samples/sec Loss 6.9620 LearningRate 0.0470 Epoch: 6 Global Step: 104930 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:03,925-Speed 9219.78 samples/sec Loss 6.8241 LearningRate 0.0470 Epoch: 6 Global Step: 104940 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:04,993-Speed 9594.17 samples/sec Loss 6.8614 LearningRate 0.0470 Epoch: 6 Global Step: 104950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:06,060-Speed 9597.98 samples/sec Loss 6.7891 LearningRate 0.0470 Epoch: 6 Global Step: 104960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:07,124-Speed 9637.03 samples/sec Loss 6.9506 LearningRate 0.0470 Epoch: 6 Global Step: 104970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:46:08,231-Speed 9252.44 samples/sec Loss 6.7641 LearningRate 0.0470 Epoch: 6 Global Step: 104980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:09,297-Speed 9605.71 samples/sec Loss 6.9203 LearningRate 0.0470 Epoch: 6 Global Step: 104990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:10,383-Speed 9436.76 samples/sec Loss 6.7854 LearningRate 0.0470 Epoch: 6 Global Step: 105000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:11,495-Speed 9219.14 samples/sec Loss 6.7799 LearningRate 0.0470 Epoch: 6 Global Step: 105010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:12,585-Speed 9399.49 samples/sec Loss 6.8693 LearningRate 0.0470 Epoch: 6 Global Step: 105020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:13,652-Speed 9606.21 samples/sec Loss 6.7474 LearningRate 0.0470 Epoch: 6 Global Step: 105030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:14,761-Speed 9241.39 samples/sec Loss 6.9150 LearningRate 0.0470 Epoch: 6 Global Step: 105040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:15,861-Speed 9311.96 samples/sec Loss 7.0376 LearningRate 0.0470 Epoch: 6 Global Step: 105050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:16,945-Speed 9450.53 samples/sec Loss 6.8472 LearningRate 0.0470 Epoch: 6 Global Step: 105060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:18,022-Speed 9515.52 samples/sec Loss 6.8901 LearningRate 0.0470 Epoch: 6 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:19,100-Speed 9503.46 samples/sec Loss 6.8584 LearningRate 0.0470 Epoch: 6 Global Step: 105080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:20,193-Speed 9377.52 samples/sec Loss 6.9077 LearningRate 0.0469 Epoch: 6 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:21,263-Speed 9568.19 
samples/sec Loss 6.9215 LearningRate 0.0469 Epoch: 6 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:22,345-Speed 9474.30 samples/sec Loss 7.0051 LearningRate 0.0469 Epoch: 6 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:23,432-Speed 9424.88 samples/sec Loss 6.9291 LearningRate 0.0469 Epoch: 6 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:24,527-Speed 9359.94 samples/sec Loss 6.8483 LearningRate 0.0469 Epoch: 6 Global Step: 105130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:25,613-Speed 9430.01 samples/sec Loss 6.9642 LearningRate 0.0469 Epoch: 6 Global Step: 105140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:26,734-Speed 9146.50 samples/sec Loss 6.7915 LearningRate 0.0469 Epoch: 6 Global Step: 105150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:27,856-Speed 9127.10 samples/sec Loss 6.8787 LearningRate 0.0469 Epoch: 6 Global Step: 105160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:46:28,947-Speed 9391.85 samples/sec Loss 6.8345 LearningRate 0.0469 Epoch: 6 Global Step: 105170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:30,032-Speed 9445.68 samples/sec Loss 6.9377 LearningRate 0.0469 Epoch: 6 Global Step: 105180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:31,098-Speed 9610.90 samples/sec Loss 6.9226 LearningRate 0.0469 Epoch: 6 Global Step: 105190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:32,177-Speed 9495.16 samples/sec Loss 6.9391 LearningRate 0.0469 Epoch: 6 Global Step: 105200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:33,285-Speed 9251.14 samples/sec Loss 6.8311 LearningRate 0.0469 Epoch: 6 Global Step: 105210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:34,375-Speed 9398.66 samples/sec Loss 6.8178 LearningRate 0.0469 Epoch: 
6 Global Step: 105220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:35,474-Speed 9321.71 samples/sec Loss 6.9530 LearningRate 0.0469 Epoch: 6 Global Step: 105230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:36,516-Speed 9835.25 samples/sec Loss 6.8243 LearningRate 0.0469 Epoch: 6 Global Step: 105240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:37,554-Speed 9875.50 samples/sec Loss 6.8313 LearningRate 0.0469 Epoch: 6 Global Step: 105250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:38,589-Speed 9895.40 samples/sec Loss 6.8419 LearningRate 0.0469 Epoch: 6 Global Step: 105260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:39,666-Speed 9511.22 samples/sec Loss 6.9056 LearningRate 0.0469 Epoch: 6 Global Step: 105270 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:40,713-Speed 9789.30 samples/sec Loss 6.9827 LearningRate 0.0469 Epoch: 6 Global Step: 105280 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:41,809-Speed 9345.48 samples/sec Loss 6.8695 LearningRate 0.0469 Epoch: 6 Global Step: 105290 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:42,908-Speed 9326.33 samples/sec Loss 6.9640 LearningRate 0.0469 Epoch: 6 Global Step: 105300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:44,012-Speed 9283.38 samples/sec Loss 6.9096 LearningRate 0.0469 Epoch: 6 Global Step: 105310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:45,037-Speed 9993.45 samples/sec Loss 6.7880 LearningRate 0.0469 Epoch: 6 Global Step: 105320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:46,127-Speed 9401.97 samples/sec Loss 6.8605 LearningRate 0.0469 Epoch: 6 Global Step: 105330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:47,193-Speed 9611.58 samples/sec Loss 6.8415 LearningRate 0.0468 Epoch: 6 Global Step: 105340 Fp16 Grad Scale: 
262144 Required: 9 hours Training: 2022-04-11 15:46:48,301-Speed 9244.78 samples/sec Loss 6.7744 LearningRate 0.0468 Epoch: 6 Global Step: 105350 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:49,393-Speed 9379.59 samples/sec Loss 6.8163 LearningRate 0.0468 Epoch: 6 Global Step: 105360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:50,468-Speed 9539.23 samples/sec Loss 6.8801 LearningRate 0.0468 Epoch: 6 Global Step: 105370 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:46:51,528-Speed 9662.55 samples/sec Loss 6.9143 LearningRate 0.0468 Epoch: 6 Global Step: 105380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:52,624-Speed 9354.23 samples/sec Loss 6.9459 LearningRate 0.0468 Epoch: 6 Global Step: 105390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:53,714-Speed 9400.61 samples/sec Loss 6.8932 LearningRate 0.0468 Epoch: 6 Global Step: 105400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:54,824-Speed 9230.06 samples/sec Loss 6.8807 LearningRate 0.0468 Epoch: 6 Global Step: 105410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:55,915-Speed 9389.21 samples/sec Loss 6.7844 LearningRate 0.0468 Epoch: 6 Global Step: 105420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:56,969-Speed 9719.30 samples/sec Loss 6.8813 LearningRate 0.0468 Epoch: 6 Global Step: 105430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:58,045-Speed 9526.39 samples/sec Loss 6.8213 LearningRate 0.0468 Epoch: 6 Global Step: 105440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:46:59,148-Speed 9284.64 samples/sec Loss 6.8564 LearningRate 0.0468 Epoch: 6 Global Step: 105450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:00,203-Speed 9717.15 samples/sec Loss 6.8472 LearningRate 0.0468 Epoch: 6 Global Step: 105460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:47:01,301-Speed 9325.35 samples/sec Loss 6.9226 LearningRate 0.0468 Epoch: 6 Global Step: 105470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:02,396-Speed 9356.58 samples/sec Loss 6.8355 LearningRate 0.0468 Epoch: 6 Global Step: 105480 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:47:03,491-Speed 9363.76 samples/sec Loss 6.9070 LearningRate 0.0468 Epoch: 6 Global Step: 105490 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:47:04,558-Speed 9662.94 samples/sec Loss 6.8892 LearningRate 0.0468 Epoch: 6 Global Step: 105500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:05,650-Speed 9381.91 samples/sec Loss 6.9279 LearningRate 0.0468 Epoch: 6 Global Step: 105510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:06,740-Speed 9401.42 samples/sec Loss 6.8939 LearningRate 0.0468 Epoch: 6 Global Step: 105520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:07,812-Speed 9556.28 samples/sec Loss 6.8392 LearningRate 0.0468 Epoch: 6 Global Step: 105530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:08,872-Speed 9672.96 samples/sec Loss 6.9101 LearningRate 0.0468 Epoch: 6 Global Step: 105540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:09,929-Speed 9691.35 samples/sec Loss 6.8361 LearningRate 0.0468 Epoch: 6 Global Step: 105550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:11,032-Speed 9290.83 samples/sec Loss 6.8482 LearningRate 0.0468 Epoch: 6 Global Step: 105560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:12,110-Speed 9507.18 samples/sec Loss 6.8015 LearningRate 0.0468 Epoch: 6 Global Step: 105570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:13,205-Speed 9351.43 samples/sec Loss 6.7878 LearningRate 0.0467 Epoch: 6 Global Step: 105580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:14,277-Speed 9559.80 
samples/sec Loss 6.8817 LearningRate 0.0467 Epoch: 6 Global Step: 105590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:15,406-Speed 9075.14 samples/sec Loss 6.8196 LearningRate 0.0467 Epoch: 6 Global Step: 105600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:16,485-Speed 9490.80 samples/sec Loss 6.8786 LearningRate 0.0467 Epoch: 6 Global Step: 105610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:17,574-Speed 9410.47 samples/sec Loss 6.9375 LearningRate 0.0467 Epoch: 6 Global Step: 105620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:18,629-Speed 9709.02 samples/sec Loss 6.8388 LearningRate 0.0467 Epoch: 6 Global Step: 105630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:19,703-Speed 9548.93 samples/sec Loss 6.7723 LearningRate 0.0467 Epoch: 6 Global Step: 105640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:20,776-Speed 9548.08 samples/sec Loss 6.8223 LearningRate 0.0467 Epoch: 6 Global Step: 105650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:21,866-Speed 9394.23 samples/sec Loss 6.9528 LearningRate 0.0467 Epoch: 6 Global Step: 105660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:22,918-Speed 9742.54 samples/sec Loss 6.8889 LearningRate 0.0467 Epoch: 6 Global Step: 105670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:24,023-Speed 9279.96 samples/sec Loss 6.8597 LearningRate 0.0467 Epoch: 6 Global Step: 105680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:25,123-Speed 9310.76 samples/sec Loss 6.9508 LearningRate 0.0467 Epoch: 6 Global Step: 105690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:26,203-Speed 9483.40 samples/sec Loss 6.8067 LearningRate 0.0467 Epoch: 6 Global Step: 105700 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:47:27,262-Speed 9682.14 samples/sec Loss 6.8458 LearningRate 0.0467 
Epoch: 6 Global Step: 105710 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:47:28,354-Speed 9381.32 samples/sec Loss 6.8282 LearningRate 0.0467 Epoch: 6 Global Step: 105720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:29,432-Speed 9502.11 samples/sec Loss 6.9799 LearningRate 0.0467 Epoch: 6 Global Step: 105730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:30,517-Speed 9438.33 samples/sec Loss 6.8523 LearningRate 0.0467 Epoch: 6 Global Step: 105740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:31,597-Speed 9489.09 samples/sec Loss 6.8194 LearningRate 0.0467 Epoch: 6 Global Step: 105750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:32,700-Speed 9291.12 samples/sec Loss 6.7926 LearningRate 0.0467 Epoch: 6 Global Step: 105760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:33,785-Speed 9442.72 samples/sec Loss 7.0017 LearningRate 0.0467 Epoch: 6 Global Step: 105770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:34,876-Speed 9391.31 samples/sec Loss 6.8716 LearningRate 0.0467 Epoch: 6 Global Step: 105780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:35,951-Speed 9535.25 samples/sec Loss 7.0103 LearningRate 0.0467 Epoch: 6 Global Step: 105790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:37,009-Speed 9681.27 samples/sec Loss 6.8317 LearningRate 0.0467 Epoch: 6 Global Step: 105800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:38,101-Speed 9381.53 samples/sec Loss 6.8673 LearningRate 0.0467 Epoch: 6 Global Step: 105810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:39,198-Speed 9346.42 samples/sec Loss 6.9477 LearningRate 0.0466 Epoch: 6 Global Step: 105820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:40,308-Speed 9229.26 samples/sec Loss 6.9323 LearningRate 0.0466 Epoch: 6 Global Step: 105830 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:41,369-Speed 9664.81 samples/sec Loss 6.8762 LearningRate 0.0466 Epoch: 6 Global Step: 105840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:42,449-Speed 9479.73 samples/sec Loss 6.8220 LearningRate 0.0466 Epoch: 6 Global Step: 105850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:43,513-Speed 9630.49 samples/sec Loss 6.8913 LearningRate 0.0466 Epoch: 6 Global Step: 105860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:44,568-Speed 9715.42 samples/sec Loss 6.9206 LearningRate 0.0466 Epoch: 6 Global Step: 105870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:45,627-Speed 9672.99 samples/sec Loss 6.8249 LearningRate 0.0466 Epoch: 6 Global Step: 105880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:46,667-Speed 9854.34 samples/sec Loss 6.8784 LearningRate 0.0466 Epoch: 6 Global Step: 105890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:47,752-Speed 9442.47 samples/sec Loss 6.9954 LearningRate 0.0466 Epoch: 6 Global Step: 105900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:48,831-Speed 9492.88 samples/sec Loss 6.9064 LearningRate 0.0466 Epoch: 6 Global Step: 105910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:49,902-Speed 9568.45 samples/sec Loss 6.8380 LearningRate 0.0466 Epoch: 6 Global Step: 105920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:47:50,988-Speed 9430.69 samples/sec Loss 6.9003 LearningRate 0.0466 Epoch: 6 Global Step: 105930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:52,079-Speed 9389.49 samples/sec Loss 6.9558 LearningRate 0.0466 Epoch: 6 Global Step: 105940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:53,145-Speed 9615.29 samples/sec Loss 6.8939 LearningRate 0.0466 Epoch: 6 Global Step: 105950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 15:47:54,220-Speed 9535.27 samples/sec Loss 7.0385 LearningRate 0.0466 Epoch: 6 Global Step: 105960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:55,279-Speed 9669.71 samples/sec Loss 6.9641 LearningRate 0.0466 Epoch: 6 Global Step: 105970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:56,375-Speed 9350.58 samples/sec Loss 6.8012 LearningRate 0.0466 Epoch: 6 Global Step: 105980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:57,476-Speed 9311.99 samples/sec Loss 6.9365 LearningRate 0.0466 Epoch: 6 Global Step: 105990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:47:58,558-Speed 9466.93 samples/sec Loss 6.8786 LearningRate 0.0466 Epoch: 6 Global Step: 106000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:48:20,355-[lfw][106000]XNorm: 11.342331 Training: 2022-04-11 15:48:20,356-[lfw][106000]Accuracy-Flip: 0.99617+-0.00279 Training: 2022-04-11 15:48:20,356-[lfw][106000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:48:45,547-[cfp_fp][106000]XNorm: 9.626601 Training: 2022-04-11 15:48:45,548-[cfp_fp][106000]Accuracy-Flip: 0.95614+-0.01169 Training: 2022-04-11 15:48:45,548-[cfp_fp][106000]Accuracy-Highest: 0.95857 Training: 2022-04-11 15:49:07,360-[agedb_30][106000]XNorm: 10.892514 Training: 2022-04-11 15:49:07,361-[agedb_30][106000]Accuracy-Flip: 0.96250+-0.00932 Training: 2022-04-11 15:49:07,362-[agedb_30][106000]Accuracy-Highest: 0.96483 Training: 2022-04-11 15:49:08,435-Speed 146.54 samples/sec Loss 6.9093 LearningRate 0.0466 Epoch: 6 Global Step: 106010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:09,480-Speed 9803.29 samples/sec Loss 6.8598 LearningRate 0.0466 Epoch: 6 Global Step: 106020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:10,557-Speed 9513.81 samples/sec Loss 6.8489 LearningRate 0.0466 Epoch: 6 Global Step: 106030 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 
15:49:11,644-Speed 9432.38 samples/sec Loss 6.9421 LearningRate 0.0466 Epoch: 6 Global Step: 106040 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:12,699-Speed 9711.01 samples/sec Loss 6.8313 LearningRate 0.0466 Epoch: 6 Global Step: 106050 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:13,803-Speed 9274.54 samples/sec Loss 6.7791 LearningRate 0.0466 Epoch: 6 Global Step: 106060 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:14,915-Speed 9221.42 samples/sec Loss 6.9308 LearningRate 0.0465 Epoch: 6 Global Step: 106070 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:15,964-Speed 9767.84 samples/sec Loss 6.8382 LearningRate 0.0465 Epoch: 6 Global Step: 106080 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:17,020-Speed 9697.92 samples/sec Loss 6.9276 LearningRate 0.0465 Epoch: 6 Global Step: 106090 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:18,095-Speed 9529.51 samples/sec Loss 6.8956 LearningRate 0.0465 Epoch: 6 Global Step: 106100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:19,163-Speed 9590.79 samples/sec Loss 6.9513 LearningRate 0.0465 Epoch: 6 Global Step: 106110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:20,249-Speed 9436.34 samples/sec Loss 6.8384 LearningRate 0.0465 Epoch: 6 Global Step: 106120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:21,344-Speed 9356.87 samples/sec Loss 6.8742 LearningRate 0.0465 Epoch: 6 Global Step: 106130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:22,444-Speed 9318.34 samples/sec Loss 6.7840 LearningRate 0.0465 Epoch: 6 Global Step: 106140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:23,537-Speed 9366.90 samples/sec Loss 6.9060 LearningRate 0.0465 Epoch: 6 Global Step: 106150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:24,653-Speed 9183.14 samples/sec Loss 
6.9492 LearningRate 0.0465 Epoch: 6 Global Step: 106160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:25,751-Speed 9331.42 samples/sec Loss 6.8420 LearningRate 0.0465 Epoch: 6 Global Step: 106170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:26,871-Speed 9150.16 samples/sec Loss 6.9556 LearningRate 0.0465 Epoch: 6 Global Step: 106180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:27,966-Speed 9355.28 samples/sec Loss 6.9018 LearningRate 0.0465 Epoch: 6 Global Step: 106190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:29,045-Speed 9500.86 samples/sec Loss 6.9526 LearningRate 0.0465 Epoch: 6 Global Step: 106200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:30,114-Speed 9582.52 samples/sec Loss 6.9538 LearningRate 0.0465 Epoch: 6 Global Step: 106210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:31,158-Speed 9817.07 samples/sec Loss 6.9679 LearningRate 0.0465 Epoch: 6 Global Step: 106220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:32,241-Speed 9464.03 samples/sec Loss 6.8943 LearningRate 0.0465 Epoch: 6 Global Step: 106230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:33,268-Speed 9970.07 samples/sec Loss 6.9103 LearningRate 0.0465 Epoch: 6 Global Step: 106240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:34,371-Speed 9293.38 samples/sec Loss 6.8863 LearningRate 0.0465 Epoch: 6 Global Step: 106250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:35,461-Speed 9398.33 samples/sec Loss 6.8861 LearningRate 0.0465 Epoch: 6 Global Step: 106260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:36,545-Speed 9448.01 samples/sec Loss 6.8663 LearningRate 0.0465 Epoch: 6 Global Step: 106270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:37,612-Speed 9605.27 samples/sec Loss 7.0419 LearningRate 0.0465 Epoch: 6 Global 
Step: 106280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:38,709-Speed 9339.29 samples/sec Loss 6.9604 LearningRate 0.0465 Epoch: 6 Global Step: 106290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:39,833-Speed 9117.85 samples/sec Loss 6.9788 LearningRate 0.0465 Epoch: 6 Global Step: 106300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:40,915-Speed 9468.16 samples/sec Loss 6.9188 LearningRate 0.0464 Epoch: 6 Global Step: 106310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:42,024-Speed 9241.65 samples/sec Loss 6.8250 LearningRate 0.0464 Epoch: 6 Global Step: 106320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:49:43,107-Speed 9454.10 samples/sec Loss 6.8998 LearningRate 0.0464 Epoch: 6 Global Step: 106330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:44,173-Speed 9621.20 samples/sec Loss 6.8594 LearningRate 0.0464 Epoch: 6 Global Step: 106340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:45,219-Speed 9795.16 samples/sec Loss 6.9432 LearningRate 0.0464 Epoch: 6 Global Step: 106350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:46,271-Speed 9737.20 samples/sec Loss 6.9162 LearningRate 0.0464 Epoch: 6 Global Step: 106360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:47,352-Speed 9480.86 samples/sec Loss 6.8445 LearningRate 0.0464 Epoch: 6 Global Step: 106370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:48,423-Speed 9566.27 samples/sec Loss 6.9484 LearningRate 0.0464 Epoch: 6 Global Step: 106380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:49,464-Speed 9847.10 samples/sec Loss 6.9326 LearningRate 0.0464 Epoch: 6 Global Step: 106390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:49:50,490-Speed 9982.35 samples/sec Loss 6.8154 LearningRate 0.0464 Epoch: 6 Global Step: 106400 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:49:51,536-Speed 9793.75 samples/sec Loss 6.8382 LearningRate 0.0464 Epoch: 6 Global Step: 106410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:49:52,628-Speed 9384.98 samples/sec Loss 6.9795 LearningRate 0.0464 Epoch: 6 Global Step: 106420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:49:53,685-Speed 9692.74 samples/sec Loss 6.9277 LearningRate 0.0464 Epoch: 6 Global Step: 106430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:49:54,777-Speed 9389.73 samples/sec Loss 6.8959 LearningRate 0.0464 Epoch: 6 Global Step: 106440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:49:55,806-Speed 9951.65 samples/sec Loss 7.0154 LearningRate 0.0464 Epoch: 6 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:49:56,865-Speed 9674.45 samples/sec Loss 6.8655 LearningRate 0.0464 Epoch: 6 Global Step: 106460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:49:57,953-Speed 9420.57 samples/sec Loss 6.8256 LearningRate 0.0464 Epoch: 6 Global Step: 106470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:49:59,016-Speed 9639.46 samples/sec Loss 6.8865 LearningRate 0.0464 Epoch: 6 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:00,116-Speed 9309.12 samples/sec Loss 6.9659 LearningRate 0.0464 Epoch: 6 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:01,178-Speed 9651.48 samples/sec Loss 6.9429 LearningRate 0.0464 Epoch: 6 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:02,268-Speed 9403.72 samples/sec Loss 6.9039 LearningRate 0.0464 Epoch: 6 Global Step: 106510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:03,341-Speed 9547.63 samples/sec Loss 6.8493 LearningRate 0.0464 Epoch: 6 Global Step: 106520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
15:50:04,457-Speed 9181.28 samples/sec Loss 6.9361 LearningRate 0.0464 Epoch: 6 Global Step: 106530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:05,537-Speed 9478.81 samples/sec Loss 7.0226 LearningRate 0.0464 Epoch: 6 Global Step: 106540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:06,641-Speed 9287.24 samples/sec Loss 6.9307 LearningRate 0.0464 Epoch: 6 Global Step: 106550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:07,683-Speed 9831.60 samples/sec Loss 6.9026 LearningRate 0.0463 Epoch: 6 Global Step: 106560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:08,755-Speed 9559.70 samples/sec Loss 6.8376 LearningRate 0.0463 Epoch: 6 Global Step: 106570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:09,845-Speed 9401.11 samples/sec Loss 6.8133 LearningRate 0.0463 Epoch: 6 Global Step: 106580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:10,957-Speed 9209.05 samples/sec Loss 6.8294 LearningRate 0.0463 Epoch: 6 Global Step: 106590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:12,017-Speed 9670.57 samples/sec Loss 6.9144 LearningRate 0.0463 Epoch: 6 Global Step: 106600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:13,059-Speed 9828.20 samples/sec Loss 6.9551 LearningRate 0.0463 Epoch: 6 Global Step: 106610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:14,114-Speed 9716.03 samples/sec Loss 6.9371 LearningRate 0.0463 Epoch: 6 Global Step: 106620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:15,211-Speed 9345.08 samples/sec Loss 6.9863 LearningRate 0.0463 Epoch: 6 Global Step: 106630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:16,300-Speed 9407.64 samples/sec Loss 6.9304 LearningRate 0.0463 Epoch: 6 Global Step: 106640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:17,357-Speed 9687.81 samples/sec Loss 
6.9753 LearningRate 0.0463 Epoch: 6 Global Step: 106650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:18,489-Speed 9053.80 samples/sec Loss 6.8581 LearningRate 0.0463 Epoch: 6 Global Step: 106660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:19,530-Speed 9837.13 samples/sec Loss 6.9243 LearningRate 0.0463 Epoch: 6 Global Step: 106670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:20,613-Speed 9463.68 samples/sec Loss 6.8502 LearningRate 0.0463 Epoch: 6 Global Step: 106680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:21,650-Speed 9879.68 samples/sec Loss 6.8702 LearningRate 0.0463 Epoch: 6 Global Step: 106690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:22,776-Speed 9106.18 samples/sec Loss 6.9380 LearningRate 0.0463 Epoch: 6 Global Step: 106700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:23,907-Speed 9057.18 samples/sec Loss 6.8806 LearningRate 0.0463 Epoch: 6 Global Step: 106710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:24,988-Speed 9488.86 samples/sec Loss 6.9101 LearningRate 0.0463 Epoch: 6 Global Step: 106720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:26,079-Speed 9390.33 samples/sec Loss 6.8977 LearningRate 0.0463 Epoch: 6 Global Step: 106730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:27,159-Speed 9488.95 samples/sec Loss 6.9619 LearningRate 0.0463 Epoch: 6 Global Step: 106740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:28,254-Speed 9350.91 samples/sec Loss 7.0063 LearningRate 0.0463 Epoch: 6 Global Step: 106750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:29,329-Speed 9537.20 samples/sec Loss 6.8830 LearningRate 0.0463 Epoch: 6 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:30,407-Speed 9503.89 samples/sec Loss 6.8354 LearningRate 0.0463 Epoch: 6 Global Step: 
106770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:31,483-Speed 9518.71 samples/sec Loss 6.8491 LearningRate 0.0463 Epoch: 6 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:32,552-Speed 9585.94 samples/sec Loss 6.9226 LearningRate 0.0463 Epoch: 6 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:33,658-Speed 9264.30 samples/sec Loss 6.9748 LearningRate 0.0462 Epoch: 6 Global Step: 106800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:34,717-Speed 9674.77 samples/sec Loss 6.9613 LearningRate 0.0462 Epoch: 6 Global Step: 106810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:50:35,807-Speed 9401.55 samples/sec Loss 6.7876 LearningRate 0.0462 Epoch: 6 Global Step: 106820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:36,907-Speed 9314.25 samples/sec Loss 6.9500 LearningRate 0.0462 Epoch: 6 Global Step: 106830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:37,994-Speed 9421.00 samples/sec Loss 6.8395 LearningRate 0.0462 Epoch: 6 Global Step: 106840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:39,098-Speed 9281.60 samples/sec Loss 6.8987 LearningRate 0.0462 Epoch: 6 Global Step: 106850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:40,157-Speed 9680.12 samples/sec Loss 6.7959 LearningRate 0.0462 Epoch: 6 Global Step: 106860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:41,235-Speed 9497.98 samples/sec Loss 6.8959 LearningRate 0.0462 Epoch: 6 Global Step: 106870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:42,311-Speed 9525.18 samples/sec Loss 6.8802 LearningRate 0.0462 Epoch: 6 Global Step: 106880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:43,394-Speed 9462.77 samples/sec Loss 6.9170 LearningRate 0.0462 Epoch: 6 Global Step: 106890 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 15:50:44,431-Speed 9888.73 samples/sec Loss 6.8729 LearningRate 0.0462 Epoch: 6 Global Step: 106900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:45,506-Speed 9532.86 samples/sec Loss 6.9153 LearningRate 0.0462 Epoch: 6 Global Step: 106910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:46,536-Speed 9942.36 samples/sec Loss 6.8835 LearningRate 0.0462 Epoch: 6 Global Step: 106920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:50:47,654-Speed 9167.35 samples/sec Loss 6.9002 LearningRate 0.0462 Epoch: 6 Global Step: 106930 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:50:48,732-Speed 9499.77 samples/sec Loss 6.7789 LearningRate 0.0462 Epoch: 6 Global Step: 106940 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:50:49,831-Speed 9323.47 samples/sec Loss 6.7649 LearningRate 0.0462 Epoch: 6 Global Step: 106950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:50,917-Speed 9437.08 samples/sec Loss 6.8751 LearningRate 0.0462 Epoch: 6 Global Step: 106960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:51,974-Speed 9693.34 samples/sec Loss 6.8417 LearningRate 0.0462 Epoch: 6 Global Step: 106970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:53,086-Speed 9213.49 samples/sec Loss 6.9046 LearningRate 0.0462 Epoch: 6 Global Step: 106980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:54,151-Speed 9618.88 samples/sec Loss 6.8505 LearningRate 0.0462 Epoch: 6 Global Step: 106990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:55,259-Speed 9247.48 samples/sec Loss 6.9250 LearningRate 0.0462 Epoch: 6 Global Step: 107000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:56,367-Speed 9253.34 samples/sec Loss 6.8886 LearningRate 0.0462 Epoch: 6 Global Step: 107010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:57,480-Speed 
9207.41 samples/sec Loss 6.8825 LearningRate 0.0462 Epoch: 6 Global Step: 107020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:58,600-Speed 9146.00 samples/sec Loss 6.8469 LearningRate 0.0462 Epoch: 6 Global Step: 107030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:50:59,699-Speed 9320.26 samples/sec Loss 6.7819 LearningRate 0.0462 Epoch: 6 Global Step: 107040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:00,839-Speed 8989.43 samples/sec Loss 7.0548 LearningRate 0.0461 Epoch: 6 Global Step: 107050 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:01,932-Speed 9374.61 samples/sec Loss 6.8545 LearningRate 0.0461 Epoch: 6 Global Step: 107060 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:02,995-Speed 9640.57 samples/sec Loss 6.8374 LearningRate 0.0461 Epoch: 6 Global Step: 107070 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:04,076-Speed 9483.58 samples/sec Loss 6.9003 LearningRate 0.0461 Epoch: 6 Global Step: 107080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:05,173-Speed 9337.18 samples/sec Loss 6.9073 LearningRate 0.0461 Epoch: 6 Global Step: 107090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:06,251-Speed 9500.54 samples/sec Loss 6.9002 LearningRate 0.0461 Epoch: 6 Global Step: 107100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:07,310-Speed 9683.18 samples/sec Loss 6.8883 LearningRate 0.0461 Epoch: 6 Global Step: 107110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:08,344-Speed 9899.97 samples/sec Loss 6.9166 LearningRate 0.0461 Epoch: 6 Global Step: 107120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:09,392-Speed 9781.50 samples/sec Loss 7.0155 LearningRate 0.0461 Epoch: 6 Global Step: 107130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:10,467-Speed 9526.38 samples/sec Loss 6.9027 
LearningRate 0.0461 Epoch: 6 Global Step: 107140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:11,543-Speed 9522.00 samples/sec Loss 6.8614 LearningRate 0.0461 Epoch: 6 Global Step: 107150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:12,646-Speed 9288.80 samples/sec Loss 6.9425 LearningRate 0.0461 Epoch: 6 Global Step: 107160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:13,703-Speed 9694.98 samples/sec Loss 6.9167 LearningRate 0.0461 Epoch: 6 Global Step: 107170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:14,767-Speed 9630.48 samples/sec Loss 6.9423 LearningRate 0.0461 Epoch: 6 Global Step: 107180 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:15,876-Speed 9239.53 samples/sec Loss 6.7966 LearningRate 0.0461 Epoch: 6 Global Step: 107190 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:16,979-Speed 9288.81 samples/sec Loss 7.0305 LearningRate 0.0461 Epoch: 6 Global Step: 107200 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:18,071-Speed 9381.36 samples/sec Loss 6.8593 LearningRate 0.0461 Epoch: 6 Global Step: 107210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:19,105-Speed 9912.21 samples/sec Loss 6.9257 LearningRate 0.0461 Epoch: 6 Global Step: 107220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:20,180-Speed 9531.90 samples/sec Loss 6.9001 LearningRate 0.0461 Epoch: 6 Global Step: 107230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:21,300-Speed 9151.52 samples/sec Loss 6.9468 LearningRate 0.0461 Epoch: 6 Global Step: 107240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:22,361-Speed 9659.42 samples/sec Loss 6.9643 LearningRate 0.0461 Epoch: 6 Global Step: 107250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:23,448-Speed 9425.24 samples/sec Loss 6.8523 LearningRate 0.0461 Epoch: 6 Global Step: 
107260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:24,484-Speed 9887.52 samples/sec Loss 6.7992 LearningRate 0.0461 Epoch: 6 Global Step: 107270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:25,528-Speed 9815.24 samples/sec Loss 6.8781 LearningRate 0.0461 Epoch: 6 Global Step: 107280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:26,582-Speed 9722.58 samples/sec Loss 6.8833 LearningRate 0.0460 Epoch: 6 Global Step: 107290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:27,676-Speed 9365.33 samples/sec Loss 6.9153 LearningRate 0.0460 Epoch: 6 Global Step: 107300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:28,736-Speed 9664.46 samples/sec Loss 6.9290 LearningRate 0.0460 Epoch: 6 Global Step: 107310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:29,801-Speed 9618.23 samples/sec Loss 6.9796 LearningRate 0.0460 Epoch: 6 Global Step: 107320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:30,864-Speed 9639.74 samples/sec Loss 6.9332 LearningRate 0.0460 Epoch: 6 Global Step: 107330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:31,929-Speed 9624.82 samples/sec Loss 7.0110 LearningRate 0.0460 Epoch: 6 Global Step: 107340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:32,991-Speed 9651.85 samples/sec Loss 6.8885 LearningRate 0.0460 Epoch: 6 Global Step: 107350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:34,057-Speed 9609.87 samples/sec Loss 6.9600 LearningRate 0.0460 Epoch: 6 Global Step: 107360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:35,174-Speed 9172.19 samples/sec Loss 7.0113 LearningRate 0.0460 Epoch: 6 Global Step: 107370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:36,230-Speed 9705.60 samples/sec Loss 6.9293 LearningRate 0.0460 Epoch: 6 Global Step: 107380 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 15:51:37,304-Speed 9535.31 samples/sec Loss 6.8396 LearningRate 0.0460 Epoch: 6 Global Step: 107390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:38,402-Speed 9331.32 samples/sec Loss 6.9062 LearningRate 0.0460 Epoch: 6 Global Step: 107400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:39,479-Speed 9513.77 samples/sec Loss 6.9177 LearningRate 0.0460 Epoch: 6 Global Step: 107410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:40,589-Speed 9231.56 samples/sec Loss 6.9280 LearningRate 0.0460 Epoch: 6 Global Step: 107420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:41,675-Speed 9436.90 samples/sec Loss 6.8322 LearningRate 0.0460 Epoch: 6 Global Step: 107430 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:42,768-Speed 9376.33 samples/sec Loss 6.9270 LearningRate 0.0460 Epoch: 6 Global Step: 107440 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:51:43,841-Speed 9545.22 samples/sec Loss 6.8333 LearningRate 0.0460 Epoch: 6 Global Step: 107450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:44,911-Speed 9579.81 samples/sec Loss 6.9451 LearningRate 0.0460 Epoch: 6 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:46,024-Speed 9201.40 samples/sec Loss 6.9931 LearningRate 0.0460 Epoch: 6 Global Step: 107470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:47,087-Speed 9642.55 samples/sec Loss 6.9764 LearningRate 0.0460 Epoch: 6 Global Step: 107480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:48,149-Speed 9650.60 samples/sec Loss 6.8643 LearningRate 0.0460 Epoch: 6 Global Step: 107490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:49,215-Speed 9606.28 samples/sec Loss 6.9312 LearningRate 0.0460 Epoch: 6 Global Step: 107500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:50,266-Speed 
9753.77 samples/sec Loss 6.8361 LearningRate 0.0460 Epoch: 6 Global Step: 107510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:51,309-Speed 9821.32 samples/sec Loss 6.9064 LearningRate 0.0460 Epoch: 6 Global Step: 107520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:52,411-Speed 9298.90 samples/sec Loss 6.9725 LearningRate 0.0460 Epoch: 6 Global Step: 107530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:53,562-Speed 8895.33 samples/sec Loss 6.8538 LearningRate 0.0459 Epoch: 6 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:54,645-Speed 9463.74 samples/sec Loss 7.0254 LearningRate 0.0459 Epoch: 6 Global Step: 107550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:55,709-Speed 9631.86 samples/sec Loss 6.8783 LearningRate 0.0459 Epoch: 6 Global Step: 107560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:56,770-Speed 9655.04 samples/sec Loss 6.9237 LearningRate 0.0459 Epoch: 6 Global Step: 107570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:51:57,862-Speed 9387.01 samples/sec Loss 7.0443 LearningRate 0.0459 Epoch: 6 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:58,918-Speed 9705.72 samples/sec Loss 6.9373 LearningRate 0.0459 Epoch: 6 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:51:59,981-Speed 9633.29 samples/sec Loss 6.9914 LearningRate 0.0459 Epoch: 6 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:01,061-Speed 9493.68 samples/sec Loss 6.8867 LearningRate 0.0459 Epoch: 6 Global Step: 107610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:02,138-Speed 9513.28 samples/sec Loss 6.9129 LearningRate 0.0459 Epoch: 6 Global Step: 107620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:03,184-Speed 9794.31 samples/sec Loss 6.8688 LearningRate 0.0459 
Epoch: 6 Global Step: 107630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:04,223-Speed 9860.26 samples/sec Loss 6.9110 LearningRate 0.0459 Epoch: 6 Global Step: 107640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:05,268-Speed 9807.47 samples/sec Loss 6.8690 LearningRate 0.0459 Epoch: 6 Global Step: 107650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:06,406-Speed 9005.18 samples/sec Loss 6.9379 LearningRate 0.0459 Epoch: 6 Global Step: 107660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:07,494-Speed 9416.46 samples/sec Loss 7.0150 LearningRate 0.0459 Epoch: 6 Global Step: 107670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:08,581-Speed 9419.60 samples/sec Loss 6.9038 LearningRate 0.0459 Epoch: 6 Global Step: 107680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:09,646-Speed 9623.26 samples/sec Loss 6.8142 LearningRate 0.0459 Epoch: 6 Global Step: 107690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:10,710-Speed 9632.01 samples/sec Loss 6.9370 LearningRate 0.0459 Epoch: 6 Global Step: 107700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:11,768-Speed 9682.99 samples/sec Loss 6.8504 LearningRate 0.0459 Epoch: 6 Global Step: 107710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:12,891-Speed 9122.42 samples/sec Loss 6.9116 LearningRate 0.0459 Epoch: 6 Global Step: 107720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:13,977-Speed 9433.76 samples/sec Loss 6.9939 LearningRate 0.0459 Epoch: 6 Global Step: 107730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:15,048-Speed 9567.96 samples/sec Loss 7.0023 LearningRate 0.0459 Epoch: 6 Global Step: 107740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:16,106-Speed 9682.32 samples/sec Loss 6.7953 LearningRate 0.0459 Epoch: 6 Global Step: 107750 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-04-11 15:52:17,136-Speed 9955.41 samples/sec Loss 6.8372 LearningRate 0.0459 Epoch: 6 Global Step: 107760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:18,220-Speed 9450.03 samples/sec Loss 6.8268 LearningRate 0.0459 Epoch: 6 Global Step: 107770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:19,250-Speed 9949.59 samples/sec Loss 6.9161 LearningRate 0.0459 Epoch: 6 Global Step: 107780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:20,329-Speed 9494.82 samples/sec Loss 7.0035 LearningRate 0.0458 Epoch: 6 Global Step: 107790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:21,396-Speed 9601.79 samples/sec Loss 6.8937 LearningRate 0.0458 Epoch: 6 Global Step: 107800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:22,474-Speed 9504.57 samples/sec Loss 6.9094 LearningRate 0.0458 Epoch: 6 Global Step: 107810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:23,569-Speed 9354.69 samples/sec Loss 6.9917 LearningRate 0.0458 Epoch: 6 Global Step: 107820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:24,645-Speed 9524.82 samples/sec Loss 6.8572 LearningRate 0.0458 Epoch: 6 Global Step: 107830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:25,702-Speed 9701.50 samples/sec Loss 6.8916 LearningRate 0.0458 Epoch: 6 Global Step: 107840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:26,740-Speed 9867.93 samples/sec Loss 6.8359 LearningRate 0.0458 Epoch: 6 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:27,799-Speed 9669.02 samples/sec Loss 6.7758 LearningRate 0.0458 Epoch: 6 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:28,926-Speed 9094.40 samples/sec Loss 6.8662 LearningRate 0.0458 Epoch: 6 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
15:52:30,021-Speed 9364.17 samples/sec Loss 7.0419 LearningRate 0.0458 Epoch: 6 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:31,112-Speed 9383.82 samples/sec Loss 6.7906 LearningRate 0.0458 Epoch: 6 Global Step: 107890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:32,221-Speed 9242.87 samples/sec Loss 6.9107 LearningRate 0.0458 Epoch: 6 Global Step: 107900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:33,343-Speed 9137.75 samples/sec Loss 6.7910 LearningRate 0.0458 Epoch: 6 Global Step: 107910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:34,382-Speed 9852.12 samples/sec Loss 6.8898 LearningRate 0.0458 Epoch: 6 Global Step: 107920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:35,490-Speed 9249.78 samples/sec Loss 6.9576 LearningRate 0.0458 Epoch: 6 Global Step: 107930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:36,586-Speed 9349.06 samples/sec Loss 6.9216 LearningRate 0.0458 Epoch: 6 Global Step: 107940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:37,674-Speed 9426.24 samples/sec Loss 6.8689 LearningRate 0.0458 Epoch: 6 Global Step: 107950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:38,780-Speed 9259.76 samples/sec Loss 6.7596 LearningRate 0.0458 Epoch: 6 Global Step: 107960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:52:39,868-Speed 9417.84 samples/sec Loss 6.8974 LearningRate 0.0458 Epoch: 6 Global Step: 107970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:40,957-Speed 9405.15 samples/sec Loss 6.8233 LearningRate 0.0458 Epoch: 6 Global Step: 107980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:42,044-Speed 9429.11 samples/sec Loss 6.7681 LearningRate 0.0458 Epoch: 6 Global Step: 107990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:52:43,158-Speed 9198.10 samples/sec Loss 6.8618 
LearningRate 0.0458 Epoch: 6 Global Step: 108000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:53:04,915-[lfw][108000]XNorm: 11.272427 Training: 2022-04-11 15:53:04,915-[lfw][108000]Accuracy-Flip: 0.99600+-0.00260 Training: 2022-04-11 15:53:04,916-[lfw][108000]Accuracy-Highest: 0.99683 Training: 2022-04-11 15:53:30,093-[cfp_fp][108000]XNorm: 9.577855 Training: 2022-04-11 15:53:30,094-[cfp_fp][108000]Accuracy-Flip: 0.95700+-0.01003 Training: 2022-04-11 15:53:30,094-[cfp_fp][108000]Accuracy-Highest: 0.95857 Training: 2022-04-11 15:53:51,843-[agedb_30][108000]XNorm: 10.826729 Training: 2022-04-11 15:53:51,843-[agedb_30][108000]Accuracy-Flip: 0.96317+-0.00838 Training: 2022-04-11 15:53:51,843-[agedb_30][108000]Accuracy-Highest: 0.96483 Training: 2022-04-11 15:53:52,950-Speed 146.72 samples/sec Loss 6.9105 LearningRate 0.0458 Epoch: 6 Global Step: 108010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:53:54,036-Speed 9430.62 samples/sec Loss 6.8513 LearningRate 0.0458 Epoch: 6 Global Step: 108020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:53:55,078-Speed 9839.25 samples/sec Loss 6.9590 LearningRate 0.0457 Epoch: 6 Global Step: 108030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:53:56,193-Speed 9188.66 samples/sec Loss 6.8845 LearningRate 0.0457 Epoch: 6 Global Step: 108040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:53:57,285-Speed 9386.88 samples/sec Loss 6.8457 LearningRate 0.0457 Epoch: 6 Global Step: 108050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:53:58,402-Speed 9173.29 samples/sec Loss 6.9052 LearningRate 0.0457 Epoch: 6 Global Step: 108060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:53:59,470-Speed 9593.58 samples/sec Loss 6.9710 LearningRate 0.0457 Epoch: 6 Global Step: 108070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:00,552-Speed 9466.88 samples/sec Loss 6.8759 LearningRate 0.0457 Epoch: 6 
Global Step: 108080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:01,599-Speed 9783.65 samples/sec Loss 6.9611 LearningRate 0.0457 Epoch: 6 Global Step: 108090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:02,693-Speed 9370.02 samples/sec Loss 6.8934 LearningRate 0.0457 Epoch: 6 Global Step: 108100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:03,784-Speed 9391.18 samples/sec Loss 7.0359 LearningRate 0.0457 Epoch: 6 Global Step: 108110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:04,847-Speed 9634.74 samples/sec Loss 6.8872 LearningRate 0.0457 Epoch: 6 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:05,923-Speed 9521.01 samples/sec Loss 6.9729 LearningRate 0.0457 Epoch: 6 Global Step: 108130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:06,999-Speed 9526.75 samples/sec Loss 6.8655 LearningRate 0.0457 Epoch: 6 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:08,066-Speed 9597.26 samples/sec Loss 6.9494 LearningRate 0.0457 Epoch: 6 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:09,112-Speed 9801.41 samples/sec Loss 6.9066 LearningRate 0.0457 Epoch: 6 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:10,159-Speed 9781.83 samples/sec Loss 6.8944 LearningRate 0.0457 Epoch: 6 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:11,211-Speed 9744.27 samples/sec Loss 6.9251 LearningRate 0.0457 Epoch: 6 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:12,255-Speed 9807.15 samples/sec Loss 6.9315 LearningRate 0.0457 Epoch: 6 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:13,316-Speed 9662.29 samples/sec Loss 6.7143 LearningRate 0.0457 Epoch: 6 Global Step: 108200 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 15:54:14,388-Speed 9559.22 samples/sec Loss 6.9271 LearningRate 0.0457 Epoch: 6 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:15,465-Speed 9514.97 samples/sec Loss 7.0072 LearningRate 0.0457 Epoch: 6 Global Step: 108220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:16,611-Speed 8940.53 samples/sec Loss 6.9331 LearningRate 0.0457 Epoch: 6 Global Step: 108230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:17,688-Speed 9517.55 samples/sec Loss 6.8820 LearningRate 0.0457 Epoch: 6 Global Step: 108240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:18,767-Speed 9500.32 samples/sec Loss 6.9420 LearningRate 0.0457 Epoch: 6 Global Step: 108250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:19,824-Speed 9690.27 samples/sec Loss 6.9073 LearningRate 0.0457 Epoch: 6 Global Step: 108260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:20,935-Speed 9221.55 samples/sec Loss 6.8654 LearningRate 0.0457 Epoch: 6 Global Step: 108270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:22,076-Speed 8978.42 samples/sec Loss 6.8185 LearningRate 0.0456 Epoch: 6 Global Step: 108280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:23,168-Speed 9386.11 samples/sec Loss 6.8052 LearningRate 0.0456 Epoch: 6 Global Step: 108290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:24,283-Speed 9189.11 samples/sec Loss 6.9546 LearningRate 0.0456 Epoch: 6 Global Step: 108300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:25,366-Speed 9456.05 samples/sec Loss 6.8604 LearningRate 0.0456 Epoch: 6 Global Step: 108310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:26,451-Speed 9445.83 samples/sec Loss 6.8522 LearningRate 0.0456 Epoch: 6 Global Step: 108320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 
15:54:27,521-Speed 9581.26 samples/sec Loss 6.9645 LearningRate 0.0456 Epoch: 6 Global Step: 108330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:54:28,582-Speed 9651.47 samples/sec Loss 6.8900 LearningRate 0.0456 Epoch: 6 Global Step: 108340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:54:29,647-Speed 9618.17 samples/sec Loss 6.9794 LearningRate 0.0456 Epoch: 6 Global Step: 108350 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:54:30,685-Speed 9877.83 samples/sec Loss 6.9663 LearningRate 0.0456 Epoch: 6 Global Step: 108360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:54:31,795-Speed 9227.56 samples/sec Loss 6.8808 LearningRate 0.0456 Epoch: 6 Global Step: 108370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:32,874-Speed 9498.93 samples/sec Loss 6.9947 LearningRate 0.0456 Epoch: 6 Global Step: 108380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:33,939-Speed 9621.46 samples/sec Loss 6.8456 LearningRate 0.0456 Epoch: 6 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:35,006-Speed 9602.27 samples/sec Loss 6.8946 LearningRate 0.0456 Epoch: 6 Global Step: 108400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:36,093-Speed 9429.13 samples/sec Loss 6.8375 LearningRate 0.0456 Epoch: 6 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:37,164-Speed 9566.56 samples/sec Loss 6.9271 LearningRate 0.0456 Epoch: 6 Global Step: 108420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:38,241-Speed 9509.47 samples/sec Loss 6.9776 LearningRate 0.0456 Epoch: 6 Global Step: 108430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:39,344-Speed 9291.37 samples/sec Loss 6.8723 LearningRate 0.0456 Epoch: 6 Global Step: 108440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:40,412-Speed 9593.36 samples/sec Loss 
6.9896 LearningRate 0.0456 Epoch: 6 Global Step: 108450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:41,486-Speed 9543.67 samples/sec Loss 6.9407 LearningRate 0.0456 Epoch: 6 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:42,566-Speed 9481.51 samples/sec Loss 6.8686 LearningRate 0.0456 Epoch: 6 Global Step: 108470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:43,677-Speed 9223.52 samples/sec Loss 6.7859 LearningRate 0.0456 Epoch: 6 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:54:44,775-Speed 9335.07 samples/sec Loss 6.8148 LearningRate 0.0456 Epoch: 6 Global Step: 108490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:45,895-Speed 9148.43 samples/sec Loss 6.9652 LearningRate 0.0456 Epoch: 6 Global Step: 108500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:46,953-Speed 9687.43 samples/sec Loss 6.9834 LearningRate 0.0456 Epoch: 6 Global Step: 108510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:47,993-Speed 9845.79 samples/sec Loss 6.8957 LearningRate 0.0456 Epoch: 6 Global Step: 108520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:49,097-Speed 9285.49 samples/sec Loss 6.8408 LearningRate 0.0455 Epoch: 6 Global Step: 108530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:50,167-Speed 9572.45 samples/sec Loss 6.9186 LearningRate 0.0455 Epoch: 6 Global Step: 108540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:51,266-Speed 9322.74 samples/sec Loss 6.9604 LearningRate 0.0455 Epoch: 6 Global Step: 108550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:52,344-Speed 9500.52 samples/sec Loss 6.8550 LearningRate 0.0455 Epoch: 6 Global Step: 108560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:53,392-Speed 9773.97 samples/sec Loss 6.9077 LearningRate 0.0455 Epoch: 6 Global 
Step: 108570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:54,483-Speed 9398.57 samples/sec Loss 6.8861 LearningRate 0.0455 Epoch: 6 Global Step: 108580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:55,544-Speed 9656.68 samples/sec Loss 6.9363 LearningRate 0.0455 Epoch: 6 Global Step: 108590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:54:56,644-Speed 9316.78 samples/sec Loss 6.9250 LearningRate 0.0455 Epoch: 6 Global Step: 108600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:57,744-Speed 9311.51 samples/sec Loss 6.8066 LearningRate 0.0455 Epoch: 6 Global Step: 108610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:58,839-Speed 9355.37 samples/sec Loss 6.9258 LearningRate 0.0455 Epoch: 6 Global Step: 108620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:54:59,916-Speed 9513.74 samples/sec Loss 6.8127 LearningRate 0.0455 Epoch: 6 Global Step: 108630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:00,971-Speed 9713.51 samples/sec Loss 6.8613 LearningRate 0.0455 Epoch: 6 Global Step: 108640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:02,026-Speed 9718.16 samples/sec Loss 6.8932 LearningRate 0.0455 Epoch: 6 Global Step: 108650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:03,104-Speed 9502.42 samples/sec Loss 6.8646 LearningRate 0.0455 Epoch: 6 Global Step: 108660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:04,183-Speed 9498.87 samples/sec Loss 6.8460 LearningRate 0.0455 Epoch: 6 Global Step: 108670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:05,285-Speed 9293.16 samples/sec Loss 6.7700 LearningRate 0.0455 Epoch: 6 Global Step: 108680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:06,348-Speed 9645.35 samples/sec Loss 6.9303 LearningRate 0.0455 Epoch: 6 Global Step: 108690 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:55:07,426-Speed 9496.12 samples/sec Loss 6.8386 LearningRate 0.0455 Epoch: 6 Global Step: 108700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:08,500-Speed 9541.56 samples/sec Loss 6.8077 LearningRate 0.0455 Epoch: 6 Global Step: 108710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:09,573-Speed 9554.55 samples/sec Loss 6.9255 LearningRate 0.0455 Epoch: 6 Global Step: 108720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:10,673-Speed 9313.24 samples/sec Loss 6.8308 LearningRate 0.0455 Epoch: 6 Global Step: 108730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:11,740-Speed 9601.26 samples/sec Loss 6.8333 LearningRate 0.0455 Epoch: 6 Global Step: 108740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:12,843-Speed 9283.71 samples/sec Loss 6.8551 LearningRate 0.0455 Epoch: 6 Global Step: 108750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:13,932-Speed 9410.89 samples/sec Loss 6.8418 LearningRate 0.0455 Epoch: 6 Global Step: 108760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:14,967-Speed 9902.27 samples/sec Loss 6.8954 LearningRate 0.0454 Epoch: 6 Global Step: 108770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:16,039-Speed 9558.93 samples/sec Loss 6.9495 LearningRate 0.0454 Epoch: 6 Global Step: 108780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:17,137-Speed 9330.69 samples/sec Loss 6.9225 LearningRate 0.0454 Epoch: 6 Global Step: 108790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:55:18,226-Speed 9411.31 samples/sec Loss 6.9204 LearningRate 0.0454 Epoch: 6 Global Step: 108800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:19,316-Speed 9397.92 samples/sec Loss 6.9373 LearningRate 0.0454 Epoch: 6 Global Step: 108810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
15:55:20,409-Speed 9373.30 samples/sec Loss 6.8747 LearningRate 0.0454 Epoch: 6 Global Step: 108820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:21,507-Speed 9333.61 samples/sec Loss 6.8496 LearningRate 0.0454 Epoch: 6 Global Step: 108830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:22,566-Speed 9676.06 samples/sec Loss 6.8767 LearningRate 0.0454 Epoch: 6 Global Step: 108840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:23,645-Speed 9493.73 samples/sec Loss 6.8052 LearningRate 0.0454 Epoch: 6 Global Step: 108850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:24,724-Speed 9499.15 samples/sec Loss 6.9075 LearningRate 0.0454 Epoch: 6 Global Step: 108860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:25,842-Speed 9165.95 samples/sec Loss 6.9076 LearningRate 0.0454 Epoch: 6 Global Step: 108870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:26,948-Speed 9264.29 samples/sec Loss 6.8992 LearningRate 0.0454 Epoch: 6 Global Step: 108880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:28,024-Speed 9520.52 samples/sec Loss 6.8721 LearningRate 0.0454 Epoch: 6 Global Step: 108890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:29,073-Speed 9766.93 samples/sec Loss 6.8815 LearningRate 0.0454 Epoch: 6 Global Step: 108900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:30,164-Speed 9397.39 samples/sec Loss 6.8941 LearningRate 0.0454 Epoch: 6 Global Step: 108910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:31,247-Speed 9460.57 samples/sec Loss 6.9094 LearningRate 0.0454 Epoch: 6 Global Step: 108920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:32,289-Speed 9830.67 samples/sec Loss 6.9196 LearningRate 0.0454 Epoch: 6 Global Step: 108930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:33,353-Speed 9630.63 samples/sec Loss 
6.9383 LearningRate 0.0454 Epoch: 6 Global Step: 108940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:34,414-Speed 9660.47 samples/sec Loss 6.9224 LearningRate 0.0454 Epoch: 6 Global Step: 108950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:35,457-Speed 9823.07 samples/sec Loss 6.8462 LearningRate 0.0454 Epoch: 6 Global Step: 108960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:36,510-Speed 9726.09 samples/sec Loss 6.9460 LearningRate 0.0454 Epoch: 6 Global Step: 108970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:37,584-Speed 9542.66 samples/sec Loss 6.8231 LearningRate 0.0454 Epoch: 6 Global Step: 108980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:38,669-Speed 9439.73 samples/sec Loss 6.9970 LearningRate 0.0454 Epoch: 6 Global Step: 108990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:39,736-Speed 9600.79 samples/sec Loss 6.9377 LearningRate 0.0454 Epoch: 6 Global Step: 109000 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:55:40,855-Speed 9160.81 samples/sec Loss 6.9547 LearningRate 0.0454 Epoch: 6 Global Step: 109010 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 15:55:41,941-Speed 9426.68 samples/sec Loss 6.8352 LearningRate 0.0453 Epoch: 6 Global Step: 109020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:43,022-Speed 9478.30 samples/sec Loss 6.9196 LearningRate 0.0453 Epoch: 6 Global Step: 109030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:44,113-Speed 9395.62 samples/sec Loss 7.0188 LearningRate 0.0453 Epoch: 6 Global Step: 109040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:45,166-Speed 9740.14 samples/sec Loss 6.9821 LearningRate 0.0453 Epoch: 6 Global Step: 109050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:46,226-Speed 9664.48 samples/sec Loss 6.8727 LearningRate 0.0453 Epoch: 6 Global 
Step: 109060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:47,301-Speed 9531.48 samples/sec Loss 6.8130 LearningRate 0.0453 Epoch: 6 Global Step: 109070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:48,360-Speed 9668.02 samples/sec Loss 6.9308 LearningRate 0.0453 Epoch: 6 Global Step: 109080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:49,484-Speed 9117.97 samples/sec Loss 6.8334 LearningRate 0.0453 Epoch: 6 Global Step: 109090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:50,567-Speed 9458.31 samples/sec Loss 6.8326 LearningRate 0.0453 Epoch: 6 Global Step: 109100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:51,652-Speed 9442.35 samples/sec Loss 6.8751 LearningRate 0.0453 Epoch: 6 Global Step: 109110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:52,733-Speed 9484.08 samples/sec Loss 6.9450 LearningRate 0.0453 Epoch: 6 Global Step: 109120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:53,806-Speed 9546.04 samples/sec Loss 6.9295 LearningRate 0.0453 Epoch: 6 Global Step: 109130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:54,910-Speed 9278.05 samples/sec Loss 6.7746 LearningRate 0.0453 Epoch: 6 Global Step: 109140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:55,970-Speed 9669.45 samples/sec Loss 6.8338 LearningRate 0.0453 Epoch: 6 Global Step: 109150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:57,077-Speed 9253.24 samples/sec Loss 6.9083 LearningRate 0.0453 Epoch: 6 Global Step: 109160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:58,174-Speed 9342.65 samples/sec Loss 6.8695 LearningRate 0.0453 Epoch: 6 Global Step: 109170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:55:59,242-Speed 9598.62 samples/sec Loss 6.9157 LearningRate 0.0453 Epoch: 6 Global Step: 109180 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 15:56:00,337-Speed 9351.02 samples/sec Loss 6.9443 LearningRate 0.0453 Epoch: 6 Global Step: 109190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:01,401-Speed 9630.84 samples/sec Loss 6.9002 LearningRate 0.0453 Epoch: 6 Global Step: 109200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:02,487-Speed 9439.83 samples/sec Loss 6.7999 LearningRate 0.0453 Epoch: 6 Global Step: 109210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:03,514-Speed 9983.18 samples/sec Loss 6.8635 LearningRate 0.0453 Epoch: 6 Global Step: 109220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:04,551-Speed 9882.67 samples/sec Loss 6.9387 LearningRate 0.0453 Epoch: 6 Global Step: 109230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:05,614-Speed 9638.21 samples/sec Loss 6.9817 LearningRate 0.0453 Epoch: 6 Global Step: 109240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:06,691-Speed 9508.24 samples/sec Loss 7.0278 LearningRate 0.0453 Epoch: 6 Global Step: 109250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:07,755-Speed 9628.44 samples/sec Loss 6.9788 LearningRate 0.0453 Epoch: 6 Global Step: 109260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:08,840-Speed 9445.43 samples/sec Loss 6.9895 LearningRate 0.0452 Epoch: 6 Global Step: 109270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:09,892-Speed 9736.22 samples/sec Loss 6.9225 LearningRate 0.0452 Epoch: 6 Global Step: 109280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:10,952-Speed 9667.64 samples/sec Loss 7.0147 LearningRate 0.0452 Epoch: 6 Global Step: 109290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:12,011-Speed 9670.73 samples/sec Loss 6.8780 LearningRate 0.0452 Epoch: 6 Global Step: 109300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
15:56:13,093-Speed 9469.58 samples/sec Loss 6.8738 LearningRate 0.0452 Epoch: 6 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:14,174-Speed 9483.75 samples/sec Loss 6.9077 LearningRate 0.0452 Epoch: 6 Global Step: 109320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:15,243-Speed 9588.88 samples/sec Loss 6.9149 LearningRate 0.0452 Epoch: 6 Global Step: 109330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:16,281-Speed 9871.62 samples/sec Loss 6.8652 LearningRate 0.0452 Epoch: 6 Global Step: 109340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:17,337-Speed 9697.39 samples/sec Loss 6.9232 LearningRate 0.0452 Epoch: 6 Global Step: 109350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:18,414-Speed 9513.17 samples/sec Loss 6.9019 LearningRate 0.0452 Epoch: 6 Global Step: 109360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:19,522-Speed 9243.60 samples/sec Loss 6.9882 LearningRate 0.0452 Epoch: 6 Global Step: 109370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:20,633-Speed 9226.35 samples/sec Loss 6.8536 LearningRate 0.0452 Epoch: 6 Global Step: 109380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:21,724-Speed 9388.67 samples/sec Loss 7.0047 LearningRate 0.0452 Epoch: 6 Global Step: 109390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:22,763-Speed 9862.70 samples/sec Loss 6.8820 LearningRate 0.0452 Epoch: 6 Global Step: 109400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:23,848-Speed 9448.42 samples/sec Loss 6.8486 LearningRate 0.0452 Epoch: 6 Global Step: 109410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:24,911-Speed 9637.85 samples/sec Loss 6.8291 LearningRate 0.0452 Epoch: 6 Global Step: 109420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:25,990-Speed 9498.67 samples/sec Loss 6.8468 
LearningRate 0.0452 Epoch: 6 Global Step: 109430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:27,051-Speed 9653.31 samples/sec Loss 6.8386 LearningRate 0.0452 Epoch: 6 Global Step: 109440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:28,125-Speed 9538.23 samples/sec Loss 6.9101 LearningRate 0.0452 Epoch: 6 Global Step: 109450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:29,173-Speed 9778.40 samples/sec Loss 6.8689 LearningRate 0.0452 Epoch: 6 Global Step: 109460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:30,249-Speed 9526.92 samples/sec Loss 6.8252 LearningRate 0.0452 Epoch: 6 Global Step: 109470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:31,358-Speed 9231.50 samples/sec Loss 6.9764 LearningRate 0.0452 Epoch: 6 Global Step: 109480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:32,458-Speed 9320.51 samples/sec Loss 6.9051 LearningRate 0.0452 Epoch: 6 Global Step: 109490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:33,517-Speed 9671.13 samples/sec Loss 6.7911 LearningRate 0.0452 Epoch: 6 Global Step: 109500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:34,587-Speed 9574.32 samples/sec Loss 6.9636 LearningRate 0.0452 Epoch: 6 Global Step: 109510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:35,675-Speed 9423.01 samples/sec Loss 6.8885 LearningRate 0.0451 Epoch: 6 Global Step: 109520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:36,800-Speed 9105.72 samples/sec Loss 6.8755 LearningRate 0.0451 Epoch: 6 Global Step: 109530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:37,936-Speed 9020.67 samples/sec Loss 6.9948 LearningRate 0.0451 Epoch: 6 Global Step: 109540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:39,001-Speed 9617.76 samples/sec Loss 6.9084 LearningRate 0.0451 Epoch: 6 Global Step: 109550 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:40,034-Speed 9917.83 samples/sec Loss 6.8867 LearningRate 0.0451 Epoch: 6 Global Step: 109560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:41,095-Speed 9662.40 samples/sec Loss 6.7562 LearningRate 0.0451 Epoch: 6 Global Step: 109570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:42,201-Speed 9265.47 samples/sec Loss 6.7567 LearningRate 0.0451 Epoch: 6 Global Step: 109580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:43,306-Speed 9265.63 samples/sec Loss 6.8932 LearningRate 0.0451 Epoch: 6 Global Step: 109590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:44,403-Speed 9343.02 samples/sec Loss 6.9921 LearningRate 0.0451 Epoch: 6 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:45,526-Speed 9125.13 samples/sec Loss 6.9753 LearningRate 0.0451 Epoch: 6 Global Step: 109610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:46,563-Speed 9879.08 samples/sec Loss 6.8676 LearningRate 0.0451 Epoch: 6 Global Step: 109620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:47,626-Speed 9637.86 samples/sec Loss 6.8353 LearningRate 0.0451 Epoch: 6 Global Step: 109630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:48,707-Speed 9478.75 samples/sec Loss 6.9690 LearningRate 0.0451 Epoch: 6 Global Step: 109640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:49,804-Speed 9339.44 samples/sec Loss 6.9677 LearningRate 0.0451 Epoch: 6 Global Step: 109650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:56:50,882-Speed 9504.22 samples/sec Loss 6.9863 LearningRate 0.0451 Epoch: 6 Global Step: 109660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:51,959-Speed 9514.45 samples/sec Loss 6.9162 LearningRate 0.0451 Epoch: 6 Global Step: 109670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 15:56:53,031-Speed 9556.63 samples/sec Loss 6.8396 LearningRate 0.0451 Epoch: 6 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:54,134-Speed 9291.71 samples/sec Loss 6.9646 LearningRate 0.0451 Epoch: 6 Global Step: 109690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:55,210-Speed 9521.98 samples/sec Loss 6.8023 LearningRate 0.0451 Epoch: 6 Global Step: 109700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:56,327-Speed 9171.82 samples/sec Loss 6.7901 LearningRate 0.0451 Epoch: 6 Global Step: 109710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:57,414-Speed 9427.84 samples/sec Loss 6.9287 LearningRate 0.0451 Epoch: 6 Global Step: 109720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:58,496-Speed 9472.38 samples/sec Loss 6.7726 LearningRate 0.0451 Epoch: 6 Global Step: 109730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:56:59,558-Speed 9645.59 samples/sec Loss 6.9584 LearningRate 0.0451 Epoch: 6 Global Step: 109740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:00,633-Speed 9538.66 samples/sec Loss 6.8575 LearningRate 0.0451 Epoch: 6 Global Step: 109750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:01,714-Speed 9476.80 samples/sec Loss 6.9024 LearningRate 0.0451 Epoch: 6 Global Step: 109760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:57:02,801-Speed 9423.94 samples/sec Loss 6.8551 LearningRate 0.0450 Epoch: 6 Global Step: 109770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:57:03,865-Speed 9630.22 samples/sec Loss 6.9025 LearningRate 0.0450 Epoch: 6 Global Step: 109780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:57:04,908-Speed 9822.08 samples/sec Loss 6.9496 LearningRate 0.0450 Epoch: 6 Global Step: 109790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:06,018-Speed 9234.92 samples/sec 
Loss 6.9164 LearningRate 0.0450 Epoch: 6 Global Step: 109800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:07,077-Speed 9673.50 samples/sec Loss 6.9135 LearningRate 0.0450 Epoch: 6 Global Step: 109810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:08,120-Speed 9823.04 samples/sec Loss 6.8166 LearningRate 0.0450 Epoch: 6 Global Step: 109820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:09,187-Speed 9603.59 samples/sec Loss 6.8695 LearningRate 0.0450 Epoch: 6 Global Step: 109830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:10,249-Speed 9642.77 samples/sec Loss 6.8529 LearningRate 0.0450 Epoch: 6 Global Step: 109840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:11,347-Speed 9335.04 samples/sec Loss 6.8898 LearningRate 0.0450 Epoch: 6 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:12,461-Speed 9196.80 samples/sec Loss 6.8423 LearningRate 0.0450 Epoch: 6 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:13,558-Speed 9342.19 samples/sec Loss 6.8979 LearningRate 0.0450 Epoch: 6 Global Step: 109870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:14,635-Speed 9513.63 samples/sec Loss 6.9578 LearningRate 0.0450 Epoch: 6 Global Step: 109880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:57:15,721-Speed 9438.76 samples/sec Loss 6.7252 LearningRate 0.0450 Epoch: 6 Global Step: 109890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:57:16,823-Speed 9293.69 samples/sec Loss 6.8842 LearningRate 0.0450 Epoch: 6 Global Step: 109900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:57:17,900-Speed 9516.16 samples/sec Loss 6.9378 LearningRate 0.0450 Epoch: 6 Global Step: 109910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:57:18,949-Speed 9770.28 samples/sec Loss 6.8465 LearningRate 0.0450 Epoch: 6 Global 
Step: 109920 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:20,061-Speed 9212.92 samples/sec Loss 6.8624 LearningRate 0.0450 Epoch: 6 Global Step: 109930 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:21,177-Speed 9182.19 samples/sec Loss 6.9315 LearningRate 0.0450 Epoch: 6 Global Step: 109940 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:22,236-Speed 9676.27 samples/sec Loss 6.8422 LearningRate 0.0450 Epoch: 6 Global Step: 109950 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:23,272-Speed 9892.34 samples/sec Loss 6.9037 LearningRate 0.0450 Epoch: 6 Global Step: 109960 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:24,348-Speed 9518.24 samples/sec Loss 6.8485 LearningRate 0.0450 Epoch: 6 Global Step: 109970 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:25,408-Speed 9669.14 samples/sec Loss 6.9500 LearningRate 0.0450 Epoch: 6 Global Step: 109980 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:26,508-Speed 9316.50 samples/sec Loss 6.9742 LearningRate 0.0450 Epoch: 6 Global Step: 109990 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-04-11 15:57:27,558-Speed 9757.06 samples/sec Loss 6.9436 LearningRate 0.0450 Epoch: 6 Global Step: 110000 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:57:49,353-[lfw][110000]XNorm: 10.955663
Training: 2022-04-11 15:57:49,354-[lfw][110000]Accuracy-Flip: 0.99500+-0.00342
Training: 2022-04-11 15:57:49,354-[lfw][110000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:58:14,839-[cfp_fp][110000]XNorm: 9.328135
Training: 2022-04-11 15:58:14,840-[cfp_fp][110000]Accuracy-Flip: 0.96157+-0.00987
Training: 2022-04-11 15:58:14,840-[cfp_fp][110000]Accuracy-Highest: 0.96157
Training: 2022-04-11 15:58:36,856-[agedb_30][110000]XNorm: 10.627442
Training: 2022-04-11 15:58:36,856-[agedb_30][110000]Accuracy-Flip: 0.96317+-0.00728
Training: 2022-04-11 15:58:36,857-[agedb_30][110000]Accuracy-Highest: 0.96483
Training: 2022-04-11 15:58:37,920-Speed 145.53 samples/sec Loss 6.8450 LearningRate 0.0450 Epoch: 6 Global Step: 110010 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:38,981-Speed 9659.02 samples/sec Loss 6.9452 LearningRate 0.0449 Epoch: 6 Global Step: 110020 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:40,016-Speed 9903.35 samples/sec Loss 6.9396 LearningRate 0.0449 Epoch: 6 Global Step: 110030 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:41,088-Speed 9556.92 samples/sec Loss 6.9088 LearningRate 0.0449 Epoch: 6 Global Step: 110040 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:42,193-Speed 9269.30 samples/sec Loss 6.9043 LearningRate 0.0449 Epoch: 6 Global Step: 110050 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:43,242-Speed 9774.81 samples/sec Loss 6.9385 LearningRate 0.0449 Epoch: 6 Global Step: 110060 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:44,267-Speed 9995.75 samples/sec Loss 6.9062 LearningRate 0.0449 Epoch: 6 Global Step: 110070 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:45,328-Speed 9659.08 samples/sec Loss 6.8470 LearningRate 0.0449 Epoch: 6 Global Step: 110080 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:46,458-Speed 9063.88 samples/sec Loss 6.9333 LearningRate 0.0449 Epoch: 6 Global Step: 110090 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:47,556-Speed 9332.39 samples/sec Loss 6.9626 LearningRate 0.0449 Epoch: 6 Global Step: 110100 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:48,632-Speed 9521.81 samples/sec Loss 6.8640 LearningRate 0.0449 Epoch: 6 Global Step: 110110 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-04-11 15:58:49,734-Speed 9299.55 samples/sec Loss 6.9086 LearningRate 0.0449 Epoch: 6 Global Step: 110120 Fp16 Grad Scale: 131072
Required: 9 hours Training: 2022-04-11 15:58:50,762-Speed 9962.63 samples/sec Loss 6.9801 LearningRate 0.0449 Epoch: 6 Global Step: 110130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:51,821-Speed 9675.13 samples/sec Loss 6.9158 LearningRate 0.0449 Epoch: 6 Global Step: 110140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:52,861-Speed 9848.86 samples/sec Loss 6.8287 LearningRate 0.0449 Epoch: 6 Global Step: 110150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:53,971-Speed 9238.05 samples/sec Loss 6.8217 LearningRate 0.0449 Epoch: 6 Global Step: 110160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:55,032-Speed 9652.67 samples/sec Loss 6.7674 LearningRate 0.0449 Epoch: 6 Global Step: 110170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:56,112-Speed 9489.94 samples/sec Loss 6.8269 LearningRate 0.0449 Epoch: 6 Global Step: 110180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:57,181-Speed 9585.55 samples/sec Loss 6.8820 LearningRate 0.0449 Epoch: 6 Global Step: 110190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:58,245-Speed 9623.74 samples/sec Loss 6.8660 LearningRate 0.0449 Epoch: 6 Global Step: 110200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:58:59,318-Speed 9553.38 samples/sec Loss 7.0440 LearningRate 0.0449 Epoch: 6 Global Step: 110210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:00,451-Speed 9039.45 samples/sec Loss 6.8765 LearningRate 0.0449 Epoch: 6 Global Step: 110220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:01,525-Speed 9545.78 samples/sec Loss 6.9687 LearningRate 0.0449 Epoch: 6 Global Step: 110230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:02,580-Speed 9710.28 samples/sec Loss 6.8568 LearningRate 0.0449 Epoch: 6 Global Step: 110240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
15:59:03,664-Speed 9450.17 samples/sec Loss 6.9363 LearningRate 0.0449 Epoch: 6 Global Step: 110250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:04,745-Speed 9479.57 samples/sec Loss 6.9002 LearningRate 0.0449 Epoch: 6 Global Step: 110260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:05,814-Speed 9590.90 samples/sec Loss 6.8515 LearningRate 0.0448 Epoch: 6 Global Step: 110270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:06,873-Speed 9675.25 samples/sec Loss 6.9163 LearningRate 0.0448 Epoch: 6 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:07,952-Speed 9498.44 samples/sec Loss 6.8935 LearningRate 0.0448 Epoch: 6 Global Step: 110290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:09,030-Speed 9501.69 samples/sec Loss 6.8920 LearningRate 0.0448 Epoch: 6 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:10,082-Speed 9739.23 samples/sec Loss 7.0098 LearningRate 0.0448 Epoch: 6 Global Step: 110310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:11,178-Speed 9350.05 samples/sec Loss 6.8355 LearningRate 0.0448 Epoch: 6 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:12,280-Speed 9299.15 samples/sec Loss 6.9188 LearningRate 0.0448 Epoch: 6 Global Step: 110330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:13,340-Speed 9664.53 samples/sec Loss 6.8768 LearningRate 0.0448 Epoch: 6 Global Step: 110340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:14,418-Speed 9503.65 samples/sec Loss 6.8298 LearningRate 0.0448 Epoch: 6 Global Step: 110350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:15,519-Speed 9304.96 samples/sec Loss 6.8040 LearningRate 0.0448 Epoch: 6 Global Step: 110360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:16,620-Speed 9310.06 samples/sec Loss 6.8542 
LearningRate 0.0448 Epoch: 6 Global Step: 110370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:17,707-Speed 9429.81 samples/sec Loss 7.0409 LearningRate 0.0448 Epoch: 6 Global Step: 110380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:18,794-Speed 9420.82 samples/sec Loss 6.9238 LearningRate 0.0448 Epoch: 6 Global Step: 110390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:19,875-Speed 9481.50 samples/sec Loss 6.9305 LearningRate 0.0448 Epoch: 6 Global Step: 110400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:20,980-Speed 9268.29 samples/sec Loss 6.8853 LearningRate 0.0448 Epoch: 6 Global Step: 110410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:22,090-Speed 9228.84 samples/sec Loss 6.9630 LearningRate 0.0448 Epoch: 6 Global Step: 110420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:23,190-Speed 9318.79 samples/sec Loss 6.9391 LearningRate 0.0448 Epoch: 6 Global Step: 110430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:24,302-Speed 9218.28 samples/sec Loss 6.9487 LearningRate 0.0448 Epoch: 6 Global Step: 110440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:25,372-Speed 9574.45 samples/sec Loss 7.0150 LearningRate 0.0448 Epoch: 6 Global Step: 110450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:26,458-Speed 9458.13 samples/sec Loss 6.8890 LearningRate 0.0448 Epoch: 6 Global Step: 110460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:27,500-Speed 9828.48 samples/sec Loss 6.7512 LearningRate 0.0448 Epoch: 6 Global Step: 110470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:28,603-Speed 9291.93 samples/sec Loss 6.6332 LearningRate 0.0448 Epoch: 6 Global Step: 110480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:29,680-Speed 9513.66 samples/sec Loss 6.9839 LearningRate 0.0448 Epoch: 6 Global Step: 
110490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:30,777-Speed 9332.12 samples/sec Loss 6.9242 LearningRate 0.0448 Epoch: 6 Global Step: 110500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:31,887-Speed 9229.85 samples/sec Loss 6.9116 LearningRate 0.0447 Epoch: 6 Global Step: 110510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:32,975-Speed 9426.86 samples/sec Loss 6.9609 LearningRate 0.0447 Epoch: 6 Global Step: 110520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:34,096-Speed 9138.81 samples/sec Loss 6.8553 LearningRate 0.0447 Epoch: 6 Global Step: 110530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:35,147-Speed 9745.64 samples/sec Loss 6.9421 LearningRate 0.0447 Epoch: 6 Global Step: 110540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:36,208-Speed 9661.27 samples/sec Loss 6.9066 LearningRate 0.0447 Epoch: 6 Global Step: 110550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:37,280-Speed 9555.59 samples/sec Loss 6.9421 LearningRate 0.0447 Epoch: 6 Global Step: 110560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:38,401-Speed 9143.36 samples/sec Loss 6.9372 LearningRate 0.0447 Epoch: 6 Global Step: 110570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:39,474-Speed 9546.88 samples/sec Loss 6.8903 LearningRate 0.0447 Epoch: 6 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:40,516-Speed 9830.19 samples/sec Loss 6.8421 LearningRate 0.0447 Epoch: 6 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:41,626-Speed 9233.41 samples/sec Loss 6.9236 LearningRate 0.0447 Epoch: 6 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:42,699-Speed 9543.60 samples/sec Loss 6.8544 LearningRate 0.0447 Epoch: 6 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-04-11 15:59:43,807-Speed 9253.68 samples/sec Loss 6.9962 LearningRate 0.0447 Epoch: 6 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:44,913-Speed 9266.20 samples/sec Loss 6.9310 LearningRate 0.0447 Epoch: 6 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:46,005-Speed 9382.87 samples/sec Loss 6.9095 LearningRate 0.0447 Epoch: 6 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:47,091-Speed 9435.79 samples/sec Loss 6.9357 LearningRate 0.0447 Epoch: 6 Global Step: 110650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:48,166-Speed 9533.24 samples/sec Loss 6.8240 LearningRate 0.0447 Epoch: 6 Global Step: 110660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:49,230-Speed 9628.46 samples/sec Loss 6.8466 LearningRate 0.0447 Epoch: 6 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 15:59:50,298-Speed 9591.01 samples/sec Loss 6.8226 LearningRate 0.0447 Epoch: 6 Global Step: 110680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:51,381-Speed 9459.53 samples/sec Loss 6.8362 LearningRate 0.0447 Epoch: 6 Global Step: 110690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:52,498-Speed 9175.12 samples/sec Loss 6.8400 LearningRate 0.0447 Epoch: 6 Global Step: 110700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:53,621-Speed 9122.78 samples/sec Loss 6.7968 LearningRate 0.0447 Epoch: 6 Global Step: 110710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:54,660-Speed 9862.75 samples/sec Loss 6.7806 LearningRate 0.0447 Epoch: 6 Global Step: 110720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:55,738-Speed 9508.28 samples/sec Loss 6.8453 LearningRate 0.0447 Epoch: 6 Global Step: 110730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:56,848-Speed 
9225.87 samples/sec Loss 6.9852 LearningRate 0.0447 Epoch: 6 Global Step: 110740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:57,919-Speed 9568.09 samples/sec Loss 6.8636 LearningRate 0.0447 Epoch: 6 Global Step: 110750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 15:59:58,979-Speed 9669.65 samples/sec Loss 6.8575 LearningRate 0.0446 Epoch: 6 Global Step: 110760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:00,068-Speed 9408.70 samples/sec Loss 6.8366 LearningRate 0.0446 Epoch: 6 Global Step: 110770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:01,168-Speed 9311.14 samples/sec Loss 6.8745 LearningRate 0.0446 Epoch: 6 Global Step: 110780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:02,233-Speed 9626.02 samples/sec Loss 6.9046 LearningRate 0.0446 Epoch: 6 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:03,329-Speed 9350.73 samples/sec Loss 6.7753 LearningRate 0.0446 Epoch: 6 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:04,420-Speed 9393.57 samples/sec Loss 6.9127 LearningRate 0.0446 Epoch: 6 Global Step: 110810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:05,517-Speed 9331.49 samples/sec Loss 6.9142 LearningRate 0.0446 Epoch: 6 Global Step: 110820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:06,603-Speed 9442.06 samples/sec Loss 6.8911 LearningRate 0.0446 Epoch: 6 Global Step: 110830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:07,686-Speed 9454.27 samples/sec Loss 6.8835 LearningRate 0.0446 Epoch: 6 Global Step: 110840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:08,737-Speed 9749.63 samples/sec Loss 6.7294 LearningRate 0.0446 Epoch: 6 Global Step: 110850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:09,832-Speed 9361.31 samples/sec Loss 6.9183 LearningRate 0.0446 
Epoch: 6 Global Step: 110860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:10,930-Speed 9331.73 samples/sec Loss 6.8751 LearningRate 0.0446 Epoch: 6 Global Step: 110870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:11,988-Speed 9684.83 samples/sec Loss 6.8897 LearningRate 0.0446 Epoch: 6 Global Step: 110880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:13,147-Speed 8834.69 samples/sec Loss 6.9579 LearningRate 0.0446 Epoch: 6 Global Step: 110890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:14,238-Speed 9391.11 samples/sec Loss 6.9349 LearningRate 0.0446 Epoch: 6 Global Step: 110900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:15,341-Speed 9296.05 samples/sec Loss 7.0523 LearningRate 0.0446 Epoch: 6 Global Step: 110910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:16,416-Speed 9530.26 samples/sec Loss 6.8797 LearningRate 0.0446 Epoch: 6 Global Step: 110920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:17,509-Speed 9370.16 samples/sec Loss 6.9291 LearningRate 0.0446 Epoch: 6 Global Step: 110930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:18,607-Speed 9334.21 samples/sec Loss 6.8774 LearningRate 0.0446 Epoch: 6 Global Step: 110940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:19,685-Speed 9503.12 samples/sec Loss 6.7198 LearningRate 0.0446 Epoch: 6 Global Step: 110950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:20,807-Speed 9132.75 samples/sec Loss 6.8502 LearningRate 0.0446 Epoch: 6 Global Step: 110960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:21,922-Speed 9196.41 samples/sec Loss 6.7989 LearningRate 0.0446 Epoch: 6 Global Step: 110970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:22,975-Speed 9730.57 samples/sec Loss 6.9418 LearningRate 0.0446 Epoch: 6 Global Step: 110980 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 16:00:24,050-Speed 9531.61 samples/sec Loss 6.8226 LearningRate 0.0446 Epoch: 6 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:25,140-Speed 9396.70 samples/sec Loss 6.9454 LearningRate 0.0446 Epoch: 6 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:26,218-Speed 9507.71 samples/sec Loss 6.8808 LearningRate 0.0445 Epoch: 6 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:27,329-Speed 9217.16 samples/sec Loss 6.8205 LearningRate 0.0445 Epoch: 6 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:28,372-Speed 9826.16 samples/sec Loss 6.9571 LearningRate 0.0445 Epoch: 6 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:29,422-Speed 9760.25 samples/sec Loss 6.9653 LearningRate 0.0445 Epoch: 6 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:30,463-Speed 9838.29 samples/sec Loss 6.8645 LearningRate 0.0445 Epoch: 6 Global Step: 111050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:31,553-Speed 9403.24 samples/sec Loss 6.8191 LearningRate 0.0445 Epoch: 6 Global Step: 111060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:32,620-Speed 9602.90 samples/sec Loss 6.8593 LearningRate 0.0445 Epoch: 6 Global Step: 111070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:33,711-Speed 9389.22 samples/sec Loss 6.9639 LearningRate 0.0445 Epoch: 6 Global Step: 111080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:34,846-Speed 9028.25 samples/sec Loss 6.9011 LearningRate 0.0445 Epoch: 6 Global Step: 111090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:35,954-Speed 9247.88 samples/sec Loss 6.9964 LearningRate 0.0445 Epoch: 6 Global Step: 111100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
16:00:37,098-Speed 8953.26 samples/sec Loss 6.8539 LearningRate 0.0445 Epoch: 6 Global Step: 111110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:38,234-Speed 9023.29 samples/sec Loss 6.9475 LearningRate 0.0445 Epoch: 6 Global Step: 111120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:39,325-Speed 9399.25 samples/sec Loss 6.9278 LearningRate 0.0445 Epoch: 6 Global Step: 111130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:40,405-Speed 9486.75 samples/sec Loss 6.7814 LearningRate 0.0445 Epoch: 6 Global Step: 111140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:41,499-Speed 9365.60 samples/sec Loss 6.7481 LearningRate 0.0445 Epoch: 6 Global Step: 111150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:42,541-Speed 9826.06 samples/sec Loss 6.9546 LearningRate 0.0445 Epoch: 6 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:43,562-Speed 10043.77 samples/sec Loss 6.8539 LearningRate 0.0445 Epoch: 6 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:44,635-Speed 9548.22 samples/sec Loss 6.8159 LearningRate 0.0445 Epoch: 6 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:45,697-Speed 9646.35 samples/sec Loss 6.8115 LearningRate 0.0445 Epoch: 6 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:46,758-Speed 9650.53 samples/sec Loss 6.9414 LearningRate 0.0445 Epoch: 6 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:47,787-Speed 9964.44 samples/sec Loss 6.8090 LearningRate 0.0445 Epoch: 6 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:48,860-Speed 9542.51 samples/sec Loss 6.9167 LearningRate 0.0445 Epoch: 6 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:49,957-Speed 9344.78 samples/sec Loss 7.0247 
LearningRate 0.0445 Epoch: 6 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:00:51,029-Speed 9559.13 samples/sec Loss 6.7724 LearningRate 0.0445 Epoch: 6 Global Step: 111240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:52,083-Speed 9716.17 samples/sec Loss 6.8972 LearningRate 0.0445 Epoch: 6 Global Step: 111250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:53,157-Speed 9545.36 samples/sec Loss 6.8561 LearningRate 0.0444 Epoch: 6 Global Step: 111260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:54,242-Speed 9444.87 samples/sec Loss 6.9220 LearningRate 0.0444 Epoch: 6 Global Step: 111270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:55,321-Speed 9490.91 samples/sec Loss 6.7823 LearningRate 0.0444 Epoch: 6 Global Step: 111280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:56,410-Speed 9412.81 samples/sec Loss 6.9358 LearningRate 0.0444 Epoch: 6 Global Step: 111290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:57,504-Speed 9359.49 samples/sec Loss 6.8823 LearningRate 0.0444 Epoch: 6 Global Step: 111300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:58,621-Speed 9173.43 samples/sec Loss 7.0397 LearningRate 0.0444 Epoch: 6 Global Step: 111310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:00:59,755-Speed 9036.59 samples/sec Loss 6.9719 LearningRate 0.0444 Epoch: 6 Global Step: 111320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:00,831-Speed 9531.23 samples/sec Loss 6.9896 LearningRate 0.0444 Epoch: 6 Global Step: 111330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:01,905-Speed 9539.14 samples/sec Loss 6.8588 LearningRate 0.0444 Epoch: 6 Global Step: 111340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:02,988-Speed 9458.96 samples/sec Loss 6.8462 LearningRate 0.0444 Epoch: 6 Global Step: 
111350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:04,055-Speed 9602.58 samples/sec Loss 6.9406 LearningRate 0.0444 Epoch: 6 Global Step: 111360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:05,105-Speed 9749.94 samples/sec Loss 6.9653 LearningRate 0.0444 Epoch: 6 Global Step: 111370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:06,167-Speed 9655.29 samples/sec Loss 6.9334 LearningRate 0.0444 Epoch: 6 Global Step: 111380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:07,217-Speed 9757.09 samples/sec Loss 6.8712 LearningRate 0.0444 Epoch: 6 Global Step: 111390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:08,279-Speed 9644.85 samples/sec Loss 6.9684 LearningRate 0.0444 Epoch: 6 Global Step: 111400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:09,322-Speed 9829.97 samples/sec Loss 6.8326 LearningRate 0.0444 Epoch: 6 Global Step: 111410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:10,395-Speed 9541.46 samples/sec Loss 6.9074 LearningRate 0.0444 Epoch: 6 Global Step: 111420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:11,497-Speed 9299.68 samples/sec Loss 6.9401 LearningRate 0.0444 Epoch: 6 Global Step: 111430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:12,560-Speed 9638.13 samples/sec Loss 6.8704 LearningRate 0.0444 Epoch: 6 Global Step: 111440 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:01:13,629-Speed 9588.07 samples/sec Loss 6.8483 LearningRate 0.0444 Epoch: 6 Global Step: 111450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:14,731-Speed 9295.53 samples/sec Loss 6.9258 LearningRate 0.0444 Epoch: 6 Global Step: 111460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:15,830-Speed 9318.70 samples/sec Loss 6.8595 LearningRate 0.0444 Epoch: 6 Global Step: 111470 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 16:01:16,899-Speed 9584.77 samples/sec Loss 6.8672 LearningRate 0.0444 Epoch: 6 Global Step: 111480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:18,013-Speed 9203.45 samples/sec Loss 6.8938 LearningRate 0.0444 Epoch: 6 Global Step: 111490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:19,096-Speed 9461.76 samples/sec Loss 6.9371 LearningRate 0.0444 Epoch: 6 Global Step: 111500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:20,163-Speed 9604.96 samples/sec Loss 6.9452 LearningRate 0.0443 Epoch: 6 Global Step: 111510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:21,235-Speed 9563.28 samples/sec Loss 6.8259 LearningRate 0.0443 Epoch: 6 Global Step: 111520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:22,316-Speed 9469.43 samples/sec Loss 6.9966 LearningRate 0.0443 Epoch: 6 Global Step: 111530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:23,369-Speed 9733.50 samples/sec Loss 6.9797 LearningRate 0.0443 Epoch: 6 Global Step: 111540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:24,466-Speed 9342.86 samples/sec Loss 6.8097 LearningRate 0.0443 Epoch: 6 Global Step: 111550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:25,508-Speed 9827.94 samples/sec Loss 6.8132 LearningRate 0.0443 Epoch: 6 Global Step: 111560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:26,574-Speed 9613.71 samples/sec Loss 6.8127 LearningRate 0.0443 Epoch: 6 Global Step: 111570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:27,646-Speed 9561.84 samples/sec Loss 6.8742 LearningRate 0.0443 Epoch: 6 Global Step: 111580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:28,705-Speed 9671.34 samples/sec Loss 6.8303 LearningRate 0.0443 Epoch: 6 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:29,743-Speed 
9867.88 samples/sec Loss 6.9115 LearningRate 0.0443 Epoch: 6 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:30,765-Speed 10036.58 samples/sec Loss 6.8400 LearningRate 0.0443 Epoch: 6 Global Step: 111610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:31,872-Speed 9249.49 samples/sec Loss 6.8820 LearningRate 0.0443 Epoch: 6 Global Step: 111620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:32,952-Speed 9488.30 samples/sec Loss 6.8438 LearningRate 0.0443 Epoch: 6 Global Step: 111630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:33,980-Speed 9969.06 samples/sec Loss 6.8898 LearningRate 0.0443 Epoch: 6 Global Step: 111640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:35,039-Speed 9670.46 samples/sec Loss 6.7121 LearningRate 0.0443 Epoch: 6 Global Step: 111650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:36,145-Speed 9266.10 samples/sec Loss 6.9649 LearningRate 0.0443 Epoch: 6 Global Step: 111660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:37,261-Speed 9187.45 samples/sec Loss 6.6999 LearningRate 0.0443 Epoch: 6 Global Step: 111670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:38,388-Speed 9086.69 samples/sec Loss 6.8389 LearningRate 0.0443 Epoch: 6 Global Step: 111680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:39,425-Speed 9884.12 samples/sec Loss 6.8039 LearningRate 0.0443 Epoch: 6 Global Step: 111690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:40,520-Speed 9359.98 samples/sec Loss 6.9100 LearningRate 0.0443 Epoch: 6 Global Step: 111700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:41,579-Speed 9666.59 samples/sec Loss 6.8757 LearningRate 0.0443 Epoch: 6 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:42,618-Speed 9866.27 samples/sec Loss 6.8287 LearningRate 
0.0443 Epoch: 6 Global Step: 111720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:43,726-Speed 9248.26 samples/sec Loss 6.9103 LearningRate 0.0443 Epoch: 6 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:44,796-Speed 9573.05 samples/sec Loss 6.8765 LearningRate 0.0443 Epoch: 6 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:45,846-Speed 9759.17 samples/sec Loss 6.8063 LearningRate 0.0443 Epoch: 6 Global Step: 111750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:46,934-Speed 9411.73 samples/sec Loss 6.8438 LearningRate 0.0443 Epoch: 6 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:48,016-Speed 9474.50 samples/sec Loss 6.8698 LearningRate 0.0442 Epoch: 6 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:49,109-Speed 9373.86 samples/sec Loss 6.8321 LearningRate 0.0442 Epoch: 6 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:50,190-Speed 9482.24 samples/sec Loss 6.8198 LearningRate 0.0442 Epoch: 6 Global Step: 111790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:51,325-Speed 9028.84 samples/sec Loss 6.9205 LearningRate 0.0442 Epoch: 6 Global Step: 111800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:52,403-Speed 9501.74 samples/sec Loss 6.9502 LearningRate 0.0442 Epoch: 6 Global Step: 111810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:53,491-Speed 9413.56 samples/sec Loss 6.8588 LearningRate 0.0442 Epoch: 6 Global Step: 111820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:01:54,580-Speed 9408.19 samples/sec Loss 6.7928 LearningRate 0.0442 Epoch: 6 Global Step: 111830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:55,666-Speed 9440.16 samples/sec Loss 6.9436 LearningRate 0.0442 Epoch: 6 Global Step: 111840 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 16:01:56,754-Speed 9412.36 samples/sec Loss 6.8722 LearningRate 0.0442 Epoch: 6 Global Step: 111850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:57,873-Speed 9159.42 samples/sec Loss 6.8772 LearningRate 0.0442 Epoch: 6 Global Step: 111860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:01:58,943-Speed 9572.53 samples/sec Loss 6.9544 LearningRate 0.0442 Epoch: 6 Global Step: 111870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:00,064-Speed 9144.95 samples/sec Loss 6.9225 LearningRate 0.0442 Epoch: 6 Global Step: 111880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:01,166-Speed 9299.48 samples/sec Loss 6.8580 LearningRate 0.0442 Epoch: 6 Global Step: 111890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:02,239-Speed 9548.97 samples/sec Loss 6.9132 LearningRate 0.0442 Epoch: 6 Global Step: 111900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:03,272-Speed 9911.10 samples/sec Loss 6.8217 LearningRate 0.0442 Epoch: 6 Global Step: 111910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:04,322-Speed 9762.31 samples/sec Loss 6.8490 LearningRate 0.0442 Epoch: 6 Global Step: 111920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:05,390-Speed 9594.88 samples/sec Loss 6.9522 LearningRate 0.0442 Epoch: 6 Global Step: 111930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:02:06,507-Speed 9169.90 samples/sec Loss 6.8968 LearningRate 0.0442 Epoch: 6 Global Step: 111940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:02:07,556-Speed 9771.36 samples/sec Loss 6.9628 LearningRate 0.0442 Epoch: 6 Global Step: 111950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:02:08,650-Speed 9363.24 samples/sec Loss 6.9879 LearningRate 0.0442 Epoch: 6 Global Step: 111960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
16:02:09,730-Speed 9489.96 samples/sec Loss 6.8515 LearningRate 0.0442 Epoch: 6 Global Step: 111970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:10,786-Speed 9698.86 samples/sec Loss 6.9190 LearningRate 0.0442 Epoch: 6 Global Step: 111980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:11,837-Speed 9746.61 samples/sec Loss 6.8941 LearningRate 0.0442 Epoch: 6 Global Step: 111990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:12,917-Speed 9491.76 samples/sec Loss 6.9265 LearningRate 0.0442 Epoch: 6 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:02:34,791-[lfw][112000]XNorm: 10.958596 Training: 2022-04-11 16:02:34,791-[lfw][112000]Accuracy-Flip: 0.99633+-0.00296 Training: 2022-04-11 16:02:34,792-[lfw][112000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:03:00,101-[cfp_fp][112000]XNorm: 9.443154 Training: 2022-04-11 16:03:00,102-[cfp_fp][112000]Accuracy-Flip: 0.95257+-0.01326 Training: 2022-04-11 16:03:00,103-[cfp_fp][112000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:03:21,988-[agedb_30][112000]XNorm: 10.618512 Training: 2022-04-11 16:03:21,989-[agedb_30][112000]Accuracy-Flip: 0.96017+-0.00970 Training: 2022-04-11 16:03:21,990-[agedb_30][112000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:03:23,060-Speed 145.99 samples/sec Loss 6.8568 LearningRate 0.0442 Epoch: 6 Global Step: 112010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:24,166-Speed 9261.37 samples/sec Loss 6.9093 LearningRate 0.0441 Epoch: 6 Global Step: 112020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:25,239-Speed 9549.75 samples/sec Loss 6.8922 LearningRate 0.0441 Epoch: 6 Global Step: 112030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:26,271-Speed 9921.72 samples/sec Loss 6.9085 LearningRate 0.0441 Epoch: 6 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:27,312-Speed 9847.32 
samples/sec Loss 6.9158 LearningRate 0.0441 Epoch: 6 Global Step: 112050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:28,364-Speed 9739.26 samples/sec Loss 6.8637 LearningRate 0.0441 Epoch: 6 Global Step: 112060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:29,401-Speed 9884.63 samples/sec Loss 6.8420 LearningRate 0.0441 Epoch: 6 Global Step: 112070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:30,442-Speed 9837.99 samples/sec Loss 6.9096 LearningRate 0.0441 Epoch: 6 Global Step: 112080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:31,486-Speed 9813.51 samples/sec Loss 6.9852 LearningRate 0.0441 Epoch: 6 Global Step: 112090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:32,542-Speed 9706.27 samples/sec Loss 6.8462 LearningRate 0.0441 Epoch: 6 Global Step: 112100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:33,621-Speed 9494.48 samples/sec Loss 6.8572 LearningRate 0.0441 Epoch: 6 Global Step: 112110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:34,664-Speed 9821.89 samples/sec Loss 6.7703 LearningRate 0.0441 Epoch: 6 Global Step: 112120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:35,744-Speed 9490.17 samples/sec Loss 6.7991 LearningRate 0.0441 Epoch: 6 Global Step: 112130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:36,874-Speed 9065.87 samples/sec Loss 6.9030 LearningRate 0.0441 Epoch: 6 Global Step: 112140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:37,990-Speed 9183.85 samples/sec Loss 6.9687 LearningRate 0.0441 Epoch: 6 Global Step: 112150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:39,098-Speed 9246.70 samples/sec Loss 6.9746 LearningRate 0.0441 Epoch: 6 Global Step: 112160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:40,185-Speed 9423.63 samples/sec Loss 6.8918 LearningRate 0.0441 
Epoch: 6 Global Step: 112170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:41,252-Speed 9606.95 samples/sec Loss 6.8111 LearningRate 0.0441 Epoch: 6 Global Step: 112180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:42,302-Speed 9765.71 samples/sec Loss 6.7345 LearningRate 0.0441 Epoch: 6 Global Step: 112190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:43,370-Speed 9585.28 samples/sec Loss 6.7768 LearningRate 0.0441 Epoch: 6 Global Step: 112200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:44,423-Speed 9734.49 samples/sec Loss 6.9237 LearningRate 0.0441 Epoch: 6 Global Step: 112210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:45,499-Speed 9519.75 samples/sec Loss 6.8183 LearningRate 0.0441 Epoch: 6 Global Step: 112220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:46,546-Speed 9786.61 samples/sec Loss 6.9109 LearningRate 0.0441 Epoch: 6 Global Step: 112230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:47,608-Speed 9651.39 samples/sec Loss 6.8918 LearningRate 0.0441 Epoch: 6 Global Step: 112240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:48,670-Speed 9647.94 samples/sec Loss 6.9182 LearningRate 0.0441 Epoch: 6 Global Step: 112250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:03:49,698-Speed 9973.17 samples/sec Loss 6.8322 LearningRate 0.0441 Epoch: 6 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:50,774-Speed 9521.10 samples/sec Loss 6.9543 LearningRate 0.0440 Epoch: 6 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:51,842-Speed 9592.93 samples/sec Loss 6.9652 LearningRate 0.0440 Epoch: 6 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:52,891-Speed 9765.44 samples/sec Loss 6.8450 LearningRate 0.0440 Epoch: 6 Global Step: 112290 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 16:03:53,959-Speed 9590.34 samples/sec Loss 6.8746 LearningRate 0.0440 Epoch: 6 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:55,049-Speed 9399.25 samples/sec Loss 6.7918 LearningRate 0.0440 Epoch: 6 Global Step: 112310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:56,131-Speed 9467.75 samples/sec Loss 6.8601 LearningRate 0.0440 Epoch: 6 Global Step: 112320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:57,279-Speed 8928.42 samples/sec Loss 6.8623 LearningRate 0.0440 Epoch: 6 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:58,361-Speed 9468.86 samples/sec Loss 6.8906 LearningRate 0.0440 Epoch: 6 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:03:59,407-Speed 9801.41 samples/sec Loss 6.9064 LearningRate 0.0440 Epoch: 6 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:00,489-Speed 9475.15 samples/sec Loss 6.7749 LearningRate 0.0440 Epoch: 6 Global Step: 112360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:01,595-Speed 9260.78 samples/sec Loss 6.7579 LearningRate 0.0440 Epoch: 6 Global Step: 112370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:02,675-Speed 9484.95 samples/sec Loss 6.8525 LearningRate 0.0440 Epoch: 6 Global Step: 112380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:03,752-Speed 9509.98 samples/sec Loss 6.8831 LearningRate 0.0440 Epoch: 6 Global Step: 112390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:04,839-Speed 9426.44 samples/sec Loss 6.8487 LearningRate 0.0440 Epoch: 6 Global Step: 112400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:05,878-Speed 9859.85 samples/sec Loss 6.8909 LearningRate 0.0440 Epoch: 6 Global Step: 112410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
16:04:06,947-Speed 9589.29 samples/sec Loss 6.8370 LearningRate 0.0440 Epoch: 6 Global Step: 112420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:07,999-Speed 9733.80 samples/sec Loss 6.9084 LearningRate 0.0440 Epoch: 6 Global Step: 112430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:09,056-Speed 9692.65 samples/sec Loss 6.8248 LearningRate 0.0440 Epoch: 6 Global Step: 112440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:10,117-Speed 9664.78 samples/sec Loss 6.8918 LearningRate 0.0440 Epoch: 6 Global Step: 112450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:11,191-Speed 9535.84 samples/sec Loss 6.7859 LearningRate 0.0440 Epoch: 6 Global Step: 112460 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:04:12,232-Speed 9845.03 samples/sec Loss 6.9633 LearningRate 0.0440 Epoch: 6 Global Step: 112470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:13,318-Speed 9431.08 samples/sec Loss 6.8837 LearningRate 0.0440 Epoch: 6 Global Step: 112480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:14,427-Speed 9240.93 samples/sec Loss 6.8832 LearningRate 0.0440 Epoch: 6 Global Step: 112490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:15,546-Speed 9155.08 samples/sec Loss 7.0007 LearningRate 0.0440 Epoch: 6 Global Step: 112500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:16,623-Speed 9520.52 samples/sec Loss 6.8612 LearningRate 0.0440 Epoch: 6 Global Step: 112510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:17,705-Speed 9537.58 samples/sec Loss 6.8551 LearningRate 0.0439 Epoch: 6 Global Step: 112520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:18,763-Speed 9680.44 samples/sec Loss 6.7893 LearningRate 0.0439 Epoch: 6 Global Step: 112530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:19,822-Speed 9676.29 samples/sec Loss 6.8326 
LearningRate 0.0439 Epoch: 6 Global Step: 112540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:20,854-Speed 9923.42 samples/sec Loss 6.8176 LearningRate 0.0439 Epoch: 6 Global Step: 112550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:21,915-Speed 9659.21 samples/sec Loss 6.7855 LearningRate 0.0439 Epoch: 6 Global Step: 112560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:23,032-Speed 9173.94 samples/sec Loss 6.8726 LearningRate 0.0439 Epoch: 6 Global Step: 112570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:24,154-Speed 9133.05 samples/sec Loss 6.8729 LearningRate 0.0439 Epoch: 6 Global Step: 112580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:25,199-Speed 9801.05 samples/sec Loss 6.8374 LearningRate 0.0439 Epoch: 6 Global Step: 112590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:26,249-Speed 9759.23 samples/sec Loss 6.8448 LearningRate 0.0439 Epoch: 6 Global Step: 112600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:27,319-Speed 9574.18 samples/sec Loss 6.8476 LearningRate 0.0439 Epoch: 6 Global Step: 112610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:28,371-Speed 9734.66 samples/sec Loss 6.8083 LearningRate 0.0439 Epoch: 6 Global Step: 112620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:29,416-Speed 9807.91 samples/sec Loss 6.7934 LearningRate 0.0439 Epoch: 6 Global Step: 112630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:30,508-Speed 9386.08 samples/sec Loss 6.9124 LearningRate 0.0439 Epoch: 6 Global Step: 112640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:31,569-Speed 9654.22 samples/sec Loss 6.8770 LearningRate 0.0439 Epoch: 6 Global Step: 112650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:32,634-Speed 9625.64 samples/sec Loss 6.8555 LearningRate 0.0439 Epoch: 6 Global Step: 
112660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:33,690-Speed 9699.81 samples/sec Loss 6.8761 LearningRate 0.0439 Epoch: 6 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:34,731-Speed 9843.88 samples/sec Loss 6.8279 LearningRate 0.0439 Epoch: 6 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:35,761-Speed 9949.75 samples/sec Loss 6.8648 LearningRate 0.0439 Epoch: 6 Global Step: 112690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:36,826-Speed 9619.03 samples/sec Loss 6.7789 LearningRate 0.0439 Epoch: 6 Global Step: 112700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:37,868-Speed 9833.80 samples/sec Loss 6.8849 LearningRate 0.0439 Epoch: 6 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:38,905-Speed 9878.67 samples/sec Loss 6.8620 LearningRate 0.0439 Epoch: 6 Global Step: 112720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:39,961-Speed 9700.79 samples/sec Loss 6.8454 LearningRate 0.0439 Epoch: 6 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:41,005-Speed 9827.25 samples/sec Loss 7.0040 LearningRate 0.0439 Epoch: 6 Global Step: 112740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:42,101-Speed 9345.60 samples/sec Loss 6.8671 LearningRate 0.0439 Epoch: 6 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:43,177-Speed 9522.76 samples/sec Loss 6.8715 LearningRate 0.0439 Epoch: 6 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:44,243-Speed 9608.38 samples/sec Loss 6.9838 LearningRate 0.0438 Epoch: 6 Global Step: 112770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:45,337-Speed 9368.42 samples/sec Loss 6.8367 LearningRate 0.0438 Epoch: 6 Global Step: 112780 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 16:04:46,431-Speed 9359.37 samples/sec Loss 6.8663 LearningRate 0.0438 Epoch: 6 Global Step: 112790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:47,491-Speed 9679.92 samples/sec Loss 6.8439 LearningRate 0.0438 Epoch: 6 Global Step: 112800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:48,568-Speed 9506.28 samples/sec Loss 6.9632 LearningRate 0.0438 Epoch: 6 Global Step: 112810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:49,662-Speed 9364.74 samples/sec Loss 6.9588 LearningRate 0.0438 Epoch: 6 Global Step: 112820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:50,806-Speed 8960.07 samples/sec Loss 6.8738 LearningRate 0.0438 Epoch: 6 Global Step: 112830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:51,900-Speed 9367.65 samples/sec Loss 6.9314 LearningRate 0.0438 Epoch: 6 Global Step: 112840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:53,007-Speed 9256.02 samples/sec Loss 6.7429 LearningRate 0.0438 Epoch: 6 Global Step: 112850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:54,108-Speed 9301.79 samples/sec Loss 6.8508 LearningRate 0.0438 Epoch: 6 Global Step: 112860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:55,208-Speed 9315.70 samples/sec Loss 6.8397 LearningRate 0.0438 Epoch: 6 Global Step: 112870 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:04:56,332-Speed 9114.59 samples/sec Loss 6.8224 LearningRate 0.0438 Epoch: 6 Global Step: 112880 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:04:57,405-Speed 9552.90 samples/sec Loss 6.7735 LearningRate 0.0438 Epoch: 6 Global Step: 112890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:04:58,496-Speed 9386.50 samples/sec Loss 6.8806 LearningRate 0.0438 Epoch: 6 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:04:59,608-Speed 
9218.81 samples/sec Loss 6.8589 LearningRate 0.0438 Epoch: 6 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:00,672-Speed 9634.35 samples/sec Loss 6.8716 LearningRate 0.0438 Epoch: 6 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:01,771-Speed 9320.76 samples/sec Loss 6.7288 LearningRate 0.0438 Epoch: 6 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:02,842-Speed 9563.71 samples/sec Loss 7.0437 LearningRate 0.0438 Epoch: 6 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:03,909-Speed 9602.12 samples/sec Loss 6.8021 LearningRate 0.0438 Epoch: 6 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:05,003-Speed 9370.44 samples/sec Loss 6.8515 LearningRate 0.0438 Epoch: 6 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:06,105-Speed 9291.85 samples/sec Loss 6.8994 LearningRate 0.0438 Epoch: 6 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:07,231-Speed 9106.26 samples/sec Loss 6.8206 LearningRate 0.0438 Epoch: 6 Global Step: 112980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:08,300-Speed 9584.88 samples/sec Loss 6.8237 LearningRate 0.0438 Epoch: 6 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:09,405-Speed 9270.49 samples/sec Loss 6.8414 LearningRate 0.0438 Epoch: 6 Global Step: 113000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:10,475-Speed 9573.83 samples/sec Loss 6.8664 LearningRate 0.0438 Epoch: 6 Global Step: 113010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:11,540-Speed 9623.22 samples/sec Loss 6.9245 LearningRate 0.0437 Epoch: 6 Global Step: 113020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:12,653-Speed 9211.52 samples/sec Loss 6.8395 LearningRate 0.0437 
Epoch: 6 Global Step: 113030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:13,742-Speed 9407.96 samples/sec Loss 6.9235 LearningRate 0.0437 Epoch: 6 Global Step: 113040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:14,796-Speed 9725.91 samples/sec Loss 6.8534 LearningRate 0.0437 Epoch: 6 Global Step: 113050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:15,889-Speed 9373.12 samples/sec Loss 6.9424 LearningRate 0.0437 Epoch: 6 Global Step: 113060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:16,955-Speed 9614.15 samples/sec Loss 6.9442 LearningRate 0.0437 Epoch: 6 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:18,030-Speed 9531.47 samples/sec Loss 7.0037 LearningRate 0.0437 Epoch: 6 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:19,105-Speed 9527.94 samples/sec Loss 6.7850 LearningRate 0.0437 Epoch: 6 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:20,199-Speed 9371.02 samples/sec Loss 6.8787 LearningRate 0.0437 Epoch: 6 Global Step: 113100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:21,247-Speed 9771.28 samples/sec Loss 6.9058 LearningRate 0.0437 Epoch: 6 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:22,365-Speed 9168.49 samples/sec Loss 6.8426 LearningRate 0.0437 Epoch: 6 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:23,434-Speed 9580.55 samples/sec Loss 6.8613 LearningRate 0.0437 Epoch: 6 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:24,537-Speed 9288.40 samples/sec Loss 6.9343 LearningRate 0.0437 Epoch: 6 Global Step: 113140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:25,631-Speed 9369.15 samples/sec Loss 6.9522 LearningRate 0.0437 Epoch: 6 Global Step: 113150 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 16:05:26,723-Speed 9378.10 samples/sec Loss 6.7783 LearningRate 0.0437 Epoch: 6 Global Step: 113160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:27,822-Speed 9325.19 samples/sec Loss 6.8294 LearningRate 0.0437 Epoch: 6 Global Step: 113170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:28,894-Speed 9553.55 samples/sec Loss 6.8603 LearningRate 0.0437 Epoch: 6 Global Step: 113180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:30,013-Speed 9158.90 samples/sec Loss 6.8222 LearningRate 0.0437 Epoch: 6 Global Step: 113190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:31,067-Speed 9721.62 samples/sec Loss 6.7855 LearningRate 0.0437 Epoch: 6 Global Step: 113200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:32,124-Speed 9694.97 samples/sec Loss 6.8958 LearningRate 0.0437 Epoch: 6 Global Step: 113210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:33,163-Speed 9868.71 samples/sec Loss 6.9713 LearningRate 0.0437 Epoch: 6 Global Step: 113220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:34,200-Speed 9871.20 samples/sec Loss 6.8220 LearningRate 0.0437 Epoch: 6 Global Step: 113230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:35,320-Speed 9154.92 samples/sec Loss 6.8761 LearningRate 0.0437 Epoch: 6 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:36,412-Speed 9376.08 samples/sec Loss 6.9850 LearningRate 0.0437 Epoch: 6 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:37,490-Speed 9509.91 samples/sec Loss 6.9873 LearningRate 0.0437 Epoch: 6 Global Step: 113260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:38,527-Speed 9878.28 samples/sec Loss 6.8707 LearningRate 0.0437 Epoch: 6 Global Step: 113270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
16:05:39,563-Speed 9894.46 samples/sec Loss 6.9093 LearningRate 0.0436 Epoch: 6 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:40,672-Speed 9239.16 samples/sec Loss 6.8529 LearningRate 0.0436 Epoch: 6 Global Step: 113290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:41,769-Speed 9341.81 samples/sec Loss 6.8712 LearningRate 0.0436 Epoch: 6 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:42,863-Speed 9364.42 samples/sec Loss 6.8188 LearningRate 0.0436 Epoch: 6 Global Step: 113310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:43,938-Speed 9530.11 samples/sec Loss 6.8011 LearningRate 0.0436 Epoch: 6 Global Step: 113320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:45,029-Speed 9398.34 samples/sec Loss 6.9563 LearningRate 0.0436 Epoch: 6 Global Step: 113330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:46,092-Speed 9631.60 samples/sec Loss 6.8075 LearningRate 0.0436 Epoch: 6 Global Step: 113340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:47,163-Speed 9568.68 samples/sec Loss 6.8533 LearningRate 0.0436 Epoch: 6 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:48,260-Speed 9343.17 samples/sec Loss 6.7580 LearningRate 0.0436 Epoch: 6 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:49,306-Speed 9797.46 samples/sec Loss 6.8635 LearningRate 0.0436 Epoch: 6 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:50,349-Speed 9821.01 samples/sec Loss 6.9523 LearningRate 0.0436 Epoch: 6 Global Step: 113380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:51,386-Speed 9882.06 samples/sec Loss 6.7795 LearningRate 0.0436 Epoch: 6 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:52,473-Speed 9428.76 samples/sec Loss 6.8667 
LearningRate 0.0436 Epoch: 6 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:53,527-Speed 9722.41 samples/sec Loss 6.7915 LearningRate 0.0436 Epoch: 6 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:54,576-Speed 9764.90 samples/sec Loss 6.8690 LearningRate 0.0436 Epoch: 6 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:55,662-Speed 9437.72 samples/sec Loss 6.8969 LearningRate 0.0436 Epoch: 6 Global Step: 113430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:05:56,738-Speed 9515.13 samples/sec Loss 6.9839 LearningRate 0.0436 Epoch: 6 Global Step: 113440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:57,831-Speed 9374.72 samples/sec Loss 6.9241 LearningRate 0.0436 Epoch: 6 Global Step: 113450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:58,905-Speed 9540.57 samples/sec Loss 6.7680 LearningRate 0.0436 Epoch: 6 Global Step: 113460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:05:59,966-Speed 9659.29 samples/sec Loss 6.8311 LearningRate 0.0436 Epoch: 6 Global Step: 113470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:01,014-Speed 9785.31 samples/sec Loss 6.7965 LearningRate 0.0436 Epoch: 6 Global Step: 113480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:02,079-Speed 9616.38 samples/sec Loss 6.7541 LearningRate 0.0436 Epoch: 6 Global Step: 113490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:03,138-Speed 9673.73 samples/sec Loss 6.8375 LearningRate 0.0436 Epoch: 6 Global Step: 113500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:04,226-Speed 9418.36 samples/sec Loss 6.8163 LearningRate 0.0436 Epoch: 6 Global Step: 113510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:05,300-Speed 9536.97 samples/sec Loss 6.8234 LearningRate 0.0436 Epoch: 6 Global Step: 
113520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:06,382-Speed 9471.99 samples/sec Loss 6.8283 LearningRate 0.0435 Epoch: 6 Global Step: 113530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:07,449-Speed 9597.97 samples/sec Loss 6.7667 LearningRate 0.0435 Epoch: 6 Global Step: 113540 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:06:08,536-Speed 9433.62 samples/sec Loss 6.9030 LearningRate 0.0435 Epoch: 6 Global Step: 113550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:09,679-Speed 8964.64 samples/sec Loss 6.7968 LearningRate 0.0435 Epoch: 6 Global Step: 113560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:10,789-Speed 9232.11 samples/sec Loss 6.8331 LearningRate 0.0435 Epoch: 6 Global Step: 113570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:11,873-Speed 9448.54 samples/sec Loss 6.8697 LearningRate 0.0435 Epoch: 6 Global Step: 113580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:12,974-Speed 9312.92 samples/sec Loss 6.8072 LearningRate 0.0435 Epoch: 6 Global Step: 113590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:14,059-Speed 9439.56 samples/sec Loss 6.8128 LearningRate 0.0435 Epoch: 6 Global Step: 113600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:15,093-Speed 9910.21 samples/sec Loss 6.8948 LearningRate 0.0435 Epoch: 6 Global Step: 113610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:16,123-Speed 9939.99 samples/sec Loss 6.8874 LearningRate 0.0435 Epoch: 6 Global Step: 113620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:17,176-Speed 9732.54 samples/sec Loss 6.7947 LearningRate 0.0435 Epoch: 6 Global Step: 113630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:18,223-Speed 9787.94 samples/sec Loss 6.8225 LearningRate 0.0435 Epoch: 6 Global Step: 113640 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-04-11 16:06:19,327-Speed 9280.72 samples/sec Loss 6.8263 LearningRate 0.0435 Epoch: 6 Global Step: 113650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:20,437-Speed 9234.43 samples/sec Loss 6.8228 LearningRate 0.0435 Epoch: 6 Global Step: 113660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:21,500-Speed 9634.11 samples/sec Loss 6.8152 LearningRate 0.0435 Epoch: 6 Global Step: 113670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:22,568-Speed 9593.35 samples/sec Loss 6.8019 LearningRate 0.0435 Epoch: 6 Global Step: 113680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:23,624-Speed 9698.78 samples/sec Loss 6.8817 LearningRate 0.0435 Epoch: 6 Global Step: 113690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:24,688-Speed 9631.79 samples/sec Loss 6.7638 LearningRate 0.0435 Epoch: 6 Global Step: 113700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:25,753-Speed 9620.03 samples/sec Loss 6.9047 LearningRate 0.0435 Epoch: 6 Global Step: 113710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:26,817-Speed 9633.78 samples/sec Loss 6.8736 LearningRate 0.0435 Epoch: 6 Global Step: 113720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:27,901-Speed 9452.36 samples/sec Loss 6.7726 LearningRate 0.0435 Epoch: 6 Global Step: 113730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:29,035-Speed 9038.82 samples/sec Loss 6.7801 LearningRate 0.0435 Epoch: 6 Global Step: 113740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:30,102-Speed 9599.36 samples/sec Loss 6.8822 LearningRate 0.0435 Epoch: 6 Global Step: 113750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:31,171-Speed 9593.92 samples/sec Loss 6.7772 LearningRate 0.0435 Epoch: 6 Global Step: 113760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:32,354-Speed 
8653.98 samples/sec Loss 6.9146 LearningRate 0.0435 Epoch: 6 Global Step: 113770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:33,411-Speed 9696.18 samples/sec Loss 6.8935 LearningRate 0.0434 Epoch: 6 Global Step: 113780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:34,431-Speed 10050.69 samples/sec Loss 6.8482 LearningRate 0.0434 Epoch: 6 Global Step: 113790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:35,501-Speed 9570.62 samples/sec Loss 6.8577 LearningRate 0.0434 Epoch: 6 Global Step: 113800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:36,611-Speed 9226.90 samples/sec Loss 6.9195 LearningRate 0.0434 Epoch: 6 Global Step: 113810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:37,692-Speed 9483.47 samples/sec Loss 6.8325 LearningRate 0.0434 Epoch: 6 Global Step: 113820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:38,826-Speed 9032.32 samples/sec Loss 6.9460 LearningRate 0.0434 Epoch: 6 Global Step: 113830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:39,906-Speed 9483.30 samples/sec Loss 6.7879 LearningRate 0.0434 Epoch: 6 Global Step: 113840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:41,008-Speed 9303.57 samples/sec Loss 6.9016 LearningRate 0.0434 Epoch: 6 Global Step: 113850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:42,105-Speed 9338.50 samples/sec Loss 6.8857 LearningRate 0.0434 Epoch: 6 Global Step: 113860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:43,208-Speed 9288.19 samples/sec Loss 6.7898 LearningRate 0.0434 Epoch: 6 Global Step: 113870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:44,334-Speed 9100.75 samples/sec Loss 6.9211 LearningRate 0.0434 Epoch: 6 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:45,406-Speed 9557.58 samples/sec Loss 6.9295 LearningRate 0.0434 
Epoch: 6 Global Step: 113890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:46,498-Speed 9388.27 samples/sec Loss 6.8283 LearningRate 0.0434 Epoch: 6 Global Step: 113900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:47,538-Speed 9845.97 samples/sec Loss 6.9163 LearningRate 0.0434 Epoch: 6 Global Step: 113910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:48,634-Speed 9354.67 samples/sec Loss 6.8433 LearningRate 0.0434 Epoch: 6 Global Step: 113920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:49,727-Speed 9374.45 samples/sec Loss 6.8531 LearningRate 0.0434 Epoch: 6 Global Step: 113930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:50,822-Speed 9353.61 samples/sec Loss 6.8794 LearningRate 0.0434 Epoch: 6 Global Step: 113940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:51,921-Speed 9323.56 samples/sec Loss 6.7360 LearningRate 0.0434 Epoch: 6 Global Step: 113950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:52,989-Speed 9592.86 samples/sec Loss 6.8186 LearningRate 0.0434 Epoch: 6 Global Step: 113960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:54,078-Speed 9404.30 samples/sec Loss 6.7877 LearningRate 0.0434 Epoch: 6 Global Step: 113970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:06:55,154-Speed 9527.13 samples/sec Loss 6.7925 LearningRate 0.0434 Epoch: 6 Global Step: 113980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:56,224-Speed 9573.59 samples/sec Loss 6.8506 LearningRate 0.0434 Epoch: 6 Global Step: 113990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:06:57,287-Speed 9641.68 samples/sec Loss 6.7944 LearningRate 0.0434 Epoch: 6 Global Step: 114000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:07:19,335-[lfw][114000]XNorm: 11.049207 Training: 2022-04-11 16:07:19,336-[lfw][114000]Accuracy-Flip: 0.99600+-0.00260 
Training: 2022-04-11 16:07:19,337-[lfw][114000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:07:44,774-[cfp_fp][114000]XNorm: 9.426555 Training: 2022-04-11 16:07:44,775-[cfp_fp][114000]Accuracy-Flip: 0.96071+-0.00978 Training: 2022-04-11 16:07:44,775-[cfp_fp][114000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:08:06,736-[agedb_30][114000]XNorm: 10.742994 Training: 2022-04-11 16:08:06,737-[agedb_30][114000]Accuracy-Flip: 0.96217+-0.01019 Training: 2022-04-11 16:08:06,737-[agedb_30][114000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:08:07,815-Speed 145.19 samples/sec Loss 6.9462 LearningRate 0.0434 Epoch: 6 Global Step: 114010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:08,869-Speed 9727.13 samples/sec Loss 6.9183 LearningRate 0.0434 Epoch: 6 Global Step: 114020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:09,929-Speed 9662.77 samples/sec Loss 6.8832 LearningRate 0.0434 Epoch: 6 Global Step: 114030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:11,000-Speed 9568.92 samples/sec Loss 6.7718 LearningRate 0.0433 Epoch: 6 Global Step: 114040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:12,127-Speed 9096.65 samples/sec Loss 6.9774 LearningRate 0.0433 Epoch: 6 Global Step: 114050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:13,180-Speed 9721.81 samples/sec Loss 6.8830 LearningRate 0.0433 Epoch: 6 Global Step: 114060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:14,259-Speed 9500.86 samples/sec Loss 6.8161 LearningRate 0.0433 Epoch: 6 Global Step: 114070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:15,340-Speed 9481.94 samples/sec Loss 6.8143 LearningRate 0.0433 Epoch: 6 Global Step: 114080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:16,422-Speed 9465.00 samples/sec Loss 7.0218 LearningRate 0.0433 Epoch: 6 Global Step: 114090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 16:08:17,499-Speed 9519.47 samples/sec Loss 6.9187 LearningRate 0.0433 Epoch: 6 Global Step: 114100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:18,560-Speed 9658.92 samples/sec Loss 6.8028 LearningRate 0.0433 Epoch: 6 Global Step: 114110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:19,622-Speed 9641.95 samples/sec Loss 6.8369 LearningRate 0.0433 Epoch: 6 Global Step: 114120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:20,666-Speed 9818.48 samples/sec Loss 6.7271 LearningRate 0.0433 Epoch: 6 Global Step: 114130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:08:21,739-Speed 9550.97 samples/sec Loss 6.7614 LearningRate 0.0433 Epoch: 6 Global Step: 114140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:22,788-Speed 9766.49 samples/sec Loss 6.8543 LearningRate 0.0433 Epoch: 6 Global Step: 114150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:23,833-Speed 9803.78 samples/sec Loss 6.8941 LearningRate 0.0433 Epoch: 6 Global Step: 114160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:24,951-Speed 9165.76 samples/sec Loss 6.7954 LearningRate 0.0433 Epoch: 6 Global Step: 114170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:26,018-Speed 9600.37 samples/sec Loss 6.7496 LearningRate 0.0433 Epoch: 6 Global Step: 114180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:27,090-Speed 9557.99 samples/sec Loss 6.7684 LearningRate 0.0433 Epoch: 6 Global Step: 114190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:28,175-Speed 9445.54 samples/sec Loss 6.8029 LearningRate 0.0433 Epoch: 6 Global Step: 114200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:29,236-Speed 9655.04 samples/sec Loss 6.8080 LearningRate 0.0433 Epoch: 6 Global Step: 114210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:30,278-Speed 9840.14 
samples/sec Loss 6.8040 LearningRate 0.0433 Epoch: 6 Global Step: 114220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:31,334-Speed 9697.20 samples/sec Loss 6.7976 LearningRate 0.0433 Epoch: 6 Global Step: 114230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:32,369-Speed 9903.78 samples/sec Loss 6.7872 LearningRate 0.0433 Epoch: 6 Global Step: 114240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:33,465-Speed 9349.48 samples/sec Loss 6.8554 LearningRate 0.0433 Epoch: 6 Global Step: 114250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:34,583-Speed 9164.36 samples/sec Loss 6.8822 LearningRate 0.0433 Epoch: 6 Global Step: 114260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:35,714-Speed 9056.78 samples/sec Loss 6.9090 LearningRate 0.0433 Epoch: 6 Global Step: 114270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:36,834-Speed 9145.81 samples/sec Loss 6.7741 LearningRate 0.0433 Epoch: 6 Global Step: 114280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:37,925-Speed 9396.26 samples/sec Loss 6.7206 LearningRate 0.0432 Epoch: 6 Global Step: 114290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:39,003-Speed 9500.52 samples/sec Loss 6.7960 LearningRate 0.0432 Epoch: 6 Global Step: 114300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:40,108-Speed 9271.27 samples/sec Loss 6.9794 LearningRate 0.0432 Epoch: 6 Global Step: 114310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:41,195-Speed 9430.17 samples/sec Loss 6.7242 LearningRate 0.0432 Epoch: 6 Global Step: 114320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:42,329-Speed 9034.02 samples/sec Loss 6.8284 LearningRate 0.0432 Epoch: 6 Global Step: 114330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:43,365-Speed 9896.16 samples/sec Loss 6.7879 LearningRate 0.0432 
Epoch: 6 Global Step: 114340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:08:44,436-Speed 9569.54 samples/sec Loss 6.8012 LearningRate 0.0432 Epoch: 6 Global Step: 114350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:45,511-Speed 9530.31 samples/sec Loss 6.7594 LearningRate 0.0432 Epoch: 6 Global Step: 114360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:46,605-Speed 9357.89 samples/sec Loss 6.8420 LearningRate 0.0432 Epoch: 6 Global Step: 114370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:47,733-Speed 9084.82 samples/sec Loss 6.8703 LearningRate 0.0432 Epoch: 6 Global Step: 114380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:48,796-Speed 9641.11 samples/sec Loss 6.8474 LearningRate 0.0432 Epoch: 6 Global Step: 114390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:49,874-Speed 9500.65 samples/sec Loss 6.8054 LearningRate 0.0432 Epoch: 6 Global Step: 114400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:50,949-Speed 9534.99 samples/sec Loss 6.8133 LearningRate 0.0432 Epoch: 6 Global Step: 114410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:52,029-Speed 9482.10 samples/sec Loss 6.8045 LearningRate 0.0432 Epoch: 6 Global Step: 114420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:53,083-Speed 9725.16 samples/sec Loss 6.9039 LearningRate 0.0432 Epoch: 6 Global Step: 114430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:54,177-Speed 9371.20 samples/sec Loss 6.8645 LearningRate 0.0432 Epoch: 6 Global Step: 114440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:55,241-Speed 9633.05 samples/sec Loss 6.9066 LearningRate 0.0432 Epoch: 6 Global Step: 114450 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:08:56,314-Speed 9552.32 samples/sec Loss 6.7391 LearningRate 0.0432 Epoch: 6 Global Step: 114460 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:57,383-Speed 9582.76 samples/sec Loss 6.8543 LearningRate 0.0432 Epoch: 6 Global Step: 114470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:58,486-Speed 9293.47 samples/sec Loss 6.7097 LearningRate 0.0432 Epoch: 6 Global Step: 114480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:08:59,541-Speed 9709.50 samples/sec Loss 6.8443 LearningRate 0.0432 Epoch: 6 Global Step: 114490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:00,564-Speed 10015.32 samples/sec Loss 6.9483 LearningRate 0.0432 Epoch: 6 Global Step: 114500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:01,625-Speed 9654.83 samples/sec Loss 6.8726 LearningRate 0.0432 Epoch: 6 Global Step: 114510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:02,723-Speed 9338.39 samples/sec Loss 6.7328 LearningRate 0.0432 Epoch: 6 Global Step: 114520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:03,761-Speed 9864.10 samples/sec Loss 6.7743 LearningRate 0.0432 Epoch: 6 Global Step: 114530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:04,826-Speed 9624.88 samples/sec Loss 6.7984 LearningRate 0.0431 Epoch: 6 Global Step: 114540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:05,856-Speed 9943.85 samples/sec Loss 6.8302 LearningRate 0.0431 Epoch: 6 Global Step: 114550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:06,953-Speed 9347.30 samples/sec Loss 6.7051 LearningRate 0.0431 Epoch: 6 Global Step: 114560 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:09:08,049-Speed 9348.62 samples/sec Loss 6.7488 LearningRate 0.0431 Epoch: 6 Global Step: 114570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:09,159-Speed 9230.35 samples/sec Loss 6.7684 LearningRate 0.0431 Epoch: 6 Global Step: 114580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 16:09:10,225-Speed 9603.86 samples/sec Loss 6.7623 LearningRate 0.0431 Epoch: 6 Global Step: 114590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:11,298-Speed 9552.64 samples/sec Loss 6.7194 LearningRate 0.0431 Epoch: 6 Global Step: 114600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:12,425-Speed 9092.14 samples/sec Loss 6.8962 LearningRate 0.0431 Epoch: 6 Global Step: 114610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:13,553-Speed 9080.20 samples/sec Loss 6.7582 LearningRate 0.0431 Epoch: 6 Global Step: 114620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:14,671-Speed 9170.73 samples/sec Loss 6.9024 LearningRate 0.0431 Epoch: 6 Global Step: 114630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:15,754-Speed 9464.83 samples/sec Loss 6.8303 LearningRate 0.0431 Epoch: 6 Global Step: 114640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:16,840-Speed 9430.22 samples/sec Loss 6.9189 LearningRate 0.0431 Epoch: 6 Global Step: 114650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:17,921-Speed 9481.82 samples/sec Loss 6.8294 LearningRate 0.0431 Epoch: 6 Global Step: 114660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:18,956-Speed 9891.75 samples/sec Loss 6.9234 LearningRate 0.0431 Epoch: 6 Global Step: 114670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:20,031-Speed 9537.42 samples/sec Loss 6.8185 LearningRate 0.0431 Epoch: 6 Global Step: 114680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:21,087-Speed 9696.47 samples/sec Loss 6.7062 LearningRate 0.0431 Epoch: 6 Global Step: 114690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:22,130-Speed 9821.46 samples/sec Loss 6.8912 LearningRate 0.0431 Epoch: 6 Global Step: 114700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:23,273-Speed 8963.35 samples/sec 
Loss 6.8557 LearningRate 0.0431 Epoch: 6 Global Step: 114710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:24,314-Speed 9853.62 samples/sec Loss 6.8634 LearningRate 0.0431 Epoch: 6 Global Step: 114720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:25,378-Speed 9624.79 samples/sec Loss 6.7731 LearningRate 0.0431 Epoch: 6 Global Step: 114730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:09:26,427-Speed 9763.99 samples/sec Loss 6.7786 LearningRate 0.0431 Epoch: 6 Global Step: 114740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:27,528-Speed 9310.92 samples/sec Loss 6.9691 LearningRate 0.0431 Epoch: 6 Global Step: 114750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:28,625-Speed 9338.92 samples/sec Loss 6.7982 LearningRate 0.0431 Epoch: 6 Global Step: 114760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:29,716-Speed 9392.96 samples/sec Loss 6.8003 LearningRate 0.0431 Epoch: 6 Global Step: 114770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:30,782-Speed 9612.06 samples/sec Loss 6.8139 LearningRate 0.0431 Epoch: 6 Global Step: 114780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:31,866-Speed 9449.86 samples/sec Loss 6.8156 LearningRate 0.0431 Epoch: 6 Global Step: 114790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:33,020-Speed 8882.27 samples/sec Loss 6.8522 LearningRate 0.0430 Epoch: 6 Global Step: 114800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:34,117-Speed 9338.68 samples/sec Loss 6.9242 LearningRate 0.0430 Epoch: 6 Global Step: 114810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:35,157-Speed 9854.45 samples/sec Loss 6.8680 LearningRate 0.0430 Epoch: 6 Global Step: 114820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:36,249-Speed 9379.44 samples/sec Loss 6.8603 LearningRate 0.0430 Epoch: 6 
Global Step: 114830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:37,342-Speed 9376.88 samples/sec Loss 6.7872 LearningRate 0.0430 Epoch: 6 Global Step: 114840 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:09:38,440-Speed 9332.66 samples/sec Loss 6.8249 LearningRate 0.0430 Epoch: 6 Global Step: 114850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:39,526-Speed 9436.59 samples/sec Loss 6.8222 LearningRate 0.0430 Epoch: 6 Global Step: 114860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:40,594-Speed 9588.76 samples/sec Loss 6.7536 LearningRate 0.0430 Epoch: 6 Global Step: 114870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:41,674-Speed 9486.89 samples/sec Loss 6.9170 LearningRate 0.0430 Epoch: 6 Global Step: 114880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:42,731-Speed 9697.27 samples/sec Loss 6.9111 LearningRate 0.0430 Epoch: 6 Global Step: 114890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:43,836-Speed 9278.48 samples/sec Loss 6.7876 LearningRate 0.0430 Epoch: 6 Global Step: 114900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:44,929-Speed 9374.27 samples/sec Loss 6.8270 LearningRate 0.0430 Epoch: 6 Global Step: 114910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:45,978-Speed 9760.78 samples/sec Loss 6.9490 LearningRate 0.0430 Epoch: 6 Global Step: 114920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:47,085-Speed 9258.88 samples/sec Loss 6.7741 LearningRate 0.0430 Epoch: 6 Global Step: 114930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:48,160-Speed 9527.39 samples/sec Loss 6.8884 LearningRate 0.0430 Epoch: 6 Global Step: 114940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:49,261-Speed 9309.31 samples/sec Loss 6.7446 LearningRate 0.0430 Epoch: 6 Global Step: 114950 Fp16 Grad Scale: 262144 
Required: 9 hours Training: 2022-04-11 16:09:50,337-Speed 9519.62 samples/sec Loss 6.8093 LearningRate 0.0430 Epoch: 6 Global Step: 114960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:51,380-Speed 9827.82 samples/sec Loss 6.8647 LearningRate 0.0430 Epoch: 6 Global Step: 114970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:52,462-Speed 9468.14 samples/sec Loss 6.8217 LearningRate 0.0430 Epoch: 6 Global Step: 114980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:53,550-Speed 9418.99 samples/sec Loss 6.7459 LearningRate 0.0430 Epoch: 6 Global Step: 114990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:54,639-Speed 9415.48 samples/sec Loss 6.9354 LearningRate 0.0430 Epoch: 6 Global Step: 115000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:55,751-Speed 9215.77 samples/sec Loss 6.7554 LearningRate 0.0430 Epoch: 6 Global Step: 115010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:56,818-Speed 9596.40 samples/sec Loss 6.8512 LearningRate 0.0430 Epoch: 6 Global Step: 115020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:57,916-Speed 9334.62 samples/sec Loss 6.8720 LearningRate 0.0430 Epoch: 6 Global Step: 115030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:09:58,970-Speed 9724.04 samples/sec Loss 6.9151 LearningRate 0.0430 Epoch: 6 Global Step: 115040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:00,066-Speed 9344.40 samples/sec Loss 6.6992 LearningRate 0.0429 Epoch: 6 Global Step: 115050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:01,147-Speed 9479.80 samples/sec Loss 6.8705 LearningRate 0.0429 Epoch: 6 Global Step: 115060 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:10:02,256-Speed 9244.73 samples/sec Loss 6.8015 LearningRate 0.0429 Epoch: 6 Global Step: 115070 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 
16:10:03,345-Speed 9404.73 samples/sec Loss 6.8478 LearningRate 0.0429 Epoch: 6 Global Step: 115080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:04,415-Speed 9573.96 samples/sec Loss 6.7789 LearningRate 0.0429 Epoch: 6 Global Step: 115090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:05,451-Speed 9888.22 samples/sec Loss 6.8797 LearningRate 0.0429 Epoch: 6 Global Step: 115100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:06,545-Speed 9368.29 samples/sec Loss 6.8074 LearningRate 0.0429 Epoch: 6 Global Step: 115110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:07,618-Speed 9546.58 samples/sec Loss 6.9552 LearningRate 0.0429 Epoch: 6 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:08,704-Speed 9441.09 samples/sec Loss 6.8958 LearningRate 0.0429 Epoch: 6 Global Step: 115130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:09,754-Speed 9756.17 samples/sec Loss 6.8729 LearningRate 0.0429 Epoch: 6 Global Step: 115140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:10,841-Speed 9424.07 samples/sec Loss 6.9175 LearningRate 0.0429 Epoch: 6 Global Step: 115150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:11,887-Speed 9802.14 samples/sec Loss 6.7839 LearningRate 0.0429 Epoch: 6 Global Step: 115160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:12,947-Speed 9666.18 samples/sec Loss 6.8077 LearningRate 0.0429 Epoch: 6 Global Step: 115170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:14,015-Speed 9594.99 samples/sec Loss 6.8501 LearningRate 0.0429 Epoch: 6 Global Step: 115180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:15,085-Speed 9573.79 samples/sec Loss 6.9267 LearningRate 0.0429 Epoch: 6 Global Step: 115190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:16,140-Speed 9707.46 samples/sec Loss 6.8166 
LearningRate 0.0429 Epoch: 6 Global Step: 115200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:17,233-Speed 9373.83 samples/sec Loss 6.9076 LearningRate 0.0429 Epoch: 6 Global Step: 115210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:18,327-Speed 9371.06 samples/sec Loss 6.7706 LearningRate 0.0429 Epoch: 6 Global Step: 115220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:19,439-Speed 9212.55 samples/sec Loss 6.9051 LearningRate 0.0429 Epoch: 6 Global Step: 115230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:20,549-Speed 9226.65 samples/sec Loss 6.8433 LearningRate 0.0429 Epoch: 6 Global Step: 115240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:21,643-Speed 9370.74 samples/sec Loss 6.8427 LearningRate 0.0429 Epoch: 6 Global Step: 115250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:22,715-Speed 9556.92 samples/sec Loss 6.7806 LearningRate 0.0429 Epoch: 6 Global Step: 115260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:23,806-Speed 9391.42 samples/sec Loss 6.7312 LearningRate 0.0429 Epoch: 6 Global Step: 115270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:24,912-Speed 9265.75 samples/sec Loss 6.9263 LearningRate 0.0429 Epoch: 6 Global Step: 115280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:26,013-Speed 9305.89 samples/sec Loss 6.8553 LearningRate 0.0429 Epoch: 6 Global Step: 115290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:27,119-Speed 9263.17 samples/sec Loss 6.8585 LearningRate 0.0429 Epoch: 6 Global Step: 115300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:28,216-Speed 9340.33 samples/sec Loss 6.8225 LearningRate 0.0428 Epoch: 6 Global Step: 115310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:29,280-Speed 9637.27 samples/sec Loss 6.7790 LearningRate 0.0428 Epoch: 6 Global Step: 
115320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:30,359-Speed 9499.10 samples/sec Loss 6.8635 LearningRate 0.0428 Epoch: 6 Global Step: 115330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:31,429-Speed 9570.36 samples/sec Loss 6.7168 LearningRate 0.0428 Epoch: 6 Global Step: 115340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:32,497-Speed 9596.84 samples/sec Loss 6.8169 LearningRate 0.0428 Epoch: 6 Global Step: 115350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:33,603-Speed 9259.87 samples/sec Loss 6.7033 LearningRate 0.0428 Epoch: 6 Global Step: 115360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:34,701-Speed 9335.07 samples/sec Loss 6.6931 LearningRate 0.0428 Epoch: 6 Global Step: 115370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:35,773-Speed 9557.24 samples/sec Loss 6.7402 LearningRate 0.0428 Epoch: 6 Global Step: 115380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:36,810-Speed 9874.23 samples/sec Loss 6.8411 LearningRate 0.0428 Epoch: 6 Global Step: 115390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:37,905-Speed 9355.92 samples/sec Loss 6.8366 LearningRate 0.0428 Epoch: 6 Global Step: 115400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:38,982-Speed 9518.41 samples/sec Loss 6.7936 LearningRate 0.0428 Epoch: 6 Global Step: 115410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:40,038-Speed 9698.47 samples/sec Loss 6.7845 LearningRate 0.0428 Epoch: 6 Global Step: 115420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:41,115-Speed 9511.71 samples/sec Loss 6.7936 LearningRate 0.0428 Epoch: 6 Global Step: 115430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:42,212-Speed 9346.10 samples/sec Loss 6.9250 LearningRate 0.0428 Epoch: 6 Global Step: 115440 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-04-11 16:10:43,305-Speed 9376.18 samples/sec Loss 6.8850 LearningRate 0.0428 Epoch: 6 Global Step: 115450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:44,405-Speed 9313.06 samples/sec Loss 6.9016 LearningRate 0.0428 Epoch: 6 Global Step: 115460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:45,498-Speed 9369.77 samples/sec Loss 6.7718 LearningRate 0.0428 Epoch: 6 Global Step: 115470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:46,568-Speed 9574.49 samples/sec Loss 6.7564 LearningRate 0.0428 Epoch: 6 Global Step: 115480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:47,659-Speed 9395.83 samples/sec Loss 6.7364 LearningRate 0.0428 Epoch: 6 Global Step: 115490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:48,740-Speed 9480.53 samples/sec Loss 6.7965 LearningRate 0.0428 Epoch: 6 Global Step: 115500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:49,786-Speed 9793.03 samples/sec Loss 6.7757 LearningRate 0.0428 Epoch: 6 Global Step: 115510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:10:50,863-Speed 9517.87 samples/sec Loss 6.8439 LearningRate 0.0428 Epoch: 6 Global Step: 115520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:51,931-Speed 9592.90 samples/sec Loss 6.8330 LearningRate 0.0428 Epoch: 6 Global Step: 115530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:53,018-Speed 9419.23 samples/sec Loss 6.7661 LearningRate 0.0428 Epoch: 6 Global Step: 115540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:54,118-Speed 9321.20 samples/sec Loss 6.7741 LearningRate 0.0428 Epoch: 6 Global Step: 115550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:55,167-Speed 9768.52 samples/sec Loss 6.7646 LearningRate 0.0427 Epoch: 6 Global Step: 115560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:56,278-Speed 9217.05 
samples/sec Loss 6.7555 LearningRate 0.0427 Epoch: 6 Global Step: 115570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:57,363-Speed 9447.03 samples/sec Loss 6.7469 LearningRate 0.0427 Epoch: 6 Global Step: 115580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:58,452-Speed 9409.42 samples/sec Loss 6.6978 LearningRate 0.0427 Epoch: 6 Global Step: 115590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:10:59,571-Speed 9149.82 samples/sec Loss 6.6912 LearningRate 0.0427 Epoch: 6 Global Step: 115600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:00,687-Speed 9190.99 samples/sec Loss 6.8460 LearningRate 0.0427 Epoch: 6 Global Step: 115610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:01,754-Speed 9600.05 samples/sec Loss 6.9192 LearningRate 0.0427 Epoch: 6 Global Step: 115620 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:11:02,815-Speed 9651.20 samples/sec Loss 6.7971 LearningRate 0.0427 Epoch: 6 Global Step: 115630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:03,889-Speed 9540.08 samples/sec Loss 6.8499 LearningRate 0.0427 Epoch: 6 Global Step: 115640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:04,977-Speed 9419.23 samples/sec Loss 6.8341 LearningRate 0.0427 Epoch: 6 Global Step: 115650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:06,035-Speed 9687.64 samples/sec Loss 6.8897 LearningRate 0.0427 Epoch: 6 Global Step: 115660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:07,128-Speed 9370.88 samples/sec Loss 6.6629 LearningRate 0.0427 Epoch: 6 Global Step: 115670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:08,214-Speed 9440.22 samples/sec Loss 6.7278 LearningRate 0.0427 Epoch: 6 Global Step: 115680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:09,316-Speed 9295.48 samples/sec Loss 6.8444 LearningRate 0.0427 
Epoch: 6 Global Step: 115690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:10,374-Speed 9678.98 samples/sec Loss 6.8390 LearningRate 0.0427 Epoch: 6 Global Step: 115700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:11,494-Speed 9154.89 samples/sec Loss 6.8464 LearningRate 0.0427 Epoch: 6 Global Step: 115710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:12,577-Speed 9461.93 samples/sec Loss 6.9165 LearningRate 0.0427 Epoch: 6 Global Step: 115720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:13,625-Speed 9782.60 samples/sec Loss 6.7373 LearningRate 0.0427 Epoch: 6 Global Step: 115730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:14,691-Speed 9604.62 samples/sec Loss 6.7264 LearningRate 0.0427 Epoch: 6 Global Step: 115740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:15,778-Speed 9429.06 samples/sec Loss 6.8681 LearningRate 0.0427 Epoch: 6 Global Step: 115750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:16,847-Speed 9584.02 samples/sec Loss 6.8914 LearningRate 0.0427 Epoch: 6 Global Step: 115760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:17,932-Speed 9442.05 samples/sec Loss 6.8369 LearningRate 0.0427 Epoch: 6 Global Step: 115770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:19,014-Speed 9471.98 samples/sec Loss 6.8252 LearningRate 0.0427 Epoch: 6 Global Step: 115780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:20,067-Speed 9732.03 samples/sec Loss 6.9060 LearningRate 0.0427 Epoch: 6 Global Step: 115790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:21,131-Speed 9628.49 samples/sec Loss 6.9062 LearningRate 0.0427 Epoch: 6 Global Step: 115800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:22,204-Speed 9551.14 samples/sec Loss 6.8881 LearningRate 0.0427 Epoch: 6 Global Step: 115810 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:23,310-Speed 9257.35 samples/sec Loss 6.7932 LearningRate 0.0426 Epoch: 6 Global Step: 115820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:24,394-Speed 9459.75 samples/sec Loss 6.7137 LearningRate 0.0426 Epoch: 6 Global Step: 115830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:25,512-Speed 9161.81 samples/sec Loss 6.7871 LearningRate 0.0426 Epoch: 6 Global Step: 115840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:26,603-Speed 9398.20 samples/sec Loss 6.7238 LearningRate 0.0426 Epoch: 6 Global Step: 115850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:27,692-Speed 9402.89 samples/sec Loss 6.7144 LearningRate 0.0426 Epoch: 6 Global Step: 115860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:28,744-Speed 9743.83 samples/sec Loss 6.8919 LearningRate 0.0426 Epoch: 6 Global Step: 115870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:29,799-Speed 9710.15 samples/sec Loss 6.8565 LearningRate 0.0426 Epoch: 6 Global Step: 115880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:30,849-Speed 9759.98 samples/sec Loss 6.7709 LearningRate 0.0426 Epoch: 6 Global Step: 115890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:31,941-Speed 9385.67 samples/sec Loss 6.7633 LearningRate 0.0426 Epoch: 6 Global Step: 115900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:33,027-Speed 9432.21 samples/sec Loss 6.7334 LearningRate 0.0426 Epoch: 6 Global Step: 115910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:34,114-Speed 9423.40 samples/sec Loss 6.8907 LearningRate 0.0426 Epoch: 6 Global Step: 115920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:35,154-Speed 9853.62 samples/sec Loss 6.8744 LearningRate 0.0426 Epoch: 6 Global Step: 115930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
16:11:36,224-Speed 9571.88 samples/sec Loss 6.8606 LearningRate 0.0426 Epoch: 6 Global Step: 115940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:37,278-Speed 9721.75 samples/sec Loss 6.7090 LearningRate 0.0426 Epoch: 6 Global Step: 115950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:38,357-Speed 9501.31 samples/sec Loss 6.7716 LearningRate 0.0426 Epoch: 6 Global Step: 115960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:39,491-Speed 9034.88 samples/sec Loss 6.8229 LearningRate 0.0426 Epoch: 6 Global Step: 115970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:11:40,569-Speed 9502.55 samples/sec Loss 6.8684 LearningRate 0.0426 Epoch: 6 Global Step: 115980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:41,661-Speed 9380.31 samples/sec Loss 6.8963 LearningRate 0.0426 Epoch: 6 Global Step: 115990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:11:42,780-Speed 9160.94 samples/sec Loss 6.7564 LearningRate 0.0426 Epoch: 6 Global Step: 116000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:12:04,860-[lfw][116000]XNorm: 10.876187 Training: 2022-04-11 16:12:04,861-[lfw][116000]Accuracy-Flip: 0.99617+-0.00259 Training: 2022-04-11 16:12:04,861-[lfw][116000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:12:30,405-[cfp_fp][116000]XNorm: 9.313619 Training: 2022-04-11 16:12:30,406-[cfp_fp][116000]Accuracy-Flip: 0.95600+-0.00733 Training: 2022-04-11 16:12:30,406-[cfp_fp][116000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:12:52,542-[agedb_30][116000]XNorm: 10.510777 Training: 2022-04-11 16:12:52,542-[agedb_30][116000]Accuracy-Flip: 0.96383+-0.00943 Training: 2022-04-11 16:12:52,543-[agedb_30][116000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:12:53,663-Speed 144.46 samples/sec Loss 6.8444 LearningRate 0.0426 Epoch: 6 Global Step: 116010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:12:54,755-Speed 9392.65 
samples/sec Loss 6.8382 LearningRate 0.0426 Epoch: 6 Global Step: 116020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:12:55,826-Speed 9562.53 samples/sec Loss 6.8632 LearningRate 0.0426 Epoch: 6 Global Step: 116030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:12:56,869-Speed 9827.05 samples/sec Loss 6.8440 LearningRate 0.0426 Epoch: 6 Global Step: 116040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:12:57,931-Speed 9645.01 samples/sec Loss 6.8923 LearningRate 0.0426 Epoch: 6 Global Step: 116050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:12:59,028-Speed 9341.83 samples/sec Loss 6.7904 LearningRate 0.0426 Epoch: 6 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:00,116-Speed 9416.75 samples/sec Loss 6.8433 LearningRate 0.0425 Epoch: 6 Global Step: 116070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:01,196-Speed 9493.99 samples/sec Loss 6.7977 LearningRate 0.0425 Epoch: 6 Global Step: 116080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:02,277-Speed 9473.71 samples/sec Loss 6.7539 LearningRate 0.0425 Epoch: 6 Global Step: 116090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:03,363-Speed 9434.98 samples/sec Loss 6.7922 LearningRate 0.0425 Epoch: 6 Global Step: 116100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:04,412-Speed 9762.45 samples/sec Loss 6.8160 LearningRate 0.0425 Epoch: 6 Global Step: 116110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:05,462-Speed 9763.84 samples/sec Loss 6.8039 LearningRate 0.0425 Epoch: 6 Global Step: 116120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:06,522-Speed 9665.60 samples/sec Loss 6.7647 LearningRate 0.0425 Epoch: 6 Global Step: 116130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:07,632-Speed 9228.88 samples/sec Loss 6.7579 LearningRate 0.0425 
Epoch: 6 Global Step: 116140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:08,785-Speed 8883.24 samples/sec Loss 6.8356 LearningRate 0.0425 Epoch: 6 Global Step: 116150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:09,895-Speed 9230.22 samples/sec Loss 6.6386 LearningRate 0.0425 Epoch: 6 Global Step: 116160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:10,962-Speed 9602.10 samples/sec Loss 6.8029 LearningRate 0.0425 Epoch: 6 Global Step: 116170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:12,059-Speed 9338.85 samples/sec Loss 6.8606 LearningRate 0.0425 Epoch: 6 Global Step: 116180 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:13:13,122-Speed 9642.41 samples/sec Loss 6.7721 LearningRate 0.0425 Epoch: 6 Global Step: 116190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:14,160-Speed 9871.88 samples/sec Loss 6.8503 LearningRate 0.0425 Epoch: 6 Global Step: 116200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:15,212-Speed 9741.05 samples/sec Loss 6.9138 LearningRate 0.0425 Epoch: 6 Global Step: 116210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:16,308-Speed 9347.64 samples/sec Loss 6.7850 LearningRate 0.0425 Epoch: 6 Global Step: 116220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:17,377-Speed 9587.79 samples/sec Loss 6.7372 LearningRate 0.0425 Epoch: 6 Global Step: 116230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:18,446-Speed 9577.35 samples/sec Loss 6.7639 LearningRate 0.0425 Epoch: 6 Global Step: 116240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:19,566-Speed 9153.38 samples/sec Loss 6.6945 LearningRate 0.0425 Epoch: 6 Global Step: 116250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:20,639-Speed 9545.99 samples/sec Loss 6.7828 LearningRate 0.0425 Epoch: 6 Global Step: 116260 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:21,693-Speed 9721.77 samples/sec Loss 6.7243 LearningRate 0.0425 Epoch: 6 Global Step: 116270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:22,830-Speed 9009.17 samples/sec Loss 6.8677 LearningRate 0.0425 Epoch: 6 Global Step: 116280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:23,901-Speed 9569.47 samples/sec Loss 6.7213 LearningRate 0.0425 Epoch: 6 Global Step: 116290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:24,979-Speed 9509.19 samples/sec Loss 6.8007 LearningRate 0.0425 Epoch: 6 Global Step: 116300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:26,016-Speed 9873.99 samples/sec Loss 6.7794 LearningRate 0.0425 Epoch: 6 Global Step: 116310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:27,086-Speed 9581.62 samples/sec Loss 6.7108 LearningRate 0.0425 Epoch: 6 Global Step: 116320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:28,155-Speed 9579.34 samples/sec Loss 6.7069 LearningRate 0.0424 Epoch: 6 Global Step: 116330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:29,217-Speed 9654.87 samples/sec Loss 6.7858 LearningRate 0.0424 Epoch: 6 Global Step: 116340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:30,294-Speed 9507.71 samples/sec Loss 6.6951 LearningRate 0.0424 Epoch: 6 Global Step: 116350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:31,395-Speed 9316.02 samples/sec Loss 6.8288 LearningRate 0.0424 Epoch: 6 Global Step: 116360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:32,475-Speed 9480.19 samples/sec Loss 6.8647 LearningRate 0.0424 Epoch: 6 Global Step: 116370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:33,564-Speed 9411.25 samples/sec Loss 6.8995 LearningRate 0.0424 Epoch: 6 Global Step: 116380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 16:13:34,621-Speed 9694.87 samples/sec Loss 6.7678 LearningRate 0.0424 Epoch: 6 Global Step: 116390 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:13:35,720-Speed 9319.35 samples/sec Loss 6.8917 LearningRate 0.0424 Epoch: 6 Global Step: 116400 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:13:36,813-Speed 9377.93 samples/sec Loss 6.8212 LearningRate 0.0424 Epoch: 6 Global Step: 116410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:13:37,875-Speed 9642.99 samples/sec Loss 6.8312 LearningRate 0.0424 Epoch: 6 Global Step: 116420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:38,953-Speed 9505.83 samples/sec Loss 6.7818 LearningRate 0.0424 Epoch: 6 Global Step: 116430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:40,065-Speed 9211.66 samples/sec Loss 6.7728 LearningRate 0.0424 Epoch: 6 Global Step: 116440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:41,153-Speed 9419.99 samples/sec Loss 6.8107 LearningRate 0.0424 Epoch: 6 Global Step: 116450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:42,222-Speed 9592.73 samples/sec Loss 6.6950 LearningRate 0.0424 Epoch: 6 Global Step: 116460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:43,306-Speed 9451.26 samples/sec Loss 6.8388 LearningRate 0.0424 Epoch: 6 Global Step: 116470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:44,392-Speed 9445.09 samples/sec Loss 6.8057 LearningRate 0.0424 Epoch: 6 Global Step: 116480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:45,479-Speed 9419.77 samples/sec Loss 6.7455 LearningRate 0.0424 Epoch: 6 Global Step: 116490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:46,526-Speed 9792.74 samples/sec Loss 6.7653 LearningRate 0.0424 Epoch: 6 Global Step: 116500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:47,560-Speed 9902.88 
samples/sec Loss 6.8762 LearningRate 0.0424 Epoch: 6 Global Step: 116510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:48,604-Speed 9820.98 samples/sec Loss 6.7240 LearningRate 0.0424 Epoch: 6 Global Step: 116520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:49,640-Speed 9892.86 samples/sec Loss 6.7671 LearningRate 0.0424 Epoch: 6 Global Step: 116530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:50,695-Speed 9709.54 samples/sec Loss 6.8118 LearningRate 0.0424 Epoch: 6 Global Step: 116540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:51,767-Speed 9553.80 samples/sec Loss 6.8849 LearningRate 0.0424 Epoch: 6 Global Step: 116550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:52,875-Speed 9245.54 samples/sec Loss 6.7546 LearningRate 0.0424 Epoch: 6 Global Step: 116560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:53,950-Speed 9530.65 samples/sec Loss 6.7700 LearningRate 0.0424 Epoch: 6 Global Step: 116570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:55,036-Speed 9439.73 samples/sec Loss 6.7335 LearningRate 0.0424 Epoch: 6 Global Step: 116580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:56,124-Speed 9416.81 samples/sec Loss 6.8869 LearningRate 0.0423 Epoch: 6 Global Step: 116590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:57,183-Speed 9675.22 samples/sec Loss 6.9086 LearningRate 0.0423 Epoch: 6 Global Step: 116600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:13:58,285-Speed 9302.22 samples/sec Loss 6.7956 LearningRate 0.0423 Epoch: 6 Global Step: 116610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:13:59,373-Speed 9415.48 samples/sec Loss 6.7880 LearningRate 0.0423 Epoch: 6 Global Step: 116620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:00,448-Speed 9529.61 samples/sec Loss 6.8139 LearningRate 0.0423 Epoch: 6 
Global Step: 116630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:01,541-Speed 9380.95 samples/sec Loss 6.8098 LearningRate 0.0423 Epoch: 6 Global Step: 116640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:02,599-Speed 9684.85 samples/sec Loss 6.8755 LearningRate 0.0423 Epoch: 6 Global Step: 116650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:03,657-Speed 9678.23 samples/sec Loss 6.8018 LearningRate 0.0423 Epoch: 6 Global Step: 116660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:04,720-Speed 9638.28 samples/sec Loss 6.7296 LearningRate 0.0423 Epoch: 6 Global Step: 116670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:05,789-Speed 9589.94 samples/sec Loss 6.8653 LearningRate 0.0423 Epoch: 6 Global Step: 116680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:06,829-Speed 9852.52 samples/sec Loss 6.8439 LearningRate 0.0423 Epoch: 6 Global Step: 116690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:07,918-Speed 9409.70 samples/sec Loss 6.7978 LearningRate 0.0423 Epoch: 6 Global Step: 116700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:08,975-Speed 9694.38 samples/sec Loss 6.7868 LearningRate 0.0423 Epoch: 6 Global Step: 116710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:10,070-Speed 9353.43 samples/sec Loss 6.7396 LearningRate 0.0423 Epoch: 6 Global Step: 116720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:11,133-Speed 9640.03 samples/sec Loss 6.7261 LearningRate 0.0423 Epoch: 6 Global Step: 116730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:12,215-Speed 9466.58 samples/sec Loss 6.6966 LearningRate 0.0423 Epoch: 6 Global Step: 116740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:13,306-Speed 9396.24 samples/sec Loss 6.8243 LearningRate 0.0423 Epoch: 6 Global Step: 116750 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 16:14:14,351-Speed 9804.07 samples/sec Loss 6.8759 LearningRate 0.0423 Epoch: 6 Global Step: 116760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:15,422-Speed 9571.70 samples/sec Loss 6.8180 LearningRate 0.0423 Epoch: 6 Global Step: 116770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:16,537-Speed 9185.83 samples/sec Loss 6.8224 LearningRate 0.0423 Epoch: 6 Global Step: 116780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:17,598-Speed 9652.07 samples/sec Loss 6.8309 LearningRate 0.0423 Epoch: 6 Global Step: 116790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:18,683-Speed 9444.00 samples/sec Loss 6.8324 LearningRate 0.0423 Epoch: 6 Global Step: 116800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:19,777-Speed 9368.25 samples/sec Loss 6.8477 LearningRate 0.0423 Epoch: 6 Global Step: 116810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:20,825-Speed 9778.82 samples/sec Loss 6.7086 LearningRate 0.0423 Epoch: 6 Global Step: 116820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:14:22,138-Speed 7799.20 samples/sec Loss 6.8485 LearningRate 0.0423 Epoch: 6 Global Step: 116830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:14:59,435-Speed 274.57 samples/sec Loss 6.4998 LearningRate 0.0422 Epoch: 7 Global Step: 116840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:00,528-Speed 9379.20 samples/sec Loss 6.0574 LearningRate 0.0422 Epoch: 7 Global Step: 116850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:01,628-Speed 9312.03 samples/sec Loss 6.0052 LearningRate 0.0422 Epoch: 7 Global Step: 116860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:03,219-Speed 6436.67 samples/sec Loss 5.9777 LearningRate 0.0422 Epoch: 7 Global Step: 116870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
16:15:04,522-Speed 7864.47 samples/sec Loss 6.0200 LearningRate 0.0422 Epoch: 7 Global Step: 116880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:05,813-Speed 7933.77 samples/sec Loss 6.0079 LearningRate 0.0422 Epoch: 7 Global Step: 116890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:07,037-Speed 8377.69 samples/sec Loss 5.9893 LearningRate 0.0422 Epoch: 7 Global Step: 116900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:08,135-Speed 9334.63 samples/sec Loss 6.0173 LearningRate 0.0422 Epoch: 7 Global Step: 116910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:09,210-Speed 9524.75 samples/sec Loss 6.0039 LearningRate 0.0422 Epoch: 7 Global Step: 116920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:15:10,283-Speed 9555.40 samples/sec Loss 5.9710 LearningRate 0.0422 Epoch: 7 Global Step: 116930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:11,389-Speed 9259.58 samples/sec Loss 6.0003 LearningRate 0.0422 Epoch: 7 Global Step: 116940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:12,511-Speed 9129.57 samples/sec Loss 6.0345 LearningRate 0.0422 Epoch: 7 Global Step: 116950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:13,601-Speed 9401.16 samples/sec Loss 6.0304 LearningRate 0.0422 Epoch: 7 Global Step: 116960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:14,693-Speed 9382.48 samples/sec Loss 5.8750 LearningRate 0.0422 Epoch: 7 Global Step: 116970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:15,789-Speed 9349.16 samples/sec Loss 5.9737 LearningRate 0.0422 Epoch: 7 Global Step: 116980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:16,911-Speed 9133.27 samples/sec Loss 5.9389 LearningRate 0.0422 Epoch: 7 Global Step: 116990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:18,041-Speed 9065.81 samples/sec Loss 
5.9005 LearningRate 0.0422 Epoch: 7 Global Step: 117000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:19,138-Speed 9342.65 samples/sec Loss 6.0902 LearningRate 0.0422 Epoch: 7 Global Step: 117010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:20,241-Speed 9293.94 samples/sec Loss 6.1293 LearningRate 0.0422 Epoch: 7 Global Step: 117020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:21,400-Speed 8837.87 samples/sec Loss 5.9853 LearningRate 0.0422 Epoch: 7 Global Step: 117030 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:22,519-Speed 9155.05 samples/sec Loss 5.9693 LearningRate 0.0422 Epoch: 7 Global Step: 117040 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:23,602-Speed 9461.01 samples/sec Loss 6.1138 LearningRate 0.0422 Epoch: 7 Global Step: 117050 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:24,709-Speed 9258.83 samples/sec Loss 6.0351 LearningRate 0.0422 Epoch: 7 Global Step: 117060 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:25,802-Speed 9371.96 samples/sec Loss 6.0651 LearningRate 0.0422 Epoch: 7 Global Step: 117070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:26,874-Speed 9554.07 samples/sec Loss 6.0207 LearningRate 0.0422 Epoch: 7 Global Step: 117080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:27,940-Speed 9613.85 samples/sec Loss 5.9316 LearningRate 0.0422 Epoch: 7 Global Step: 117090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:28,998-Speed 9693.05 samples/sec Loss 6.0520 LearningRate 0.0421 Epoch: 7 Global Step: 117100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:30,098-Speed 9308.57 samples/sec Loss 6.1167 LearningRate 0.0421 Epoch: 7 Global Step: 117110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:31,165-Speed 9606.55 samples/sec Loss 6.0994 LearningRate 0.0421 Epoch: 7 Global 
Step: 117120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:32,253-Speed 9417.99 samples/sec Loss 6.0190 LearningRate 0.0421 Epoch: 7 Global Step: 117130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:33,352-Speed 9319.39 samples/sec Loss 6.0062 LearningRate 0.0421 Epoch: 7 Global Step: 117140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:34,656-Speed 7858.92 samples/sec Loss 6.0158 LearningRate 0.0421 Epoch: 7 Global Step: 117150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:36,245-Speed 6448.11 samples/sec Loss 6.0296 LearningRate 0.0421 Epoch: 7 Global Step: 117160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:37,347-Speed 9291.36 samples/sec Loss 6.0479 LearningRate 0.0421 Epoch: 7 Global Step: 117170 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:38,459-Speed 9218.44 samples/sec Loss 6.0981 LearningRate 0.0421 Epoch: 7 Global Step: 117180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:39,720-Speed 8121.76 samples/sec Loss 6.0590 LearningRate 0.0421 Epoch: 7 Global Step: 117190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:41,002-Speed 7993.92 samples/sec Loss 6.1045 LearningRate 0.0421 Epoch: 7 Global Step: 117200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:42,068-Speed 9615.37 samples/sec Loss 6.0524 LearningRate 0.0421 Epoch: 7 Global Step: 117210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:43,398-Speed 7701.25 samples/sec Loss 5.9512 LearningRate 0.0421 Epoch: 7 Global Step: 117220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:44,512-Speed 9198.61 samples/sec Loss 6.0564 LearningRate 0.0421 Epoch: 7 Global Step: 117230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:45,631-Speed 9150.70 samples/sec Loss 6.1120 LearningRate 0.0421 Epoch: 7 Global Step: 117240 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 16:15:46,740-Speed 9245.43 samples/sec Loss 6.0704 LearningRate 0.0421 Epoch: 7 Global Step: 117250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:47,877-Speed 9007.06 samples/sec Loss 6.1339 LearningRate 0.0421 Epoch: 7 Global Step: 117260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:48,950-Speed 9553.94 samples/sec Loss 6.0538 LearningRate 0.0421 Epoch: 7 Global Step: 117270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:50,045-Speed 9359.31 samples/sec Loss 6.1674 LearningRate 0.0421 Epoch: 7 Global Step: 117280 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:51,127-Speed 9462.79 samples/sec Loss 6.0090 LearningRate 0.0421 Epoch: 7 Global Step: 117290 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:52,207-Speed 9485.04 samples/sec Loss 6.0198 LearningRate 0.0421 Epoch: 7 Global Step: 117300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:15:53,265-Speed 9695.11 samples/sec Loss 6.1574 LearningRate 0.0421 Epoch: 7 Global Step: 117310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:54,337-Speed 9554.22 samples/sec Loss 6.0634 LearningRate 0.0421 Epoch: 7 Global Step: 117320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:55,423-Speed 9436.96 samples/sec Loss 5.9678 LearningRate 0.0421 Epoch: 7 Global Step: 117330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:56,453-Speed 9948.47 samples/sec Loss 6.0843 LearningRate 0.0421 Epoch: 7 Global Step: 117340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:57,535-Speed 9464.51 samples/sec Loss 6.0835 LearningRate 0.0421 Epoch: 7 Global Step: 117350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:15:58,670-Speed 9032.59 samples/sec Loss 6.1083 LearningRate 0.0420 Epoch: 7 Global Step: 117360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
16:15:59,742-Speed 9553.88 samples/sec Loss 6.2062 LearningRate 0.0420 Epoch: 7 Global Step: 117370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:00,844-Speed 9296.74 samples/sec Loss 6.1722 LearningRate 0.0420 Epoch: 7 Global Step: 117380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:01,932-Speed 9416.52 samples/sec Loss 6.0758 LearningRate 0.0420 Epoch: 7 Global Step: 117390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:03,044-Speed 9209.52 samples/sec Loss 6.1955 LearningRate 0.0420 Epoch: 7 Global Step: 117400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:04,086-Speed 9835.50 samples/sec Loss 6.1082 LearningRate 0.0420 Epoch: 7 Global Step: 117410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:16:05,152-Speed 9622.02 samples/sec Loss 6.0335 LearningRate 0.0420 Epoch: 7 Global Step: 117420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:06,243-Speed 9388.92 samples/sec Loss 6.0622 LearningRate 0.0420 Epoch: 7 Global Step: 117430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:07,331-Speed 9425.07 samples/sec Loss 6.2117 LearningRate 0.0420 Epoch: 7 Global Step: 117440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:08,392-Speed 9655.90 samples/sec Loss 6.0471 LearningRate 0.0420 Epoch: 7 Global Step: 117450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:09,488-Speed 9349.52 samples/sec Loss 6.0847 LearningRate 0.0420 Epoch: 7 Global Step: 117460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:10,587-Speed 9319.73 samples/sec Loss 6.0726 LearningRate 0.0420 Epoch: 7 Global Step: 117470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:11,701-Speed 9196.43 samples/sec Loss 6.0899 LearningRate 0.0420 Epoch: 7 Global Step: 117480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:12,812-Speed 9226.10 samples/sec Loss 
6.0847 LearningRate 0.0420 Epoch: 7 Global Step: 117490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:13,972-Speed 8832.48 samples/sec Loss 6.1704 LearningRate 0.0420 Epoch: 7 Global Step: 117500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:15,002-Speed 9942.13 samples/sec Loss 6.2094 LearningRate 0.0420 Epoch: 7 Global Step: 117510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:16,053-Speed 9752.03 samples/sec Loss 6.1346 LearningRate 0.0420 Epoch: 7 Global Step: 117520 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:16:17,127-Speed 9540.14 samples/sec Loss 6.1357 LearningRate 0.0420 Epoch: 7 Global Step: 117530 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:16:18,206-Speed 9492.30 samples/sec Loss 6.1228 LearningRate 0.0420 Epoch: 7 Global Step: 117540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:19,284-Speed 9502.29 samples/sec Loss 6.1520 LearningRate 0.0420 Epoch: 7 Global Step: 117550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:20,385-Speed 9309.93 samples/sec Loss 6.1480 LearningRate 0.0420 Epoch: 7 Global Step: 117560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:21,453-Speed 9590.89 samples/sec Loss 6.0449 LearningRate 0.0420 Epoch: 7 Global Step: 117570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:22,566-Speed 9208.43 samples/sec Loss 6.0183 LearningRate 0.0420 Epoch: 7 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:23,729-Speed 8810.06 samples/sec Loss 6.2122 LearningRate 0.0420 Epoch: 7 Global Step: 117590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:24,795-Speed 9617.08 samples/sec Loss 6.2628 LearningRate 0.0420 Epoch: 7 Global Step: 117600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:25,916-Speed 9136.28 samples/sec Loss 6.0810 LearningRate 0.0419 Epoch: 7 Global Step: 
117610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:26,995-Speed 9496.48 samples/sec Loss 6.1210 LearningRate 0.0419 Epoch: 7 Global Step: 117620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:28,101-Speed 9269.08 samples/sec Loss 6.1782 LearningRate 0.0419 Epoch: 7 Global Step: 117630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:29,168-Speed 9597.25 samples/sec Loss 6.1806 LearningRate 0.0419 Epoch: 7 Global Step: 117640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:30,237-Speed 9585.48 samples/sec Loss 6.1702 LearningRate 0.0419 Epoch: 7 Global Step: 117650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:31,281-Speed 9810.08 samples/sec Loss 6.2138 LearningRate 0.0419 Epoch: 7 Global Step: 117660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:32,363-Speed 9469.47 samples/sec Loss 6.2389 LearningRate 0.0419 Epoch: 7 Global Step: 117670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:33,432-Speed 9591.57 samples/sec Loss 6.1978 LearningRate 0.0419 Epoch: 7 Global Step: 117680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:34,516-Speed 9451.30 samples/sec Loss 6.1223 LearningRate 0.0419 Epoch: 7 Global Step: 117690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:35,648-Speed 9051.49 samples/sec Loss 6.2410 LearningRate 0.0419 Epoch: 7 Global Step: 117700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:36,753-Speed 9272.69 samples/sec Loss 6.2309 LearningRate 0.0419 Epoch: 7 Global Step: 117710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:37,833-Speed 9487.27 samples/sec Loss 6.1285 LearningRate 0.0419 Epoch: 7 Global Step: 117720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:38,899-Speed 9607.03 samples/sec Loss 6.2645 LearningRate 0.0419 Epoch: 7 Global Step: 117730 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-04-11 16:16:39,975-Speed 9527.97 samples/sec Loss 6.1480 LearningRate 0.0419 Epoch: 7 Global Step: 117740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:41,072-Speed 9335.92 samples/sec Loss 6.2410 LearningRate 0.0419 Epoch: 7 Global Step: 117750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:42,151-Speed 9501.81 samples/sec Loss 6.2561 LearningRate 0.0419 Epoch: 7 Global Step: 117760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:43,290-Speed 8993.46 samples/sec Loss 6.0772 LearningRate 0.0419 Epoch: 7 Global Step: 117770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:44,345-Speed 9713.15 samples/sec Loss 6.1787 LearningRate 0.0419 Epoch: 7 Global Step: 117780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:45,460-Speed 9189.92 samples/sec Loss 6.0504 LearningRate 0.0419 Epoch: 7 Global Step: 117790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:46,552-Speed 9384.16 samples/sec Loss 6.2766 LearningRate 0.0419 Epoch: 7 Global Step: 117800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:47,684-Speed 9052.43 samples/sec Loss 6.1958 LearningRate 0.0419 Epoch: 7 Global Step: 117810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:48,756-Speed 9557.33 samples/sec Loss 6.2349 LearningRate 0.0419 Epoch: 7 Global Step: 117820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:16:49,835-Speed 9495.75 samples/sec Loss 6.2629 LearningRate 0.0419 Epoch: 7 Global Step: 117830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:50,891-Speed 9700.54 samples/sec Loss 6.3197 LearningRate 0.0419 Epoch: 7 Global Step: 117840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:51,967-Speed 9520.00 samples/sec Loss 6.2808 LearningRate 0.0419 Epoch: 7 Global Step: 117850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:53,051-Speed 
9450.42 samples/sec Loss 6.1966 LearningRate 0.0419 Epoch: 7 Global Step: 117860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:54,133-Speed 9478.07 samples/sec Loss 6.2641 LearningRate 0.0418 Epoch: 7 Global Step: 117870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:55,220-Speed 9422.11 samples/sec Loss 6.1671 LearningRate 0.0418 Epoch: 7 Global Step: 117880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:56,308-Speed 9420.30 samples/sec Loss 6.1902 LearningRate 0.0418 Epoch: 7 Global Step: 117890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:57,392-Speed 9447.43 samples/sec Loss 6.1230 LearningRate 0.0418 Epoch: 7 Global Step: 117900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:58,486-Speed 9371.21 samples/sec Loss 6.3059 LearningRate 0.0418 Epoch: 7 Global Step: 117910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:16:59,544-Speed 9675.96 samples/sec Loss 6.2125 LearningRate 0.0418 Epoch: 7 Global Step: 117920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:00,647-Speed 9290.19 samples/sec Loss 6.2824 LearningRate 0.0418 Epoch: 7 Global Step: 117930 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:17:01,709-Speed 9646.58 samples/sec Loss 6.3246 LearningRate 0.0418 Epoch: 7 Global Step: 117940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:02,808-Speed 9322.73 samples/sec Loss 6.2580 LearningRate 0.0418 Epoch: 7 Global Step: 117950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:03,927-Speed 9156.92 samples/sec Loss 6.1510 LearningRate 0.0418 Epoch: 7 Global Step: 117960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:05,010-Speed 9470.12 samples/sec Loss 6.1389 LearningRate 0.0418 Epoch: 7 Global Step: 117970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:06,105-Speed 9351.75 samples/sec Loss 6.1614 
LearningRate 0.0418 Epoch: 7 Global Step: 117980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:07,160-Speed 9711.79 samples/sec Loss 6.2688 LearningRate 0.0418 Epoch: 7 Global Step: 117990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:08,240-Speed 9492.75 samples/sec Loss 6.1193 LearningRate 0.0418 Epoch: 7 Global Step: 118000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:17:30,314-[lfw][118000]XNorm: 10.867755 Training: 2022-04-11 16:17:30,314-[lfw][118000]Accuracy-Flip: 0.99467+-0.00356 Training: 2022-04-11 16:17:30,314-[lfw][118000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:17:55,799-[cfp_fp][118000]XNorm: 9.272217 Training: 2022-04-11 16:17:55,800-[cfp_fp][118000]Accuracy-Flip: 0.95943+-0.01125 Training: 2022-04-11 16:17:55,800-[cfp_fp][118000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:18:17,819-[agedb_30][118000]XNorm: 10.515524 Training: 2022-04-11 16:18:17,819-[agedb_30][118000]Accuracy-Flip: 0.96467+-0.00945 Training: 2022-04-11 16:18:17,820-[agedb_30][118000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:18:18,920-Speed 144.88 samples/sec Loss 6.1758 LearningRate 0.0418 Epoch: 7 Global Step: 118010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:20,018-Speed 9332.65 samples/sec Loss 6.2896 LearningRate 0.0418 Epoch: 7 Global Step: 118020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:21,128-Speed 9233.46 samples/sec Loss 6.2217 LearningRate 0.0418 Epoch: 7 Global Step: 118030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:22,241-Speed 9201.50 samples/sec Loss 6.1584 LearningRate 0.0418 Epoch: 7 Global Step: 118040 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:18:23,359-Speed 9169.36 samples/sec Loss 6.2597 LearningRate 0.0418 Epoch: 7 Global Step: 118050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:24,441-Speed 9472.26 samples/sec Loss 6.2237 LearningRate 0.0418 
Epoch: 7 Global Step: 118060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:25,546-Speed 9270.18 samples/sec Loss 6.2117 LearningRate 0.0418 Epoch: 7 Global Step: 118070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:26,663-Speed 9178.44 samples/sec Loss 6.3328 LearningRate 0.0418 Epoch: 7 Global Step: 118080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:27,726-Speed 9635.85 samples/sec Loss 6.2365 LearningRate 0.0418 Epoch: 7 Global Step: 118090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:28,814-Speed 9415.95 samples/sec Loss 6.2805 LearningRate 0.0418 Epoch: 7 Global Step: 118100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:29,916-Speed 9292.41 samples/sec Loss 6.1732 LearningRate 0.0418 Epoch: 7 Global Step: 118110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:30,998-Speed 9473.80 samples/sec Loss 6.1801 LearningRate 0.0418 Epoch: 7 Global Step: 118120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:32,080-Speed 9466.97 samples/sec Loss 6.3021 LearningRate 0.0417 Epoch: 7 Global Step: 118130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:33,167-Speed 9426.84 samples/sec Loss 6.1693 LearningRate 0.0417 Epoch: 7 Global Step: 118140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:34,234-Speed 9604.47 samples/sec Loss 6.3366 LearningRate 0.0417 Epoch: 7 Global Step: 118150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:35,337-Speed 9282.87 samples/sec Loss 6.1924 LearningRate 0.0417 Epoch: 7 Global Step: 118160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:36,433-Speed 9354.46 samples/sec Loss 6.1564 LearningRate 0.0417 Epoch: 7 Global Step: 118170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:37,539-Speed 9260.54 samples/sec Loss 6.3019 LearningRate 0.0417 Epoch: 7 Global Step: 118180 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:38,675-Speed 9020.46 samples/sec Loss 6.2072 LearningRate 0.0417 Epoch: 7 Global Step: 118190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:39,770-Speed 9361.41 samples/sec Loss 6.2916 LearningRate 0.0417 Epoch: 7 Global Step: 118200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:40,826-Speed 9698.79 samples/sec Loss 6.2068 LearningRate 0.0417 Epoch: 7 Global Step: 118210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:41,920-Speed 9362.82 samples/sec Loss 6.2365 LearningRate 0.0417 Epoch: 7 Global Step: 118220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:43,024-Speed 9285.64 samples/sec Loss 6.2982 LearningRate 0.0417 Epoch: 7 Global Step: 118230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:44,105-Speed 9479.33 samples/sec Loss 6.2435 LearningRate 0.0417 Epoch: 7 Global Step: 118240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:45,141-Speed 9881.60 samples/sec Loss 6.2418 LearningRate 0.0417 Epoch: 7 Global Step: 118250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:46,194-Speed 9728.90 samples/sec Loss 6.2794 LearningRate 0.0417 Epoch: 7 Global Step: 118260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:47,271-Speed 9518.08 samples/sec Loss 6.1991 LearningRate 0.0417 Epoch: 7 Global Step: 118270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:48,370-Speed 9318.87 samples/sec Loss 6.2509 LearningRate 0.0417 Epoch: 7 Global Step: 118280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:49,454-Speed 9456.41 samples/sec Loss 6.2790 LearningRate 0.0417 Epoch: 7 Global Step: 118290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:50,600-Speed 8943.48 samples/sec Loss 6.2595 LearningRate 0.0417 Epoch: 7 Global Step: 118300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 16:18:51,662-Speed 9644.46 samples/sec Loss 6.3019 LearningRate 0.0417 Epoch: 7 Global Step: 118310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:52,756-Speed 9374.01 samples/sec Loss 6.3041 LearningRate 0.0417 Epoch: 7 Global Step: 118320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:53,835-Speed 9493.03 samples/sec Loss 6.2803 LearningRate 0.0417 Epoch: 7 Global Step: 118330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:54,893-Speed 9684.23 samples/sec Loss 6.3169 LearningRate 0.0417 Epoch: 7 Global Step: 118340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:55,952-Speed 9675.01 samples/sec Loss 6.3493 LearningRate 0.0417 Epoch: 7 Global Step: 118350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:18:57,056-Speed 9279.63 samples/sec Loss 6.3046 LearningRate 0.0417 Epoch: 7 Global Step: 118360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:58,136-Speed 9481.09 samples/sec Loss 6.2710 LearningRate 0.0417 Epoch: 7 Global Step: 118370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:18:59,213-Speed 9523.88 samples/sec Loss 6.3336 LearningRate 0.0417 Epoch: 7 Global Step: 118380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:00,298-Speed 9439.97 samples/sec Loss 6.1978 LearningRate 0.0416 Epoch: 7 Global Step: 118390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:01,363-Speed 9617.59 samples/sec Loss 6.2322 LearningRate 0.0416 Epoch: 7 Global Step: 118400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:02,458-Speed 9358.28 samples/sec Loss 6.2345 LearningRate 0.0416 Epoch: 7 Global Step: 118410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:03,540-Speed 9471.97 samples/sec Loss 6.1961 LearningRate 0.0416 Epoch: 7 Global Step: 118420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:04,630-Speed 9399.47 samples/sec 
Loss 6.2567 LearningRate 0.0416 Epoch: 7 Global Step: 118430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:05,701-Speed 9570.83 samples/sec Loss 6.3305 LearningRate 0.0416 Epoch: 7 Global Step: 118440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:06,782-Speed 9477.27 samples/sec Loss 6.2247 LearningRate 0.0416 Epoch: 7 Global Step: 118450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:07,849-Speed 9602.58 samples/sec Loss 6.3128 LearningRate 0.0416 Epoch: 7 Global Step: 118460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:08,948-Speed 9318.87 samples/sec Loss 6.3514 LearningRate 0.0416 Epoch: 7 Global Step: 118470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:10,020-Speed 9554.18 samples/sec Loss 6.2831 LearningRate 0.0416 Epoch: 7 Global Step: 118480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:11,117-Speed 9346.07 samples/sec Loss 6.2953 LearningRate 0.0416 Epoch: 7 Global Step: 118490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:12,172-Speed 9715.35 samples/sec Loss 6.1637 LearningRate 0.0416 Epoch: 7 Global Step: 118500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:13,265-Speed 9373.66 samples/sec Loss 6.3437 LearningRate 0.0416 Epoch: 7 Global Step: 118510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:14,356-Speed 9390.06 samples/sec Loss 6.2903 LearningRate 0.0416 Epoch: 7 Global Step: 118520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:15,408-Speed 9732.96 samples/sec Loss 6.4232 LearningRate 0.0416 Epoch: 7 Global Step: 118530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:16,517-Speed 9243.63 samples/sec Loss 6.1923 LearningRate 0.0416 Epoch: 7 Global Step: 118540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:17,625-Speed 9248.69 samples/sec Loss 6.2657 LearningRate 0.0416 Epoch: 7 Global 
Step: 118550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:18,706-Speed 9482.62 samples/sec Loss 6.2999 LearningRate 0.0416 Epoch: 7 Global Step: 118560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:19,808-Speed 9298.91 samples/sec Loss 6.3689 LearningRate 0.0416 Epoch: 7 Global Step: 118570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:20,916-Speed 9241.34 samples/sec Loss 6.3536 LearningRate 0.0416 Epoch: 7 Global Step: 118580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:22,007-Speed 9390.00 samples/sec Loss 6.3394 LearningRate 0.0416 Epoch: 7 Global Step: 118590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:23,125-Speed 9173.02 samples/sec Loss 6.2553 LearningRate 0.0416 Epoch: 7 Global Step: 118600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:24,226-Speed 9300.01 samples/sec Loss 6.3796 LearningRate 0.0416 Epoch: 7 Global Step: 118610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:25,534-Speed 7834.47 samples/sec Loss 6.2800 LearningRate 0.0416 Epoch: 7 Global Step: 118620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:26,608-Speed 9538.79 samples/sec Loss 6.3566 LearningRate 0.0416 Epoch: 7 Global Step: 118630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:27,735-Speed 9092.42 samples/sec Loss 6.2891 LearningRate 0.0416 Epoch: 7 Global Step: 118640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:28,809-Speed 9535.03 samples/sec Loss 6.1808 LearningRate 0.0415 Epoch: 7 Global Step: 118650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:29,915-Speed 9270.57 samples/sec Loss 6.3717 LearningRate 0.0415 Epoch: 7 Global Step: 118660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:31,007-Speed 9385.80 samples/sec Loss 6.3180 LearningRate 0.0415 Epoch: 7 Global Step: 118670 Fp16 Grad Scale: 131072 Required: 9 
hours Training: 2022-04-11 16:19:32,106-Speed 9322.96 samples/sec Loss 6.2653 LearningRate 0.0415 Epoch: 7 Global Step: 118680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:33,271-Speed 8788.78 samples/sec Loss 6.1264 LearningRate 0.0415 Epoch: 7 Global Step: 118690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:34,351-Speed 9485.34 samples/sec Loss 6.2678 LearningRate 0.0415 Epoch: 7 Global Step: 118700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:35,406-Speed 9713.14 samples/sec Loss 6.2450 LearningRate 0.0415 Epoch: 7 Global Step: 118710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:36,469-Speed 9642.89 samples/sec Loss 6.3308 LearningRate 0.0415 Epoch: 7 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:37,561-Speed 9383.08 samples/sec Loss 6.3239 LearningRate 0.0415 Epoch: 7 Global Step: 118730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:38,650-Speed 9411.99 samples/sec Loss 6.2832 LearningRate 0.0415 Epoch: 7 Global Step: 118740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:39,730-Speed 9480.54 samples/sec Loss 6.2924 LearningRate 0.0415 Epoch: 7 Global Step: 118750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:40,809-Speed 9493.38 samples/sec Loss 6.3266 LearningRate 0.0415 Epoch: 7 Global Step: 118760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:41,873-Speed 9639.56 samples/sec Loss 6.3209 LearningRate 0.0415 Epoch: 7 Global Step: 118770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:43,002-Speed 9071.86 samples/sec Loss 6.2522 LearningRate 0.0415 Epoch: 7 Global Step: 118780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:44,133-Speed 9063.03 samples/sec Loss 6.2767 LearningRate 0.0415 Epoch: 7 Global Step: 118790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:45,180-Speed 9776.65 
samples/sec Loss 6.3638 LearningRate 0.0415 Epoch: 7 Global Step: 118800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:46,230-Speed 9764.75 samples/sec Loss 6.3532 LearningRate 0.0415 Epoch: 7 Global Step: 118810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:47,332-Speed 9294.61 samples/sec Loss 6.3443 LearningRate 0.0415 Epoch: 7 Global Step: 118820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:48,439-Speed 9253.53 samples/sec Loss 6.3058 LearningRate 0.0415 Epoch: 7 Global Step: 118830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:49,476-Speed 9878.53 samples/sec Loss 6.3100 LearningRate 0.0415 Epoch: 7 Global Step: 118840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:50,538-Speed 9651.91 samples/sec Loss 6.2967 LearningRate 0.0415 Epoch: 7 Global Step: 118850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:51,643-Speed 9269.14 samples/sec Loss 6.3695 LearningRate 0.0415 Epoch: 7 Global Step: 118860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:52,680-Speed 9889.19 samples/sec Loss 6.4235 LearningRate 0.0415 Epoch: 7 Global Step: 118870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:53,780-Speed 9307.02 samples/sec Loss 6.3275 LearningRate 0.0415 Epoch: 7 Global Step: 118880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:19:54,866-Speed 9441.09 samples/sec Loss 6.4756 LearningRate 0.0415 Epoch: 7 Global Step: 118890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:55,978-Speed 9211.68 samples/sec Loss 6.3391 LearningRate 0.0415 Epoch: 7 Global Step: 118900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:57,101-Speed 9125.57 samples/sec Loss 6.3334 LearningRate 0.0414 Epoch: 7 Global Step: 118910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:58,196-Speed 9357.36 samples/sec Loss 6.2926 LearningRate 0.0414 
Epoch: 7 Global Step: 118920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:19:59,269-Speed 9552.54 samples/sec Loss 6.3426 LearningRate 0.0414 Epoch: 7 Global Step: 118930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:00,327-Speed 9684.62 samples/sec Loss 6.3141 LearningRate 0.0414 Epoch: 7 Global Step: 118940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:01,403-Speed 9521.39 samples/sec Loss 6.3503 LearningRate 0.0414 Epoch: 7 Global Step: 118950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:02,505-Speed 9298.11 samples/sec Loss 6.2646 LearningRate 0.0414 Epoch: 7 Global Step: 118960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:03,666-Speed 8825.39 samples/sec Loss 6.4473 LearningRate 0.0414 Epoch: 7 Global Step: 118970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:04,756-Speed 9399.69 samples/sec Loss 6.3260 LearningRate 0.0414 Epoch: 7 Global Step: 118980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:05,801-Speed 9809.37 samples/sec Loss 6.4195 LearningRate 0.0414 Epoch: 7 Global Step: 118990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:06,852-Speed 9746.17 samples/sec Loss 6.3685 LearningRate 0.0414 Epoch: 7 Global Step: 119000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:07,910-Speed 9681.33 samples/sec Loss 6.4051 LearningRate 0.0414 Epoch: 7 Global Step: 119010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:08,990-Speed 9483.07 samples/sec Loss 6.3078 LearningRate 0.0414 Epoch: 7 Global Step: 119020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:10,081-Speed 9398.88 samples/sec Loss 6.3842 LearningRate 0.0414 Epoch: 7 Global Step: 119030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:11,181-Speed 9310.32 samples/sec Loss 6.2783 LearningRate 0.0414 Epoch: 7 Global Step: 119040 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 16:20:12,252-Speed 9571.90 samples/sec Loss 6.2897 LearningRate 0.0414 Epoch: 7 Global Step: 119050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:13,342-Speed 9394.10 samples/sec Loss 6.3117 LearningRate 0.0414 Epoch: 7 Global Step: 119060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:14,452-Speed 9236.53 samples/sec Loss 6.4500 LearningRate 0.0414 Epoch: 7 Global Step: 119070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:15,565-Speed 9204.81 samples/sec Loss 6.3612 LearningRate 0.0414 Epoch: 7 Global Step: 119080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:16,655-Speed 9399.38 samples/sec Loss 6.4287 LearningRate 0.0414 Epoch: 7 Global Step: 119090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:17,719-Speed 9635.69 samples/sec Loss 6.3519 LearningRate 0.0414 Epoch: 7 Global Step: 119100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:18,784-Speed 9620.47 samples/sec Loss 6.3469 LearningRate 0.0414 Epoch: 7 Global Step: 119110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:19,851-Speed 9600.94 samples/sec Loss 6.3530 LearningRate 0.0414 Epoch: 7 Global Step: 119120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:20,939-Speed 9414.47 samples/sec Loss 6.4710 LearningRate 0.0414 Epoch: 7 Global Step: 119130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:21,994-Speed 9714.01 samples/sec Loss 6.3286 LearningRate 0.0414 Epoch: 7 Global Step: 119140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:23,061-Speed 9605.63 samples/sec Loss 6.2033 LearningRate 0.0414 Epoch: 7 Global Step: 119150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:24,139-Speed 9498.38 samples/sec Loss 6.3312 LearningRate 0.0414 Epoch: 7 Global Step: 119160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
16:20:25,224-Speed 9447.08 samples/sec Loss 6.4449 LearningRate 0.0413 Epoch: 7 Global Step: 119170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:26,314-Speed 9400.27 samples/sec Loss 6.3910 LearningRate 0.0413 Epoch: 7 Global Step: 119180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:27,408-Speed 9365.57 samples/sec Loss 6.3421 LearningRate 0.0413 Epoch: 7 Global Step: 119190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:28,492-Speed 9454.58 samples/sec Loss 6.4174 LearningRate 0.0413 Epoch: 7 Global Step: 119200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:29,585-Speed 9377.64 samples/sec Loss 6.3898 LearningRate 0.0413 Epoch: 7 Global Step: 119210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:30,681-Speed 9343.60 samples/sec Loss 6.3982 LearningRate 0.0413 Epoch: 7 Global Step: 119220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:31,816-Speed 9025.93 samples/sec Loss 6.3995 LearningRate 0.0413 Epoch: 7 Global Step: 119230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:32,882-Speed 9616.03 samples/sec Loss 6.3355 LearningRate 0.0413 Epoch: 7 Global Step: 119240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:34,021-Speed 8994.61 samples/sec Loss 6.4833 LearningRate 0.0413 Epoch: 7 Global Step: 119250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:35,100-Speed 9494.95 samples/sec Loss 6.4593 LearningRate 0.0413 Epoch: 7 Global Step: 119260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:36,181-Speed 9476.05 samples/sec Loss 6.4218 LearningRate 0.0413 Epoch: 7 Global Step: 119270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:37,263-Speed 9467.36 samples/sec Loss 6.3669 LearningRate 0.0413 Epoch: 7 Global Step: 119280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:38,384-Speed 9140.38 samples/sec Loss 
6.3699 LearningRate 0.0413 Epoch: 7 Global Step: 119290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:39,462-Speed 9503.18 samples/sec Loss 6.4273 LearningRate 0.0413 Epoch: 7 Global Step: 119300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:40,543-Speed 9484.58 samples/sec Loss 6.3982 LearningRate 0.0413 Epoch: 7 Global Step: 119310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:41,619-Speed 9524.47 samples/sec Loss 6.3496 LearningRate 0.0413 Epoch: 7 Global Step: 119320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:42,730-Speed 9228.15 samples/sec Loss 6.3914 LearningRate 0.0413 Epoch: 7 Global Step: 119330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:43,881-Speed 8901.80 samples/sec Loss 6.3524 LearningRate 0.0413 Epoch: 7 Global Step: 119340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:44,962-Speed 9478.66 samples/sec Loss 6.4801 LearningRate 0.0413 Epoch: 7 Global Step: 119350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:46,059-Speed 9345.34 samples/sec Loss 6.4342 LearningRate 0.0413 Epoch: 7 Global Step: 119360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:47,154-Speed 9351.61 samples/sec Loss 6.3955 LearningRate 0.0413 Epoch: 7 Global Step: 119370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:48,225-Speed 9569.56 samples/sec Loss 6.3281 LearningRate 0.0413 Epoch: 7 Global Step: 119380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:49,294-Speed 9583.10 samples/sec Loss 6.4164 LearningRate 0.0413 Epoch: 7 Global Step: 119390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:50,413-Speed 9154.56 samples/sec Loss 6.3501 LearningRate 0.0413 Epoch: 7 Global Step: 119400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:51,467-Speed 9720.25 samples/sec Loss 6.3960 LearningRate 0.0413 Epoch: 7 Global Step: 
119410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:52,491-Speed 10012.86 samples/sec Loss 6.4565 LearningRate 0.0413 Epoch: 7 Global Step: 119420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:53,570-Speed 9495.43 samples/sec Loss 6.3911 LearningRate 0.0412 Epoch: 7 Global Step: 119430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:54,628-Speed 9681.48 samples/sec Loss 6.4006 LearningRate 0.0412 Epoch: 7 Global Step: 119440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:55,688-Speed 9669.80 samples/sec Loss 6.4736 LearningRate 0.0412 Epoch: 7 Global Step: 119450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:56,796-Speed 9246.41 samples/sec Loss 6.4231 LearningRate 0.0412 Epoch: 7 Global Step: 119460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:20:57,922-Speed 9097.62 samples/sec Loss 6.4709 LearningRate 0.0412 Epoch: 7 Global Step: 119470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:20:58,997-Speed 9539.87 samples/sec Loss 6.2846 LearningRate 0.0412 Epoch: 7 Global Step: 119480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:00,057-Speed 9657.69 samples/sec Loss 6.4137 LearningRate 0.0412 Epoch: 7 Global Step: 119490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:01,162-Speed 9277.25 samples/sec Loss 6.3384 LearningRate 0.0412 Epoch: 7 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:02,254-Speed 9381.67 samples/sec Loss 6.4558 LearningRate 0.0412 Epoch: 7 Global Step: 119510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:03,358-Speed 9285.83 samples/sec Loss 6.5145 LearningRate 0.0412 Epoch: 7 Global Step: 119520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:04,444-Speed 9435.46 samples/sec Loss 6.3780 LearningRate 0.0412 Epoch: 7 Global Step: 119530 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-04-11 16:21:05,524-Speed 9482.36 samples/sec Loss 6.4499 LearningRate 0.0412 Epoch: 7 Global Step: 119540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:06,584-Speed 9668.77 samples/sec Loss 6.3871 LearningRate 0.0412 Epoch: 7 Global Step: 119550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:07,710-Speed 9097.60 samples/sec Loss 6.4062 LearningRate 0.0412 Epoch: 7 Global Step: 119560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:08,799-Speed 9412.21 samples/sec Loss 6.4444 LearningRate 0.0412 Epoch: 7 Global Step: 119570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:09,881-Speed 9468.23 samples/sec Loss 6.3494 LearningRate 0.0412 Epoch: 7 Global Step: 119580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:10,960-Speed 9494.37 samples/sec Loss 6.3295 LearningRate 0.0412 Epoch: 7 Global Step: 119590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:12,084-Speed 9114.79 samples/sec Loss 6.3648 LearningRate 0.0412 Epoch: 7 Global Step: 119600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:13,158-Speed 9541.74 samples/sec Loss 6.3352 LearningRate 0.0412 Epoch: 7 Global Step: 119610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:14,223-Speed 9625.88 samples/sec Loss 6.3811 LearningRate 0.0412 Epoch: 7 Global Step: 119620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:15,266-Speed 9816.30 samples/sec Loss 6.3414 LearningRate 0.0412 Epoch: 7 Global Step: 119630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:16,379-Speed 9207.32 samples/sec Loss 6.4653 LearningRate 0.0412 Epoch: 7 Global Step: 119640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:17,488-Speed 9241.90 samples/sec Loss 6.4089 LearningRate 0.0412 Epoch: 7 Global Step: 119650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:18,558-Speed 9575.56 
samples/sec Loss 6.4292 LearningRate 0.0412 Epoch: 7 Global Step: 119660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:19,687-Speed 9070.97 samples/sec Loss 6.4609 LearningRate 0.0412 Epoch: 7 Global Step: 119670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:20,762-Speed 9533.91 samples/sec Loss 6.4745 LearningRate 0.0412 Epoch: 7 Global Step: 119680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:21,821-Speed 9679.99 samples/sec Loss 6.4397 LearningRate 0.0411 Epoch: 7 Global Step: 119690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:22,919-Speed 9332.73 samples/sec Loss 6.4778 LearningRate 0.0411 Epoch: 7 Global Step: 119700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:23,954-Speed 9901.63 samples/sec Loss 6.4313 LearningRate 0.0411 Epoch: 7 Global Step: 119710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:25,059-Speed 9270.57 samples/sec Loss 6.3751 LearningRate 0.0411 Epoch: 7 Global Step: 119720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:26,138-Speed 9492.93 samples/sec Loss 6.3569 LearningRate 0.0411 Epoch: 7 Global Step: 119730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:27,251-Speed 9206.35 samples/sec Loss 6.3865 LearningRate 0.0411 Epoch: 7 Global Step: 119740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:28,337-Speed 9430.43 samples/sec Loss 6.3472 LearningRate 0.0411 Epoch: 7 Global Step: 119750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:29,422-Speed 9447.94 samples/sec Loss 6.4496 LearningRate 0.0411 Epoch: 7 Global Step: 119760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:30,511-Speed 9412.42 samples/sec Loss 6.4348 LearningRate 0.0411 Epoch: 7 Global Step: 119770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:31,618-Speed 9252.09 samples/sec Loss 6.4456 LearningRate 0.0411 
Epoch: 7 Global Step: 119780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:32,664-Speed 9800.75 samples/sec Loss 6.3677 LearningRate 0.0411 Epoch: 7 Global Step: 119790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:33,737-Speed 9546.98 samples/sec Loss 6.4241 LearningRate 0.0411 Epoch: 7 Global Step: 119800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:34,799-Speed 9645.74 samples/sec Loss 6.4700 LearningRate 0.0411 Epoch: 7 Global Step: 119810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:35,880-Speed 9473.99 samples/sec Loss 6.4245 LearningRate 0.0411 Epoch: 7 Global Step: 119820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:37,014-Speed 9038.33 samples/sec Loss 6.5128 LearningRate 0.0411 Epoch: 7 Global Step: 119830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:38,134-Speed 9144.45 samples/sec Loss 6.4824 LearningRate 0.0411 Epoch: 7 Global Step: 119840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:39,217-Speed 9461.46 samples/sec Loss 6.3298 LearningRate 0.0411 Epoch: 7 Global Step: 119850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:40,283-Speed 9615.99 samples/sec Loss 6.3507 LearningRate 0.0411 Epoch: 7 Global Step: 119860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:41,358-Speed 9529.83 samples/sec Loss 6.4710 LearningRate 0.0411 Epoch: 7 Global Step: 119870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:42,437-Speed 9504.19 samples/sec Loss 6.5262 LearningRate 0.0411 Epoch: 7 Global Step: 119880 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:21:43,514-Speed 9513.02 samples/sec Loss 6.3972 LearningRate 0.0411 Epoch: 7 Global Step: 119890 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:21:44,559-Speed 9803.30 samples/sec Loss 6.4485 LearningRate 0.0411 Epoch: 7 Global Step: 119900 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:45,653-Speed 9361.30 samples/sec Loss 6.3015 LearningRate 0.0411 Epoch: 7 Global Step: 119910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:46,744-Speed 9389.30 samples/sec Loss 6.4031 LearningRate 0.0411 Epoch: 7 Global Step: 119920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:47,840-Speed 9349.17 samples/sec Loss 6.4960 LearningRate 0.0411 Epoch: 7 Global Step: 119930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:48,929-Speed 9414.17 samples/sec Loss 6.4151 LearningRate 0.0411 Epoch: 7 Global Step: 119940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:50,016-Speed 9426.64 samples/sec Loss 6.5397 LearningRate 0.0410 Epoch: 7 Global Step: 119950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:51,071-Speed 9710.80 samples/sec Loss 6.4463 LearningRate 0.0410 Epoch: 7 Global Step: 119960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:52,169-Speed 9329.30 samples/sec Loss 6.3540 LearningRate 0.0410 Epoch: 7 Global Step: 119970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:53,298-Speed 9081.23 samples/sec Loss 6.4611 LearningRate 0.0410 Epoch: 7 Global Step: 119980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:21:54,387-Speed 9402.06 samples/sec Loss 6.4183 LearningRate 0.0410 Epoch: 7 Global Step: 119990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:21:55,454-Speed 9601.28 samples/sec Loss 6.4138 LearningRate 0.0410 Epoch: 7 Global Step: 120000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:22:17,577-[lfw][120000]XNorm: 11.040144 Training: 2022-04-11 16:22:17,578-[lfw][120000]Accuracy-Flip: 0.99650+-0.00293 Training: 2022-04-11 16:22:17,578-[lfw][120000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:22:43,050-[cfp_fp][120000]XNorm: 9.430222 Training: 2022-04-11 
16:22:43,051-[cfp_fp][120000]Accuracy-Flip: 0.95329+-0.01314 Training: 2022-04-11 16:22:43,051-[cfp_fp][120000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:23:04,878-[agedb_30][120000]XNorm: 10.759171 Training: 2022-04-11 16:23:04,879-[agedb_30][120000]Accuracy-Flip: 0.96283+-0.01038 Training: 2022-04-11 16:23:04,880-[agedb_30][120000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:23:05,985-Speed 145.19 samples/sec Loss 6.5329 LearningRate 0.0410 Epoch: 7 Global Step: 120010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:07,072-Speed 9418.65 samples/sec Loss 6.4126 LearningRate 0.0410 Epoch: 7 Global Step: 120020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:08,122-Speed 9763.41 samples/sec Loss 6.4684 LearningRate 0.0410 Epoch: 7 Global Step: 120030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:09,182-Speed 9666.10 samples/sec Loss 6.3350 LearningRate 0.0410 Epoch: 7 Global Step: 120040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:10,279-Speed 9342.69 samples/sec Loss 6.4944 LearningRate 0.0410 Epoch: 7 Global Step: 120050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:11,370-Speed 9382.55 samples/sec Loss 6.4769 LearningRate 0.0410 Epoch: 7 Global Step: 120060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:12,452-Speed 9468.68 samples/sec Loss 6.4146 LearningRate 0.0410 Epoch: 7 Global Step: 120070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:13,600-Speed 8929.55 samples/sec Loss 6.2905 LearningRate 0.0410 Epoch: 7 Global Step: 120080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:14,670-Speed 9573.63 samples/sec Loss 6.4457 LearningRate 0.0410 Epoch: 7 Global Step: 120090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:15,726-Speed 9698.98 samples/sec Loss 6.3863 LearningRate 0.0410 Epoch: 7 Global Step: 120100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 16:23:16,776-Speed 9763.97 samples/sec Loss 6.3918 LearningRate 0.0410 Epoch: 7 Global Step: 120110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:17,884-Speed 9250.39 samples/sec Loss 6.4814 LearningRate 0.0410 Epoch: 7 Global Step: 120120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:18,976-Speed 9378.76 samples/sec Loss 6.4358 LearningRate 0.0410 Epoch: 7 Global Step: 120130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:20,060-Speed 9455.07 samples/sec Loss 6.4599 LearningRate 0.0410 Epoch: 7 Global Step: 120140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:21,113-Speed 9726.74 samples/sec Loss 6.4492 LearningRate 0.0410 Epoch: 7 Global Step: 120150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:22,188-Speed 9531.14 samples/sec Loss 6.4460 LearningRate 0.0410 Epoch: 7 Global Step: 120160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:23,307-Speed 9163.09 samples/sec Loss 6.4412 LearningRate 0.0410 Epoch: 7 Global Step: 120170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:24,403-Speed 9350.99 samples/sec Loss 6.4193 LearningRate 0.0410 Epoch: 7 Global Step: 120180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:25,469-Speed 9616.47 samples/sec Loss 6.3498 LearningRate 0.0410 Epoch: 7 Global Step: 120190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:26,557-Speed 9417.29 samples/sec Loss 6.4644 LearningRate 0.0410 Epoch: 7 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:27,642-Speed 9435.69 samples/sec Loss 6.3959 LearningRate 0.0409 Epoch: 7 Global Step: 120210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:28,742-Speed 9320.27 samples/sec Loss 6.4833 LearningRate 0.0409 Epoch: 7 Global Step: 120220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:29,796-Speed 9719.67 
samples/sec Loss 6.3904 LearningRate 0.0409 Epoch: 7 Global Step: 120230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:30,847-Speed 9746.09 samples/sec Loss 6.4881 LearningRate 0.0409 Epoch: 7 Global Step: 120240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:31,935-Speed 9413.65 samples/sec Loss 6.4551 LearningRate 0.0409 Epoch: 7 Global Step: 120250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:33,034-Speed 9323.83 samples/sec Loss 6.3560 LearningRate 0.0409 Epoch: 7 Global Step: 120260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:34,110-Speed 9520.82 samples/sec Loss 6.3806 LearningRate 0.0409 Epoch: 7 Global Step: 120270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:35,189-Speed 9494.52 samples/sec Loss 6.4757 LearningRate 0.0409 Epoch: 7 Global Step: 120280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:36,292-Speed 9293.16 samples/sec Loss 6.3807 LearningRate 0.0409 Epoch: 7 Global Step: 120290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:37,358-Speed 9609.30 samples/sec Loss 6.3786 LearningRate 0.0409 Epoch: 7 Global Step: 120300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:38,456-Speed 9336.89 samples/sec Loss 6.3815 LearningRate 0.0409 Epoch: 7 Global Step: 120310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:39,538-Speed 9463.41 samples/sec Loss 6.4980 LearningRate 0.0409 Epoch: 7 Global Step: 120320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:40,644-Speed 9266.72 samples/sec Loss 6.4125 LearningRate 0.0409 Epoch: 7 Global Step: 120330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:41,742-Speed 9331.69 samples/sec Loss 6.5060 LearningRate 0.0409 Epoch: 7 Global Step: 120340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:42,835-Speed 9380.13 samples/sec Loss 6.5995 LearningRate 0.0409 
Epoch: 7 Global Step: 120350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:43,941-Speed 9265.20 samples/sec Loss 6.6226 LearningRate 0.0409 Epoch: 7 Global Step: 120360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:44,973-Speed 9927.13 samples/sec Loss 6.4240 LearningRate 0.0409 Epoch: 7 Global Step: 120370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:46,028-Speed 9710.80 samples/sec Loss 6.5369 LearningRate 0.0409 Epoch: 7 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:47,090-Speed 9649.76 samples/sec Loss 6.5783 LearningRate 0.0409 Epoch: 7 Global Step: 120390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:48,216-Speed 9099.42 samples/sec Loss 6.4293 LearningRate 0.0409 Epoch: 7 Global Step: 120400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:49,281-Speed 9619.62 samples/sec Loss 6.4631 LearningRate 0.0409 Epoch: 7 Global Step: 120410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:50,376-Speed 9358.45 samples/sec Loss 6.5838 LearningRate 0.0409 Epoch: 7 Global Step: 120420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:51,438-Speed 9648.91 samples/sec Loss 6.3614 LearningRate 0.0409 Epoch: 7 Global Step: 120430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:52,510-Speed 9558.99 samples/sec Loss 6.5414 LearningRate 0.0409 Epoch: 7 Global Step: 120440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:53,596-Speed 9427.51 samples/sec Loss 6.4709 LearningRate 0.0409 Epoch: 7 Global Step: 120450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:54,651-Speed 9725.85 samples/sec Loss 6.4357 LearningRate 0.0409 Epoch: 7 Global Step: 120460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:23:55,729-Speed 9503.99 samples/sec Loss 6.5114 LearningRate 0.0408 Epoch: 7 Global Step: 120470 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 16:23:56,854-Speed 9111.36 samples/sec Loss 6.4435 LearningRate 0.0408 Epoch: 7 Global Step: 120480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:57,984-Speed 9062.32 samples/sec Loss 6.4887 LearningRate 0.0408 Epoch: 7 Global Step: 120490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:23:59,040-Speed 9704.29 samples/sec Loss 6.5130 LearningRate 0.0408 Epoch: 7 Global Step: 120500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:00,126-Speed 9435.46 samples/sec Loss 6.3846 LearningRate 0.0408 Epoch: 7 Global Step: 120510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:01,236-Speed 9237.64 samples/sec Loss 6.3904 LearningRate 0.0408 Epoch: 7 Global Step: 120520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:02,311-Speed 9527.26 samples/sec Loss 6.3811 LearningRate 0.0408 Epoch: 7 Global Step: 120530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:03,407-Speed 9349.70 samples/sec Loss 6.5282 LearningRate 0.0408 Epoch: 7 Global Step: 120540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:04,493-Speed 9431.41 samples/sec Loss 6.5416 LearningRate 0.0408 Epoch: 7 Global Step: 120550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:05,579-Speed 9437.82 samples/sec Loss 6.5517 LearningRate 0.0408 Epoch: 7 Global Step: 120560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:06,644-Speed 9621.98 samples/sec Loss 6.5429 LearningRate 0.0408 Epoch: 7 Global Step: 120570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:07,706-Speed 9651.80 samples/sec Loss 6.4328 LearningRate 0.0408 Epoch: 7 Global Step: 120580 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:24:08,791-Speed 9446.13 samples/sec Loss 6.4605 LearningRate 0.0408 Epoch: 7 Global Step: 120590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
16:24:09,856-Speed 9613.32 samples/sec Loss 6.4903 LearningRate 0.0408 Epoch: 7 Global Step: 120600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:10,974-Speed 9170.55 samples/sec Loss 6.4515 LearningRate 0.0408 Epoch: 7 Global Step: 120610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:12,049-Speed 9532.02 samples/sec Loss 6.4747 LearningRate 0.0408 Epoch: 7 Global Step: 120620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:13,262-Speed 8443.99 samples/sec Loss 6.5174 LearningRate 0.0408 Epoch: 7 Global Step: 120630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:14,323-Speed 9659.08 samples/sec Loss 6.5681 LearningRate 0.0408 Epoch: 7 Global Step: 120640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:15,456-Speed 9037.75 samples/sec Loss 6.5375 LearningRate 0.0408 Epoch: 7 Global Step: 120650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:16,575-Speed 9155.08 samples/sec Loss 6.4409 LearningRate 0.0408 Epoch: 7 Global Step: 120660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:17,693-Speed 9178.90 samples/sec Loss 6.4354 LearningRate 0.0408 Epoch: 7 Global Step: 120670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:18,808-Speed 9190.69 samples/sec Loss 6.5075 LearningRate 0.0408 Epoch: 7 Global Step: 120680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:19,909-Speed 9302.88 samples/sec Loss 6.4860 LearningRate 0.0408 Epoch: 7 Global Step: 120690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:20,997-Speed 9421.77 samples/sec Loss 6.4224 LearningRate 0.0408 Epoch: 7 Global Step: 120700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:22,073-Speed 9521.60 samples/sec Loss 6.3321 LearningRate 0.0408 Epoch: 7 Global Step: 120710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:23,149-Speed 9516.10 samples/sec Loss 6.4761 
LearningRate 0.0408 Epoch: 7 Global Step: 120720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:24,227-Speed 9506.57 samples/sec Loss 6.5198 LearningRate 0.0407 Epoch: 7 Global Step: 120730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:25,333-Speed 9265.42 samples/sec Loss 6.4075 LearningRate 0.0407 Epoch: 7 Global Step: 120740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:26,455-Speed 9129.86 samples/sec Loss 6.4240 LearningRate 0.0407 Epoch: 7 Global Step: 120750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:27,544-Speed 9410.54 samples/sec Loss 6.4888 LearningRate 0.0407 Epoch: 7 Global Step: 120760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:28,640-Speed 9349.77 samples/sec Loss 6.6073 LearningRate 0.0407 Epoch: 7 Global Step: 120770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:29,774-Speed 9032.11 samples/sec Loss 6.5501 LearningRate 0.0407 Epoch: 7 Global Step: 120780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:30,899-Speed 9111.80 samples/sec Loss 6.4453 LearningRate 0.0407 Epoch: 7 Global Step: 120790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:31,984-Speed 9438.02 samples/sec Loss 6.4164 LearningRate 0.0407 Epoch: 7 Global Step: 120800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:33,065-Speed 9478.21 samples/sec Loss 6.4529 LearningRate 0.0407 Epoch: 7 Global Step: 120810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:34,148-Speed 9458.20 samples/sec Loss 6.4259 LearningRate 0.0407 Epoch: 7 Global Step: 120820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:35,234-Speed 9435.61 samples/sec Loss 6.4271 LearningRate 0.0407 Epoch: 7 Global Step: 120830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:36,325-Speed 9400.59 samples/sec Loss 6.4736 LearningRate 0.0407 Epoch: 7 Global Step: 
120840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:37,368-Speed 9820.07 samples/sec Loss 6.5029 LearningRate 0.0407 Epoch: 7 Global Step: 120850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:38,423-Speed 9715.25 samples/sec Loss 6.5033 LearningRate 0.0407 Epoch: 7 Global Step: 120860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:39,517-Speed 9367.06 samples/sec Loss 6.4450 LearningRate 0.0407 Epoch: 7 Global Step: 120870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:40,650-Speed 9039.18 samples/sec Loss 6.4152 LearningRate 0.0407 Epoch: 7 Global Step: 120880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:41,737-Speed 9430.40 samples/sec Loss 6.4181 LearningRate 0.0407 Epoch: 7 Global Step: 120890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:42,843-Speed 9256.52 samples/sec Loss 6.6182 LearningRate 0.0407 Epoch: 7 Global Step: 120900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:43,955-Speed 9213.72 samples/sec Loss 6.4560 LearningRate 0.0407 Epoch: 7 Global Step: 120910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:45,071-Speed 9180.64 samples/sec Loss 6.3953 LearningRate 0.0407 Epoch: 7 Global Step: 120920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:46,124-Speed 9729.94 samples/sec Loss 6.5601 LearningRate 0.0407 Epoch: 7 Global Step: 120930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:47,193-Speed 9587.87 samples/sec Loss 6.4537 LearningRate 0.0407 Epoch: 7 Global Step: 120940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:48,283-Speed 9400.13 samples/sec Loss 6.3464 LearningRate 0.0407 Epoch: 7 Global Step: 120950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:24:49,385-Speed 9302.45 samples/sec Loss 6.3972 LearningRate 0.0407 Epoch: 7 Global Step: 120960 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-04-11 16:24:50,448-Speed 9632.99 samples/sec Loss 6.4448 LearningRate 0.0407 Epoch: 7 Global Step: 120970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:51,486-Speed 9873.72 samples/sec Loss 6.4627 LearningRate 0.0407 Epoch: 7 Global Step: 120980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:52,567-Speed 9477.44 samples/sec Loss 6.3426 LearningRate 0.0406 Epoch: 7 Global Step: 120990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:53,652-Speed 9439.03 samples/sec Loss 6.4352 LearningRate 0.0406 Epoch: 7 Global Step: 121000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:54,747-Speed 9362.89 samples/sec Loss 6.4918 LearningRate 0.0406 Epoch: 7 Global Step: 121010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:55,834-Speed 9430.21 samples/sec Loss 6.4170 LearningRate 0.0406 Epoch: 7 Global Step: 121020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:56,969-Speed 9030.74 samples/sec Loss 6.4241 LearningRate 0.0406 Epoch: 7 Global Step: 121030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:58,043-Speed 9535.90 samples/sec Loss 6.5042 LearningRate 0.0406 Epoch: 7 Global Step: 121040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:24:59,197-Speed 8876.11 samples/sec Loss 6.5622 LearningRate 0.0406 Epoch: 7 Global Step: 121050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:00,281-Speed 9455.96 samples/sec Loss 6.5223 LearningRate 0.0406 Epoch: 7 Global Step: 121060 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:25:01,353-Speed 9551.48 samples/sec Loss 6.4021 LearningRate 0.0406 Epoch: 7 Global Step: 121070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:02,438-Speed 9444.58 samples/sec Loss 6.5123 LearningRate 0.0406 Epoch: 7 Global Step: 121080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:03,501-Speed 
9638.71 samples/sec Loss 6.4460 LearningRate 0.0406 Epoch: 7 Global Step: 121090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:04,564-Speed 9638.28 samples/sec Loss 6.5339 LearningRate 0.0406 Epoch: 7 Global Step: 121100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:05,676-Speed 9214.47 samples/sec Loss 6.5030 LearningRate 0.0406 Epoch: 7 Global Step: 121110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:06,752-Speed 9528.22 samples/sec Loss 6.5477 LearningRate 0.0406 Epoch: 7 Global Step: 121120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:07,870-Speed 9163.76 samples/sec Loss 6.4486 LearningRate 0.0406 Epoch: 7 Global Step: 121130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:08,974-Speed 9281.96 samples/sec Loss 6.4347 LearningRate 0.0406 Epoch: 7 Global Step: 121140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:10,101-Speed 9088.76 samples/sec Loss 6.5506 LearningRate 0.0406 Epoch: 7 Global Step: 121150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:11,214-Speed 9203.32 samples/sec Loss 6.5166 LearningRate 0.0406 Epoch: 7 Global Step: 121160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:12,281-Speed 9604.36 samples/sec Loss 6.4564 LearningRate 0.0406 Epoch: 7 Global Step: 121170 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:25:13,381-Speed 9310.47 samples/sec Loss 6.5135 LearningRate 0.0406 Epoch: 7 Global Step: 121180 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:25:14,455-Speed 9545.07 samples/sec Loss 6.4984 LearningRate 0.0406 Epoch: 7 Global Step: 121190 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:25:15,611-Speed 8862.57 samples/sec Loss 6.5316 LearningRate 0.0406 Epoch: 7 Global Step: 121200 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:25:16,718-Speed 9254.55 samples/sec Loss 6.4868 
LearningRate 0.0406 Epoch: 7 Global Step: 121210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:17,791-Speed 9555.42 samples/sec Loss 6.4603 LearningRate 0.0406 Epoch: 7 Global Step: 121220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:18,868-Speed 9515.84 samples/sec Loss 6.5713 LearningRate 0.0406 Epoch: 7 Global Step: 121230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:19,979-Speed 9219.86 samples/sec Loss 6.4506 LearningRate 0.0406 Epoch: 7 Global Step: 121240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:21,062-Speed 9460.70 samples/sec Loss 6.5452 LearningRate 0.0405 Epoch: 7 Global Step: 121250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:22,160-Speed 9327.63 samples/sec Loss 6.5122 LearningRate 0.0405 Epoch: 7 Global Step: 121260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:23,287-Speed 9094.82 samples/sec Loss 6.5032 LearningRate 0.0405 Epoch: 7 Global Step: 121270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:24,336-Speed 9766.55 samples/sec Loss 6.4831 LearningRate 0.0405 Epoch: 7 Global Step: 121280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:25,405-Speed 9583.62 samples/sec Loss 6.3698 LearningRate 0.0405 Epoch: 7 Global Step: 121290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:26,469-Speed 9634.80 samples/sec Loss 6.4331 LearningRate 0.0405 Epoch: 7 Global Step: 121300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:27,598-Speed 9071.84 samples/sec Loss 6.4766 LearningRate 0.0405 Epoch: 7 Global Step: 121310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:28,720-Speed 9130.22 samples/sec Loss 6.5756 LearningRate 0.0405 Epoch: 7 Global Step: 121320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:29,829-Speed 9238.21 samples/sec Loss 6.4370 LearningRate 0.0405 Epoch: 7 Global Step: 121330 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:30,893-Speed 9632.38 samples/sec Loss 6.5788 LearningRate 0.0405 Epoch: 7 Global Step: 121340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:32,010-Speed 9170.15 samples/sec Loss 6.5206 LearningRate 0.0405 Epoch: 7 Global Step: 121350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:33,107-Speed 9340.05 samples/sec Loss 6.5682 LearningRate 0.0405 Epoch: 7 Global Step: 121360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:34,189-Speed 9474.38 samples/sec Loss 6.5275 LearningRate 0.0405 Epoch: 7 Global Step: 121370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:35,270-Speed 9474.44 samples/sec Loss 6.5572 LearningRate 0.0405 Epoch: 7 Global Step: 121380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:36,332-Speed 9648.64 samples/sec Loss 6.5387 LearningRate 0.0405 Epoch: 7 Global Step: 121390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:37,443-Speed 9222.26 samples/sec Loss 6.4652 LearningRate 0.0405 Epoch: 7 Global Step: 121400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:38,519-Speed 9521.48 samples/sec Loss 6.5129 LearningRate 0.0405 Epoch: 7 Global Step: 121410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:39,592-Speed 9553.17 samples/sec Loss 6.5054 LearningRate 0.0405 Epoch: 7 Global Step: 121420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:40,686-Speed 9365.18 samples/sec Loss 6.4465 LearningRate 0.0405 Epoch: 7 Global Step: 121430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:41,764-Speed 9501.89 samples/sec Loss 6.4223 LearningRate 0.0405 Epoch: 7 Global Step: 121440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:42,851-Speed 9425.18 samples/sec Loss 6.5047 LearningRate 0.0405 Epoch: 7 Global Step: 121450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 16:25:43,936-Speed 9448.07 samples/sec Loss 6.5423 LearningRate 0.0405 Epoch: 7 Global Step: 121460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:45,011-Speed 9525.29 samples/sec Loss 6.4946 LearningRate 0.0405 Epoch: 7 Global Step: 121470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:46,083-Speed 9556.43 samples/sec Loss 6.4584 LearningRate 0.0405 Epoch: 7 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:47,134-Speed 9753.25 samples/sec Loss 6.4587 LearningRate 0.0405 Epoch: 7 Global Step: 121490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:48,201-Speed 9600.11 samples/sec Loss 6.4454 LearningRate 0.0405 Epoch: 7 Global Step: 121500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:25:49,334-Speed 9048.68 samples/sec Loss 6.5335 LearningRate 0.0404 Epoch: 7 Global Step: 121510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:50,434-Speed 9309.44 samples/sec Loss 6.5404 LearningRate 0.0404 Epoch: 7 Global Step: 121520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:51,545-Speed 9227.13 samples/sec Loss 6.3990 LearningRate 0.0404 Epoch: 7 Global Step: 121530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:52,655-Speed 9226.15 samples/sec Loss 6.5067 LearningRate 0.0404 Epoch: 7 Global Step: 121540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:53,776-Speed 9145.65 samples/sec Loss 6.5649 LearningRate 0.0404 Epoch: 7 Global Step: 121550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:54,894-Speed 9163.17 samples/sec Loss 6.5555 LearningRate 0.0404 Epoch: 7 Global Step: 121560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:56,061-Speed 8781.37 samples/sec Loss 6.4425 LearningRate 0.0404 Epoch: 7 Global Step: 121570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:57,126-Speed 9625.46 
samples/sec Loss 6.4474 LearningRate 0.0404 Epoch: 7 Global Step: 121580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:58,225-Speed 9318.17 samples/sec Loss 6.4515 LearningRate 0.0404 Epoch: 7 Global Step: 121590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:25:59,313-Speed 9418.44 samples/sec Loss 6.5039 LearningRate 0.0404 Epoch: 7 Global Step: 121600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:00,366-Speed 9732.58 samples/sec Loss 6.6260 LearningRate 0.0404 Epoch: 7 Global Step: 121610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:01,440-Speed 9539.95 samples/sec Loss 6.4964 LearningRate 0.0404 Epoch: 7 Global Step: 121620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:02,508-Speed 9588.85 samples/sec Loss 6.4841 LearningRate 0.0404 Epoch: 7 Global Step: 121630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:03,585-Speed 9512.80 samples/sec Loss 6.5179 LearningRate 0.0404 Epoch: 7 Global Step: 121640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:04,658-Speed 9548.82 samples/sec Loss 6.4607 LearningRate 0.0404 Epoch: 7 Global Step: 121650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:05,729-Speed 9562.26 samples/sec Loss 6.5231 LearningRate 0.0404 Epoch: 7 Global Step: 121660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:06,819-Speed 9405.18 samples/sec Loss 6.5528 LearningRate 0.0404 Epoch: 7 Global Step: 121670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:07,894-Speed 9532.48 samples/sec Loss 6.4923 LearningRate 0.0404 Epoch: 7 Global Step: 121680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:08,959-Speed 9625.39 samples/sec Loss 6.4591 LearningRate 0.0404 Epoch: 7 Global Step: 121690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:10,055-Speed 9343.39 samples/sec Loss 6.5286 LearningRate 0.0404 Epoch: 
7 Global Step: 121700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:11,142-Speed 9431.63 samples/sec Loss 6.4956 LearningRate 0.0404 Epoch: 7 Global Step: 121710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:12,218-Speed 9519.81 samples/sec Loss 6.4267 LearningRate 0.0404 Epoch: 7 Global Step: 121720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:13,305-Speed 9425.12 samples/sec Loss 6.3210 LearningRate 0.0404 Epoch: 7 Global Step: 121730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:14,363-Speed 9683.21 samples/sec Loss 6.4732 LearningRate 0.0404 Epoch: 7 Global Step: 121740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:15,437-Speed 9543.96 samples/sec Loss 6.5774 LearningRate 0.0404 Epoch: 7 Global Step: 121750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:16,523-Speed 9433.12 samples/sec Loss 6.5065 LearningRate 0.0404 Epoch: 7 Global Step: 121760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:17,603-Speed 9487.71 samples/sec Loss 6.5532 LearningRate 0.0404 Epoch: 7 Global Step: 121770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:18,655-Speed 9744.13 samples/sec Loss 6.4588 LearningRate 0.0403 Epoch: 7 Global Step: 121780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:19,730-Speed 9529.44 samples/sec Loss 6.5669 LearningRate 0.0403 Epoch: 7 Global Step: 121790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:20,796-Speed 9611.78 samples/sec Loss 6.5569 LearningRate 0.0403 Epoch: 7 Global Step: 121800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:21,939-Speed 8959.51 samples/sec Loss 6.5057 LearningRate 0.0403 Epoch: 7 Global Step: 121810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:22,999-Speed 9669.28 samples/sec Loss 6.5498 LearningRate 0.0403 Epoch: 7 Global Step: 121820 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 16:26:24,083-Speed 9460.23 samples/sec Loss 6.4819 LearningRate 0.0403 Epoch: 7 Global Step: 121830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:25,172-Speed 9403.42 samples/sec Loss 6.5351 LearningRate 0.0403 Epoch: 7 Global Step: 121840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:26,275-Speed 9292.89 samples/sec Loss 6.4625 LearningRate 0.0403 Epoch: 7 Global Step: 121850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:27,355-Speed 9485.24 samples/sec Loss 6.5379 LearningRate 0.0403 Epoch: 7 Global Step: 121860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:28,494-Speed 8989.39 samples/sec Loss 6.5777 LearningRate 0.0403 Epoch: 7 Global Step: 121870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:29,605-Speed 9224.36 samples/sec Loss 6.6498 LearningRate 0.0403 Epoch: 7 Global Step: 121880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:30,692-Speed 9433.64 samples/sec Loss 6.5504 LearningRate 0.0403 Epoch: 7 Global Step: 121890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:31,762-Speed 9573.77 samples/sec Loss 6.5686 LearningRate 0.0403 Epoch: 7 Global Step: 121900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:32,838-Speed 9519.09 samples/sec Loss 6.5285 LearningRate 0.0403 Epoch: 7 Global Step: 121910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:33,956-Speed 9162.93 samples/sec Loss 6.5477 LearningRate 0.0403 Epoch: 7 Global Step: 121920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:35,071-Speed 9190.76 samples/sec Loss 6.5085 LearningRate 0.0403 Epoch: 7 Global Step: 121930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:26:36,159-Speed 9418.39 samples/sec Loss 6.4652 LearningRate 0.0403 Epoch: 7 Global Step: 121940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
16:26:37,219-Speed 9667.34 samples/sec Loss 6.5217 LearningRate 0.0403 Epoch: 7 Global Step: 121950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:38,316-Speed 9341.13 samples/sec Loss 6.5353 LearningRate 0.0403 Epoch: 7 Global Step: 121960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:39,415-Speed 9322.62 samples/sec Loss 6.5033 LearningRate 0.0403 Epoch: 7 Global Step: 121970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:40,474-Speed 9677.23 samples/sec Loss 6.4854 LearningRate 0.0403 Epoch: 7 Global Step: 121980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:41,554-Speed 9482.59 samples/sec Loss 6.5213 LearningRate 0.0403 Epoch: 7 Global Step: 121990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:26:42,642-Speed 9415.50 samples/sec Loss 6.5202 LearningRate 0.0403 Epoch: 7 Global Step: 122000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:27:04,539-[lfw][122000]XNorm: 10.660572 Training: 2022-04-11 16:27:04,539-[lfw][122000]Accuracy-Flip: 0.99550+-0.00269 Training: 2022-04-11 16:27:04,540-[lfw][122000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:27:29,860-[cfp_fp][122000]XNorm: 9.040363 Training: 2022-04-11 16:27:29,860-[cfp_fp][122000]Accuracy-Flip: 0.95871+-0.00918 Training: 2022-04-11 16:27:29,861-[cfp_fp][122000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:27:51,765-[agedb_30][122000]XNorm: 10.287548 Training: 2022-04-11 16:27:51,766-[agedb_30][122000]Accuracy-Flip: 0.96333+-0.00940 Training: 2022-04-11 16:27:51,766-[agedb_30][122000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:27:52,885-Speed 145.78 samples/sec Loss 6.5032 LearningRate 0.0403 Epoch: 7 Global Step: 122010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:27:53,994-Speed 9237.17 samples/sec Loss 6.5450 LearningRate 0.0403 Epoch: 7 Global Step: 122020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:27:55,065-Speed 9568.28 
samples/sec Loss 6.5814 LearningRate 0.0403 Epoch: 7 Global Step: 122030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:27:56,151-Speed 9430.48 samples/sec Loss 6.6019 LearningRate 0.0402 Epoch: 7 Global Step: 122040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:27:57,263-Speed 9213.22 samples/sec Loss 6.5080 LearningRate 0.0402 Epoch: 7 Global Step: 122050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:27:58,336-Speed 9548.13 samples/sec Loss 6.4860 LearningRate 0.0402 Epoch: 7 Global Step: 122060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:27:59,396-Speed 9665.04 samples/sec Loss 6.5643 LearningRate 0.0402 Epoch: 7 Global Step: 122070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:00,495-Speed 9324.08 samples/sec Loss 6.5595 LearningRate 0.0402 Epoch: 7 Global Step: 122080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:01,571-Speed 9527.52 samples/sec Loss 6.4822 LearningRate 0.0402 Epoch: 7 Global Step: 122090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:02,624-Speed 9721.85 samples/sec Loss 6.4852 LearningRate 0.0402 Epoch: 7 Global Step: 122100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:03,726-Speed 9302.35 samples/sec Loss 6.5438 LearningRate 0.0402 Epoch: 7 Global Step: 122110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:04,770-Speed 9816.77 samples/sec Loss 6.4528 LearningRate 0.0402 Epoch: 7 Global Step: 122120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:05,885-Speed 9185.48 samples/sec Loss 6.5698 LearningRate 0.0402 Epoch: 7 Global Step: 122130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:06,989-Speed 9282.68 samples/sec Loss 6.5847 LearningRate 0.0402 Epoch: 7 Global Step: 122140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:08,068-Speed 9490.01 samples/sec Loss 6.4855 LearningRate 0.0402 Epoch: 7 
Global Step: 122150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:09,144-Speed 9527.33 samples/sec Loss 6.5479 LearningRate 0.0402 Epoch: 7 Global Step: 122160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:10,203-Speed 9672.99 samples/sec Loss 6.5129 LearningRate 0.0402 Epoch: 7 Global Step: 122170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:11,314-Speed 9227.26 samples/sec Loss 6.4850 LearningRate 0.0402 Epoch: 7 Global Step: 122180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:12,433-Speed 9152.60 samples/sec Loss 6.4530 LearningRate 0.0402 Epoch: 7 Global Step: 122190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:13,540-Speed 9253.22 samples/sec Loss 6.5545 LearningRate 0.0402 Epoch: 7 Global Step: 122200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:14,636-Speed 9347.17 samples/sec Loss 6.5366 LearningRate 0.0402 Epoch: 7 Global Step: 122210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:15,743-Speed 9261.47 samples/sec Loss 6.5828 LearningRate 0.0402 Epoch: 7 Global Step: 122220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:16,851-Speed 9245.26 samples/sec Loss 6.5155 LearningRate 0.0402 Epoch: 7 Global Step: 122230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:17,937-Speed 9435.11 samples/sec Loss 6.5881 LearningRate 0.0402 Epoch: 7 Global Step: 122240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:19,006-Speed 9589.04 samples/sec Loss 6.6002 LearningRate 0.0402 Epoch: 7 Global Step: 122250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:20,058-Speed 9733.87 samples/sec Loss 6.5550 LearningRate 0.0402 Epoch: 7 Global Step: 122260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:21,138-Speed 9491.12 samples/sec Loss 6.5695 LearningRate 0.0402 Epoch: 7 Global Step: 122270 Fp16 Grad Scale: 131072 Required: 
9 hours Training: 2022-04-11 16:28:22,254-Speed 9176.80 samples/sec Loss 6.4598 LearningRate 0.0402 Epoch: 7 Global Step: 122280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:23,328-Speed 9543.54 samples/sec Loss 6.5424 LearningRate 0.0402 Epoch: 7 Global Step: 122290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:24,399-Speed 9570.17 samples/sec Loss 6.6100 LearningRate 0.0401 Epoch: 7 Global Step: 122300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:25,464-Speed 9617.25 samples/sec Loss 6.5091 LearningRate 0.0401 Epoch: 7 Global Step: 122310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:26,539-Speed 9528.72 samples/sec Loss 6.5990 LearningRate 0.0401 Epoch: 7 Global Step: 122320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:27,606-Speed 9601.99 samples/sec Loss 6.5473 LearningRate 0.0401 Epoch: 7 Global Step: 122330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:28,690-Speed 9461.30 samples/sec Loss 6.6365 LearningRate 0.0401 Epoch: 7 Global Step: 122340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:29,753-Speed 9634.51 samples/sec Loss 6.6173 LearningRate 0.0401 Epoch: 7 Global Step: 122350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:30,818-Speed 9622.23 samples/sec Loss 6.6267 LearningRate 0.0401 Epoch: 7 Global Step: 122360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:28:31,920-Speed 9296.94 samples/sec Loss 6.6180 LearningRate 0.0401 Epoch: 7 Global Step: 122370 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-04-11 16:28:33,026-Speed 9257.70 samples/sec Loss 6.4570 LearningRate 0.0401 Epoch: 7 Global Step: 122380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:34,106-Speed 9487.97 samples/sec Loss 6.4763 LearningRate 0.0401 Epoch: 7 Global Step: 122390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
16:28:35,186-Speed 9488.97 samples/sec Loss 6.5750 LearningRate 0.0401 Epoch: 7 Global Step: 122400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:36,266-Speed 9487.62 samples/sec Loss 6.5296 LearningRate 0.0401 Epoch: 7 Global Step: 122410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:37,351-Speed 9442.24 samples/sec Loss 6.5111 LearningRate 0.0401 Epoch: 7 Global Step: 122420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:38,480-Speed 9070.40 samples/sec Loss 6.6951 LearningRate 0.0401 Epoch: 7 Global Step: 122430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:39,576-Speed 9354.09 samples/sec Loss 6.4271 LearningRate 0.0401 Epoch: 7 Global Step: 122440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:40,661-Speed 9441.48 samples/sec Loss 6.4723 LearningRate 0.0401 Epoch: 7 Global Step: 122450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:41,743-Speed 9472.62 samples/sec Loss 6.4929 LearningRate 0.0401 Epoch: 7 Global Step: 122460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:42,776-Speed 9923.34 samples/sec Loss 6.5448 LearningRate 0.0401 Epoch: 7 Global Step: 122470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:43,920-Speed 8950.82 samples/sec Loss 6.5456 LearningRate 0.0401 Epoch: 7 Global Step: 122480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:44,975-Speed 9719.19 samples/sec Loss 6.5302 LearningRate 0.0401 Epoch: 7 Global Step: 122490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:46,101-Speed 9096.63 samples/sec Loss 6.5380 LearningRate 0.0401 Epoch: 7 Global Step: 122500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:47,161-Speed 9663.78 samples/sec Loss 6.5016 LearningRate 0.0401 Epoch: 7 Global Step: 122510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:48,242-Speed 9483.53 samples/sec Loss 6.4963 
LearningRate 0.0401 Epoch: 7 Global Step: 122520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:49,319-Speed 9509.48 samples/sec Loss 6.6372 LearningRate 0.0401 Epoch: 7 Global Step: 122530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:50,421-Speed 9299.01 samples/sec Loss 6.4977 LearningRate 0.0401 Epoch: 7 Global Step: 122540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:51,525-Speed 9279.08 samples/sec Loss 6.6260 LearningRate 0.0401 Epoch: 7 Global Step: 122550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:52,635-Speed 9233.26 samples/sec Loss 6.5639 LearningRate 0.0401 Epoch: 7 Global Step: 122560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:53,747-Speed 9210.68 samples/sec Loss 6.5227 LearningRate 0.0400 Epoch: 7 Global Step: 122570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:54,838-Speed 9390.69 samples/sec Loss 6.5132 LearningRate 0.0400 Epoch: 7 Global Step: 122580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 16:28:55,942-Speed 9283.75 samples/sec Loss 6.5838 LearningRate 0.0400 Epoch: 7 Global Step: 122590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:57,072-Speed 9063.21 samples/sec Loss 6.4262 LearningRate 0.0400 Epoch: 7 Global Step: 122600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:58,131-Speed 9675.39 samples/sec Loss 6.5860 LearningRate 0.0400 Epoch: 7 Global Step: 122610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:28:59,239-Speed 9251.09 samples/sec Loss 6.6457 LearningRate 0.0400 Epoch: 7 Global Step: 122620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:29:00,289-Speed 9754.83 samples/sec Loss 6.4529 LearningRate 0.0400 Epoch: 7 Global Step: 122630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:29:01,362-Speed 9554.19 samples/sec Loss 6.5223 LearningRate 0.0400 Epoch: 7 Global Step: 122640 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 16:29:02,459-Speed 9334.51 samples/sec Loss 6.5923 LearningRate 0.0400 Epoch: 7 Global Step: 122650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:03,518-Speed 9676.95 samples/sec Loss 6.4616 LearningRate 0.0400 Epoch: 7 Global Step: 122660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:04,644-Speed 9103.55 samples/sec Loss 6.6116 LearningRate 0.0400 Epoch: 7 Global Step: 122670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:05,743-Speed 9319.55 samples/sec Loss 6.5536 LearningRate 0.0400 Epoch: 7 Global Step: 122680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:06,782-Speed 9860.89 samples/sec Loss 6.5345 LearningRate 0.0400 Epoch: 7 Global Step: 122690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:07,868-Speed 9430.34 samples/sec Loss 6.4828 LearningRate 0.0400 Epoch: 7 Global Step: 122700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:08,944-Speed 9528.91 samples/sec Loss 6.5307 LearningRate 0.0400 Epoch: 7 Global Step: 122710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:10,005-Speed 9657.47 samples/sec Loss 6.4534 LearningRate 0.0400 Epoch: 7 Global Step: 122720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:11,072-Speed 9604.21 samples/sec Loss 6.6295 LearningRate 0.0400 Epoch: 7 Global Step: 122730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:12,125-Speed 9731.48 samples/sec Loss 6.5016 LearningRate 0.0400 Epoch: 7 Global Step: 122740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:13,227-Speed 9290.91 samples/sec Loss 6.5776 LearningRate 0.0400 Epoch: 7 Global Step: 122750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:14,323-Speed 9352.91 samples/sec Loss 6.4926 LearningRate 0.0400 Epoch: 7 Global Step: 122760 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 16:29:15,385-Speed 9647.85 samples/sec Loss 6.4592 LearningRate 0.0400 Epoch: 7 Global Step: 122770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:16,473-Speed 9418.03 samples/sec Loss 6.4189 LearningRate 0.0400 Epoch: 7 Global Step: 122780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:17,555-Speed 9468.98 samples/sec Loss 6.5401 LearningRate 0.0400 Epoch: 7 Global Step: 122790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:18,706-Speed 8904.33 samples/sec Loss 6.6226 LearningRate 0.0400 Epoch: 7 Global Step: 122800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:19,787-Speed 9478.65 samples/sec Loss 6.3900 LearningRate 0.0400 Epoch: 7 Global Step: 122810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:20,870-Speed 9465.62 samples/sec Loss 6.4805 LearningRate 0.0400 Epoch: 7 Global Step: 122820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:21,952-Speed 9466.67 samples/sec Loss 6.5416 LearningRate 0.0399 Epoch: 7 Global Step: 122830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:23,030-Speed 9507.83 samples/sec Loss 6.4673 LearningRate 0.0399 Epoch: 7 Global Step: 122840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:24,117-Speed 9426.55 samples/sec Loss 6.6004 LearningRate 0.0399 Epoch: 7 Global Step: 122850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:25,197-Speed 9487.39 samples/sec Loss 6.4715 LearningRate 0.0399 Epoch: 7 Global Step: 122860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:26,283-Speed 9427.75 samples/sec Loss 6.5834 LearningRate 0.0399 Epoch: 7 Global Step: 122870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:27,355-Speed 9556.65 samples/sec Loss 6.5584 LearningRate 0.0399 Epoch: 7 Global Step: 122880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:28,436-Speed 9486.43 
samples/sec Loss 6.5602 LearningRate 0.0399 Epoch: 7 Global Step: 122890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:29,497-Speed 9657.37 samples/sec Loss 6.4873 LearningRate 0.0399 Epoch: 7 Global Step: 122900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:30,582-Speed 9442.03 samples/sec Loss 6.5058 LearningRate 0.0399 Epoch: 7 Global Step: 122910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:31,661-Speed 9491.95 samples/sec Loss 6.3931 LearningRate 0.0399 Epoch: 7 Global Step: 122920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:32,767-Speed 9268.55 samples/sec Loss 6.5044 LearningRate 0.0399 Epoch: 7 Global Step: 122930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:33,884-Speed 9171.63 samples/sec Loss 6.5028 LearningRate 0.0399 Epoch: 7 Global Step: 122940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:35,010-Speed 9098.72 samples/sec Loss 6.5719 LearningRate 0.0399 Epoch: 7 Global Step: 122950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:36,134-Speed 9113.92 samples/sec Loss 6.4882 LearningRate 0.0399 Epoch: 7 Global Step: 122960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:37,215-Speed 9475.58 samples/sec Loss 6.4887 LearningRate 0.0399 Epoch: 7 Global Step: 122970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:38,318-Speed 9289.13 samples/sec Loss 6.5579 LearningRate 0.0399 Epoch: 7 Global Step: 122980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:39,398-Speed 9497.07 samples/sec Loss 6.4763 LearningRate 0.0399 Epoch: 7 Global Step: 122990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:40,452-Speed 9722.19 samples/sec Loss 6.6094 LearningRate 0.0399 Epoch: 7 Global Step: 123000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:41,535-Speed 9461.89 samples/sec Loss 6.5263 LearningRate 0.0399 Epoch: 
7 Global Step: 123010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:42,671-Speed 9014.89 samples/sec Loss 6.5015 LearningRate 0.0399 Epoch: 7 Global Step: 123020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:43,754-Speed 9463.84 samples/sec Loss 6.5830 LearningRate 0.0399 Epoch: 7 Global Step: 123030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:44,834-Speed 9508.91 samples/sec Loss 6.5656 LearningRate 0.0399 Epoch: 7 Global Step: 123040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:45,951-Speed 9169.65 samples/sec Loss 6.5477 LearningRate 0.0399 Epoch: 7 Global Step: 123050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:47,049-Speed 9330.32 samples/sec Loss 6.5126 LearningRate 0.0399 Epoch: 7 Global Step: 123060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:48,136-Speed 9424.10 samples/sec Loss 6.5870 LearningRate 0.0399 Epoch: 7 Global Step: 123070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:49,215-Speed 9495.72 samples/sec Loss 6.5656 LearningRate 0.0399 Epoch: 7 Global Step: 123080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:29:50,249-Speed 9912.04 samples/sec Loss 6.5867 LearningRate 0.0398 Epoch: 7 Global Step: 123090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:51,363-Speed 9195.64 samples/sec Loss 6.5758 LearningRate 0.0398 Epoch: 7 Global Step: 123100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:52,427-Speed 9631.77 samples/sec Loss 6.5964 LearningRate 0.0398 Epoch: 7 Global Step: 123110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:53,509-Speed 9470.13 samples/sec Loss 6.5065 LearningRate 0.0398 Epoch: 7 Global Step: 123120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:54,624-Speed 9185.15 samples/sec Loss 6.5387 LearningRate 0.0398 Epoch: 7 Global Step: 123130 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 16:29:55,726-Speed 9298.71 samples/sec Loss 6.5574 LearningRate 0.0398 Epoch: 7 Global Step: 123140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:56,836-Speed 9233.63 samples/sec Loss 6.6142 LearningRate 0.0398 Epoch: 7 Global Step: 123150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:57,903-Speed 9600.12 samples/sec Loss 6.5257 LearningRate 0.0398 Epoch: 7 Global Step: 123160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:29:58,990-Speed 9426.45 samples/sec Loss 6.5607 LearningRate 0.0398 Epoch: 7 Global Step: 123170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:00,085-Speed 9362.62 samples/sec Loss 6.5663 LearningRate 0.0398 Epoch: 7 Global Step: 123180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:01,170-Speed 9443.49 samples/sec Loss 6.4482 LearningRate 0.0398 Epoch: 7 Global Step: 123190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:02,282-Speed 9209.18 samples/sec Loss 6.5827 LearningRate 0.0398 Epoch: 7 Global Step: 123200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:03,412-Speed 9078.01 samples/sec Loss 6.5302 LearningRate 0.0398 Epoch: 7 Global Step: 123210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:04,499-Speed 9427.42 samples/sec Loss 6.5050 LearningRate 0.0398 Epoch: 7 Global Step: 123220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:05,569-Speed 9575.27 samples/sec Loss 6.6010 LearningRate 0.0398 Epoch: 7 Global Step: 123230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:06,663-Speed 9363.30 samples/sec Loss 6.5421 LearningRate 0.0398 Epoch: 7 Global Step: 123240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:07,751-Speed 9416.66 samples/sec Loss 6.5820 LearningRate 0.0398 Epoch: 7 Global Step: 123250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
16:30:08,820-Speed 9586.32 samples/sec Loss 6.4791 LearningRate 0.0398 Epoch: 7 Global Step: 123260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:09,909-Speed 9412.53 samples/sec Loss 6.5704 LearningRate 0.0398 Epoch: 7 Global Step: 123270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:10,993-Speed 9451.90 samples/sec Loss 6.5039 LearningRate 0.0398 Epoch: 7 Global Step: 123280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:12,078-Speed 9437.88 samples/sec Loss 6.4779 LearningRate 0.0398 Epoch: 7 Global Step: 123290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:13,191-Speed 9204.58 samples/sec Loss 6.4704 LearningRate 0.0398 Epoch: 7 Global Step: 123300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:14,229-Speed 9874.48 samples/sec Loss 6.6832 LearningRate 0.0398 Epoch: 7 Global Step: 123310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:15,363-Speed 9035.94 samples/sec Loss 6.4841 LearningRate 0.0398 Epoch: 7 Global Step: 123320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:16,442-Speed 9503.43 samples/sec Loss 6.5351 LearningRate 0.0398 Epoch: 7 Global Step: 123330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:17,524-Speed 9463.34 samples/sec Loss 6.6150 LearningRate 0.0398 Epoch: 7 Global Step: 123340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:18,635-Speed 9221.96 samples/sec Loss 6.4803 LearningRate 0.0398 Epoch: 7 Global Step: 123350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:19,717-Speed 9470.54 samples/sec Loss 6.5616 LearningRate 0.0397 Epoch: 7 Global Step: 123360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:20,770-Speed 9735.63 samples/sec Loss 6.5084 LearningRate 0.0397 Epoch: 7 Global Step: 123370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:21,834-Speed 9624.95 samples/sec Loss 
6.5767 LearningRate 0.0397 Epoch: 7 Global Step: 123380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:22,889-Speed 9713.12 samples/sec Loss 6.5320 LearningRate 0.0397 Epoch: 7 Global Step: 123390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:24,030-Speed 8979.00 samples/sec Loss 6.6159 LearningRate 0.0397 Epoch: 7 Global Step: 123400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:25,151-Speed 9140.59 samples/sec Loss 6.5872 LearningRate 0.0397 Epoch: 7 Global Step: 123410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:30:26,248-Speed 9342.03 samples/sec Loss 6.6162 LearningRate 0.0397 Epoch: 7 Global Step: 123420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:30:27,340-Speed 9386.69 samples/sec Loss 6.4835 LearningRate 0.0397 Epoch: 7 Global Step: 123430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:28,508-Speed 8767.80 samples/sec Loss 6.5174 LearningRate 0.0397 Epoch: 7 Global Step: 123440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:29,658-Speed 8910.66 samples/sec Loss 6.6035 LearningRate 0.0397 Epoch: 7 Global Step: 123450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:30,751-Speed 9371.91 samples/sec Loss 6.5728 LearningRate 0.0397 Epoch: 7 Global Step: 123460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:31,856-Speed 9275.16 samples/sec Loss 6.5401 LearningRate 0.0397 Epoch: 7 Global Step: 123470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:32,904-Speed 9784.38 samples/sec Loss 6.5317 LearningRate 0.0397 Epoch: 7 Global Step: 123480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:33,983-Speed 9495.20 samples/sec Loss 6.5481 LearningRate 0.0397 Epoch: 7 Global Step: 123490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:35,043-Speed 9670.46 samples/sec Loss 6.6438 LearningRate 0.0397 Epoch: 7 Global 
Step: 123500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:36,101-Speed 9682.42 samples/sec Loss 6.5759 LearningRate 0.0397 Epoch: 7 Global Step: 123510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:37,196-Speed 9355.02 samples/sec Loss 6.4699 LearningRate 0.0397 Epoch: 7 Global Step: 123520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:38,292-Speed 9350.71 samples/sec Loss 6.6104 LearningRate 0.0397 Epoch: 7 Global Step: 123530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:39,385-Speed 9375.31 samples/sec Loss 6.4934 LearningRate 0.0397 Epoch: 7 Global Step: 123540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:40,469-Speed 9443.39 samples/sec Loss 6.4957 LearningRate 0.0397 Epoch: 7 Global Step: 123550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:41,532-Speed 9646.45 samples/sec Loss 6.5400 LearningRate 0.0397 Epoch: 7 Global Step: 123560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:42,570-Speed 9869.09 samples/sec Loss 6.5231 LearningRate 0.0397 Epoch: 7 Global Step: 123570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:43,665-Speed 9355.18 samples/sec Loss 6.5308 LearningRate 0.0397 Epoch: 7 Global Step: 123580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:44,751-Speed 9439.61 samples/sec Loss 6.5830 LearningRate 0.0397 Epoch: 7 Global Step: 123590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:45,852-Speed 9301.70 samples/sec Loss 6.6039 LearningRate 0.0397 Epoch: 7 Global Step: 123600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:46,967-Speed 9195.62 samples/sec Loss 6.5239 LearningRate 0.0397 Epoch: 7 Global Step: 123610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:48,057-Speed 9393.77 samples/sec Loss 6.5316 LearningRate 0.0396 Epoch: 7 Global Step: 123620 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 16:30:49,141-Speed 9452.24 samples/sec Loss 6.5664 LearningRate 0.0396 Epoch: 7 Global Step: 123630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:50,259-Speed 9167.31 samples/sec Loss 6.5837 LearningRate 0.0396 Epoch: 7 Global Step: 123640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:51,340-Speed 9477.41 samples/sec Loss 6.5389 LearningRate 0.0396 Epoch: 7 Global Step: 123650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:52,397-Speed 9696.30 samples/sec Loss 6.5575 LearningRate 0.0396 Epoch: 7 Global Step: 123660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:53,503-Speed 9273.58 samples/sec Loss 6.6578 LearningRate 0.0396 Epoch: 7 Global Step: 123670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:54,585-Speed 9467.35 samples/sec Loss 6.6548 LearningRate 0.0396 Epoch: 7 Global Step: 123680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:55,679-Speed 9367.18 samples/sec Loss 6.4896 LearningRate 0.0396 Epoch: 7 Global Step: 123690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:56,727-Speed 9776.20 samples/sec Loss 6.6520 LearningRate 0.0396 Epoch: 7 Global Step: 123700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:30:57,814-Speed 9422.27 samples/sec Loss 6.5898 LearningRate 0.0396 Epoch: 7 Global Step: 123710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:58,868-Speed 9719.82 samples/sec Loss 6.4809 LearningRate 0.0396 Epoch: 7 Global Step: 123720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:30:59,937-Speed 9586.94 samples/sec Loss 6.5838 LearningRate 0.0396 Epoch: 7 Global Step: 123730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:01,057-Speed 9147.43 samples/sec Loss 6.5948 LearningRate 0.0396 Epoch: 7 Global Step: 123740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:31:02,182-Speed 9109.32 samples/sec Loss 6.5581 LearningRate 0.0396 Epoch: 7 Global Step: 123750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:03,334-Speed 8897.00 samples/sec Loss 6.6211 LearningRate 0.0396 Epoch: 7 Global Step: 123760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:04,407-Speed 9549.01 samples/sec Loss 6.6005 LearningRate 0.0396 Epoch: 7 Global Step: 123770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:05,501-Speed 9365.06 samples/sec Loss 6.4855 LearningRate 0.0396 Epoch: 7 Global Step: 123780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:06,579-Speed 9504.81 samples/sec Loss 6.4023 LearningRate 0.0396 Epoch: 7 Global Step: 123790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:07,678-Speed 9324.25 samples/sec Loss 6.6141 LearningRate 0.0396 Epoch: 7 Global Step: 123800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:08,778-Speed 9316.06 samples/sec Loss 6.5406 LearningRate 0.0396 Epoch: 7 Global Step: 123810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:31:09,892-Speed 9192.25 samples/sec Loss 6.5048 LearningRate 0.0396 Epoch: 7 Global Step: 123820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:31:11,016-Speed 9118.37 samples/sec Loss 6.6199 LearningRate 0.0396 Epoch: 7 Global Step: 123830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:31:12,106-Speed 9407.19 samples/sec Loss 6.6005 LearningRate 0.0396 Epoch: 7 Global Step: 123840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:31:13,185-Speed 9494.32 samples/sec Loss 6.5720 LearningRate 0.0396 Epoch: 7 Global Step: 123850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:14,296-Speed 9224.23 samples/sec Loss 6.5770 LearningRate 0.0396 Epoch: 7 Global Step: 123860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:15,351-Speed 9709.60 samples/sec Loss 
6.5134 LearningRate 0.0396 Epoch: 7 Global Step: 123870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:16,406-Speed 9709.72 samples/sec Loss 6.4864 LearningRate 0.0396 Epoch: 7 Global Step: 123880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:17,474-Speed 9595.69 samples/sec Loss 6.6650 LearningRate 0.0395 Epoch: 7 Global Step: 123890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:18,541-Speed 9604.50 samples/sec Loss 6.5742 LearningRate 0.0395 Epoch: 7 Global Step: 123900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:19,599-Speed 9684.28 samples/sec Loss 6.4964 LearningRate 0.0395 Epoch: 7 Global Step: 123910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:20,671-Speed 9560.89 samples/sec Loss 6.5707 LearningRate 0.0395 Epoch: 7 Global Step: 123920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:21,708-Speed 9884.14 samples/sec Loss 6.4750 LearningRate 0.0395 Epoch: 7 Global Step: 123930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:22,809-Speed 9299.52 samples/sec Loss 6.5796 LearningRate 0.0395 Epoch: 7 Global Step: 123940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:23,964-Speed 8873.81 samples/sec Loss 6.5186 LearningRate 0.0395 Epoch: 7 Global Step: 123950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:25,108-Speed 8953.24 samples/sec Loss 6.5969 LearningRate 0.0395 Epoch: 7 Global Step: 123960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:31:26,197-Speed 9406.35 samples/sec Loss 6.6614 LearningRate 0.0395 Epoch: 7 Global Step: 123970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:27,295-Speed 9332.86 samples/sec Loss 6.5136 LearningRate 0.0395 Epoch: 7 Global Step: 123980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:28,388-Speed 9375.22 samples/sec Loss 6.6116 LearningRate 0.0395 Epoch: 7 Global Step: 
123990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:29,493-Speed 9271.47 samples/sec Loss 6.5459 LearningRate 0.0395 Epoch: 7 Global Step: 124000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:31:51,514-[lfw][124000]XNorm: 10.820141 Training: 2022-04-11 16:31:51,514-[lfw][124000]Accuracy-Flip: 0.99550+-0.00289 Training: 2022-04-11 16:31:51,515-[lfw][124000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:32:17,010-[cfp_fp][124000]XNorm: 9.170380 Training: 2022-04-11 16:32:17,011-[cfp_fp][124000]Accuracy-Flip: 0.95914+-0.00862 Training: 2022-04-11 16:32:17,011-[cfp_fp][124000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:32:38,990-[agedb_30][124000]XNorm: 10.493963 Training: 2022-04-11 16:32:38,991-[agedb_30][124000]Accuracy-Flip: 0.95933+-0.01001 Training: 2022-04-11 16:32:38,991-[agedb_30][124000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:32:40,078-Speed 145.07 samples/sec Loss 6.6009 LearningRate 0.0395 Epoch: 7 Global Step: 124010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:41,132-Speed 9713.95 samples/sec Loss 6.5636 LearningRate 0.0395 Epoch: 7 Global Step: 124020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:42,212-Speed 9494.48 samples/sec Loss 6.5537 LearningRate 0.0395 Epoch: 7 Global Step: 124030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:43,288-Speed 9519.85 samples/sec Loss 6.4991 LearningRate 0.0395 Epoch: 7 Global Step: 124040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:44,353-Speed 9621.07 samples/sec Loss 6.5307 LearningRate 0.0395 Epoch: 7 Global Step: 124050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:45,432-Speed 9494.00 samples/sec Loss 6.5816 LearningRate 0.0395 Epoch: 7 Global Step: 124060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:46,501-Speed 9583.99 samples/sec Loss 6.5139 LearningRate 0.0395 Epoch: 7 Global Step: 124070 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 16:32:47,625-Speed 9115.32 samples/sec Loss 6.5305 LearningRate 0.0395 Epoch: 7 Global Step: 124080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:48,657-Speed 9927.45 samples/sec Loss 6.4776 LearningRate 0.0395 Epoch: 7 Global Step: 124090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:49,729-Speed 9555.41 samples/sec Loss 6.4927 LearningRate 0.0395 Epoch: 7 Global Step: 124100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:50,809-Speed 9488.40 samples/sec Loss 6.6139 LearningRate 0.0395 Epoch: 7 Global Step: 124110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:51,854-Speed 9809.49 samples/sec Loss 6.5315 LearningRate 0.0395 Epoch: 7 Global Step: 124120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:52,968-Speed 9195.98 samples/sec Loss 6.5496 LearningRate 0.0395 Epoch: 7 Global Step: 124130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:32:54,032-Speed 9633.69 samples/sec Loss 6.5662 LearningRate 0.0395 Epoch: 7 Global Step: 124140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:55,120-Speed 9415.47 samples/sec Loss 6.6383 LearningRate 0.0395 Epoch: 7 Global Step: 124150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:56,176-Speed 9703.70 samples/sec Loss 6.5116 LearningRate 0.0394 Epoch: 7 Global Step: 124160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:57,285-Speed 9239.48 samples/sec Loss 6.6071 LearningRate 0.0394 Epoch: 7 Global Step: 124170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:58,391-Speed 9266.98 samples/sec Loss 6.5237 LearningRate 0.0394 Epoch: 7 Global Step: 124180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:32:59,460-Speed 9581.30 samples/sec Loss 6.5224 LearningRate 0.0394 Epoch: 7 Global Step: 124190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:33:00,557-Speed 9344.31 samples/sec Loss 6.6482 LearningRate 0.0394 Epoch: 7 Global Step: 124200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:01,637-Speed 9484.78 samples/sec Loss 6.5194 LearningRate 0.0394 Epoch: 7 Global Step: 124210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:02,695-Speed 9686.38 samples/sec Loss 6.4590 LearningRate 0.0394 Epoch: 7 Global Step: 124220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:03,775-Speed 9480.57 samples/sec Loss 6.5067 LearningRate 0.0394 Epoch: 7 Global Step: 124230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:04,853-Speed 9503.39 samples/sec Loss 6.6301 LearningRate 0.0394 Epoch: 7 Global Step: 124240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:33:05,943-Speed 9403.83 samples/sec Loss 6.5470 LearningRate 0.0394 Epoch: 7 Global Step: 124250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:33:07,020-Speed 9513.75 samples/sec Loss 6.7602 LearningRate 0.0394 Epoch: 7 Global Step: 124260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:08,097-Speed 9515.54 samples/sec Loss 6.5127 LearningRate 0.0394 Epoch: 7 Global Step: 124270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:09,237-Speed 8981.44 samples/sec Loss 6.5138 LearningRate 0.0394 Epoch: 7 Global Step: 124280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:10,337-Speed 9316.77 samples/sec Loss 6.5934 LearningRate 0.0394 Epoch: 7 Global Step: 124290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:11,430-Speed 9377.97 samples/sec Loss 6.6200 LearningRate 0.0394 Epoch: 7 Global Step: 124300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:12,510-Speed 9490.97 samples/sec Loss 6.5972 LearningRate 0.0394 Epoch: 7 Global Step: 124310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:13,636-Speed 9103.48 samples/sec Loss 
6.5980 LearningRate 0.0394 Epoch: 7 Global Step: 124320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:14,768-Speed 9049.05 samples/sec Loss 6.6231 LearningRate 0.0394 Epoch: 7 Global Step: 124330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:15,852-Speed 9449.79 samples/sec Loss 6.5771 LearningRate 0.0394 Epoch: 7 Global Step: 124340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:16,923-Speed 9567.90 samples/sec Loss 6.5247 LearningRate 0.0394 Epoch: 7 Global Step: 124350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:18,010-Speed 9421.84 samples/sec Loss 6.6658 LearningRate 0.0394 Epoch: 7 Global Step: 124360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:33:19,081-Speed 9572.64 samples/sec Loss 6.6137 LearningRate 0.0394 Epoch: 7 Global Step: 124370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:20,181-Speed 9314.60 samples/sec Loss 6.5268 LearningRate 0.0394 Epoch: 7 Global Step: 124380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:21,289-Speed 9249.54 samples/sec Loss 6.6028 LearningRate 0.0394 Epoch: 7 Global Step: 124390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:22,404-Speed 9188.47 samples/sec Loss 6.5982 LearningRate 0.0394 Epoch: 7 Global Step: 124400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:23,493-Speed 9411.73 samples/sec Loss 6.4950 LearningRate 0.0394 Epoch: 7 Global Step: 124410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:24,568-Speed 9526.76 samples/sec Loss 6.6406 LearningRate 0.0393 Epoch: 7 Global Step: 124420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:25,669-Speed 9308.97 samples/sec Loss 6.7051 LearningRate 0.0393 Epoch: 7 Global Step: 124430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:26,736-Speed 9597.03 samples/sec Loss 6.5559 LearningRate 0.0393 Epoch: 7 Global 
Step: 124440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:27,814-Speed 9505.82 samples/sec Loss 6.5310 LearningRate 0.0393 Epoch: 7 Global Step: 124450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:28,903-Speed 9411.06 samples/sec Loss 6.4822 LearningRate 0.0393 Epoch: 7 Global Step: 124460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:29,977-Speed 9540.36 samples/sec Loss 6.4906 LearningRate 0.0393 Epoch: 7 Global Step: 124470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:31,077-Speed 9313.04 samples/sec Loss 6.4436 LearningRate 0.0393 Epoch: 7 Global Step: 124480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:32,158-Speed 9479.61 samples/sec Loss 6.6148 LearningRate 0.0393 Epoch: 7 Global Step: 124490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:33,231-Speed 9552.12 samples/sec Loss 6.6023 LearningRate 0.0393 Epoch: 7 Global Step: 124500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:34,297-Speed 9607.55 samples/sec Loss 6.4952 LearningRate 0.0393 Epoch: 7 Global Step: 124510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:35,370-Speed 9549.26 samples/sec Loss 6.5437 LearningRate 0.0393 Epoch: 7 Global Step: 124520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:36,484-Speed 9200.82 samples/sec Loss 6.5996 LearningRate 0.0393 Epoch: 7 Global Step: 124530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:37,558-Speed 9537.39 samples/sec Loss 6.5387 LearningRate 0.0393 Epoch: 7 Global Step: 124540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:38,648-Speed 9399.15 samples/sec Loss 6.5461 LearningRate 0.0393 Epoch: 7 Global Step: 124550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:39,747-Speed 9320.13 samples/sec Loss 6.5120 LearningRate 0.0393 Epoch: 7 Global Step: 124560 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 16:33:40,867-Speed 9154.07 samples/sec Loss 6.4425 LearningRate 0.0393 Epoch: 7 Global Step: 124570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:33:41,925-Speed 9680.91 samples/sec Loss 6.5314 LearningRate 0.0393 Epoch: 7 Global Step: 124580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:42,988-Speed 9638.11 samples/sec Loss 6.5194 LearningRate 0.0393 Epoch: 7 Global Step: 124590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:44,062-Speed 9544.20 samples/sec Loss 6.5000 LearningRate 0.0393 Epoch: 7 Global Step: 124600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:45,125-Speed 9636.46 samples/sec Loss 6.5223 LearningRate 0.0393 Epoch: 7 Global Step: 124610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:46,216-Speed 9392.37 samples/sec Loss 6.6032 LearningRate 0.0393 Epoch: 7 Global Step: 124620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:47,294-Speed 9504.29 samples/sec Loss 6.6352 LearningRate 0.0393 Epoch: 7 Global Step: 124630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:48,386-Speed 9381.06 samples/sec Loss 6.6288 LearningRate 0.0393 Epoch: 7 Global Step: 124640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:49,438-Speed 9738.84 samples/sec Loss 6.6694 LearningRate 0.0393 Epoch: 7 Global Step: 124650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:50,555-Speed 9173.12 samples/sec Loss 6.5895 LearningRate 0.0393 Epoch: 7 Global Step: 124660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:51,637-Speed 9475.85 samples/sec Loss 6.6432 LearningRate 0.0393 Epoch: 7 Global Step: 124670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:52,702-Speed 9627.27 samples/sec Loss 6.4711 LearningRate 0.0393 Epoch: 7 Global Step: 124680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 
16:33:53,776-Speed 9535.47 samples/sec Loss 6.6175 LearningRate 0.0392 Epoch: 7 Global Step: 124690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:54,878-Speed 9296.55 samples/sec Loss 6.5278 LearningRate 0.0392 Epoch: 7 Global Step: 124700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:56,000-Speed 9133.56 samples/sec Loss 6.5529 LearningRate 0.0392 Epoch: 7 Global Step: 124710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:57,096-Speed 9343.52 samples/sec Loss 6.5333 LearningRate 0.0392 Epoch: 7 Global Step: 124720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:58,171-Speed 9536.93 samples/sec Loss 6.6011 LearningRate 0.0392 Epoch: 7 Global Step: 124730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:33:59,271-Speed 9310.65 samples/sec Loss 6.4881 LearningRate 0.0392 Epoch: 7 Global Step: 124740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:00,371-Speed 9321.01 samples/sec Loss 6.5687 LearningRate 0.0392 Epoch: 7 Global Step: 124750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:01,447-Speed 9521.86 samples/sec Loss 6.5200 LearningRate 0.0392 Epoch: 7 Global Step: 124760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:02,526-Speed 9495.87 samples/sec Loss 6.5579 LearningRate 0.0392 Epoch: 7 Global Step: 124770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:03,608-Speed 9466.07 samples/sec Loss 6.5286 LearningRate 0.0392 Epoch: 7 Global Step: 124780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:04,680-Speed 9560.13 samples/sec Loss 6.5343 LearningRate 0.0392 Epoch: 7 Global Step: 124790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:05,788-Speed 9241.12 samples/sec Loss 6.4823 LearningRate 0.0392 Epoch: 7 Global Step: 124800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:06,879-Speed 9397.40 samples/sec Loss 
6.5367 LearningRate 0.0392 Epoch: 7 Global Step: 124810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:07,967-Speed 9418.74 samples/sec Loss 6.6107 LearningRate 0.0392 Epoch: 7 Global Step: 124820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:09,049-Speed 9463.06 samples/sec Loss 6.5624 LearningRate 0.0392 Epoch: 7 Global Step: 124830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:10,150-Speed 9315.44 samples/sec Loss 6.4395 LearningRate 0.0392 Epoch: 7 Global Step: 124840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:11,232-Speed 9474.23 samples/sec Loss 6.5455 LearningRate 0.0392 Epoch: 7 Global Step: 124850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:12,315-Speed 9458.11 samples/sec Loss 6.5194 LearningRate 0.0392 Epoch: 7 Global Step: 124860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:13,446-Speed 9055.87 samples/sec Loss 6.5187 LearningRate 0.0392 Epoch: 7 Global Step: 124870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:14,517-Speed 9563.19 samples/sec Loss 6.6047 LearningRate 0.0392 Epoch: 7 Global Step: 124880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:15,602-Speed 9447.41 samples/sec Loss 6.5919 LearningRate 0.0392 Epoch: 7 Global Step: 124890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:34:16,676-Speed 9535.26 samples/sec Loss 6.5544 LearningRate 0.0392 Epoch: 7 Global Step: 124900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:34:17,814-Speed 9001.08 samples/sec Loss 6.5144 LearningRate 0.0392 Epoch: 7 Global Step: 124910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:34:18,881-Speed 9607.59 samples/sec Loss 6.6163 LearningRate 0.0392 Epoch: 7 Global Step: 124920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:19,978-Speed 9337.98 samples/sec Loss 6.3647 LearningRate 0.0392 Epoch: 7 Global 
Step: 124930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:21,108-Speed 9063.71 samples/sec Loss 6.5384 LearningRate 0.0392 Epoch: 7 Global Step: 124940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:22,167-Speed 9677.80 samples/sec Loss 6.4831 LearningRate 0.0391 Epoch: 7 Global Step: 124950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:23,241-Speed 9544.64 samples/sec Loss 6.6297 LearningRate 0.0391 Epoch: 7 Global Step: 124960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:24,381-Speed 8985.16 samples/sec Loss 6.5645 LearningRate 0.0391 Epoch: 7 Global Step: 124970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:25,509-Speed 9082.68 samples/sec Loss 6.5325 LearningRate 0.0391 Epoch: 7 Global Step: 124980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:26,643-Speed 9041.17 samples/sec Loss 6.5537 LearningRate 0.0391 Epoch: 7 Global Step: 124990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:27,703-Speed 9662.93 samples/sec Loss 6.5004 LearningRate 0.0391 Epoch: 7 Global Step: 125000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:28,761-Speed 9687.61 samples/sec Loss 6.6107 LearningRate 0.0391 Epoch: 7 Global Step: 125010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:29,869-Speed 9249.52 samples/sec Loss 6.5866 LearningRate 0.0391 Epoch: 7 Global Step: 125020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:30,968-Speed 9321.52 samples/sec Loss 6.5830 LearningRate 0.0391 Epoch: 7 Global Step: 125030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:32,024-Speed 9705.59 samples/sec Loss 6.6806 LearningRate 0.0391 Epoch: 7 Global Step: 125040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:33,118-Speed 9368.37 samples/sec Loss 6.5646 LearningRate 0.0391 Epoch: 7 Global Step: 125050 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-11 16:34:34,245-Speed 9084.01 samples/sec Loss 6.5468 LearningRate 0.0391 Epoch: 7 Global Step: 125060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:35,335-Speed 9400.41 samples/sec Loss 6.6214 LearningRate 0.0391 Epoch: 7 Global Step: 125070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:36,407-Speed 9557.21 samples/sec Loss 6.5318 LearningRate 0.0391 Epoch: 7 Global Step: 125080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:37,524-Speed 9175.62 samples/sec Loss 6.5285 LearningRate 0.0391 Epoch: 7 Global Step: 125090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:34:38,569-Speed 9807.75 samples/sec Loss 6.6080 LearningRate 0.0391 Epoch: 7 Global Step: 125100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:39,665-Speed 9344.18 samples/sec Loss 6.4962 LearningRate 0.0391 Epoch: 7 Global Step: 125110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:40,771-Speed 9265.72 samples/sec Loss 6.6127 LearningRate 0.0391 Epoch: 7 Global Step: 125120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:41,918-Speed 8933.64 samples/sec Loss 6.5786 LearningRate 0.0391 Epoch: 7 Global Step: 125130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:43,043-Speed 9110.42 samples/sec Loss 6.7184 LearningRate 0.0391 Epoch: 7 Global Step: 125140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:44,135-Speed 9380.47 samples/sec Loss 6.4987 LearningRate 0.0391 Epoch: 7 Global Step: 125150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:45,233-Speed 9326.69 samples/sec Loss 6.5854 LearningRate 0.0391 Epoch: 7 Global Step: 125160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:46,350-Speed 9178.66 samples/sec Loss 6.4990 LearningRate 0.0391 Epoch: 7 Global Step: 125170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:47,433-Speed 
9463.69 samples/sec Loss 6.5086 LearningRate 0.0391 Epoch: 7 Global Step: 125180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:48,514-Speed 9471.79 samples/sec Loss 6.5048 LearningRate 0.0391 Epoch: 7 Global Step: 125190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:49,576-Speed 9657.05 samples/sec Loss 6.5351 LearningRate 0.0391 Epoch: 7 Global Step: 125200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:34:50,685-Speed 9234.44 samples/sec Loss 6.5311 LearningRate 0.0391 Epoch: 7 Global Step: 125210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:34:51,775-Speed 9401.65 samples/sec Loss 6.4606 LearningRate 0.0390 Epoch: 7 Global Step: 125220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:34:52,820-Speed 9804.18 samples/sec Loss 6.5504 LearningRate 0.0390 Epoch: 7 Global Step: 125230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:53,861-Speed 9844.34 samples/sec Loss 6.6547 LearningRate 0.0390 Epoch: 7 Global Step: 125240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:54,920-Speed 9674.55 samples/sec Loss 6.6019 LearningRate 0.0390 Epoch: 7 Global Step: 125250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:55,988-Speed 9597.85 samples/sec Loss 6.5719 LearningRate 0.0390 Epoch: 7 Global Step: 125260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:57,102-Speed 9188.84 samples/sec Loss 6.5722 LearningRate 0.0390 Epoch: 7 Global Step: 125270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:58,216-Speed 9206.65 samples/sec Loss 6.5892 LearningRate 0.0390 Epoch: 7 Global Step: 125280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:34:59,312-Speed 9344.39 samples/sec Loss 6.6124 LearningRate 0.0390 Epoch: 7 Global Step: 125290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:00,395-Speed 9457.80 samples/sec Loss 6.7126 
LearningRate 0.0390 Epoch: 7 Global Step: 125300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:01,499-Speed 9284.35 samples/sec Loss 6.6108 LearningRate 0.0390 Epoch: 7 Global Step: 125310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:02,582-Speed 9459.02 samples/sec Loss 6.4038 LearningRate 0.0390 Epoch: 7 Global Step: 125320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:03,700-Speed 9166.10 samples/sec Loss 6.6030 LearningRate 0.0390 Epoch: 7 Global Step: 125330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:35:04,758-Speed 9685.05 samples/sec Loss 6.5034 LearningRate 0.0390 Epoch: 7 Global Step: 125340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:35:05,816-Speed 9678.47 samples/sec Loss 6.6737 LearningRate 0.0390 Epoch: 7 Global Step: 125350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:06,926-Speed 9233.11 samples/sec Loss 6.6006 LearningRate 0.0390 Epoch: 7 Global Step: 125360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:08,007-Speed 9480.23 samples/sec Loss 6.5319 LearningRate 0.0390 Epoch: 7 Global Step: 125370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:09,147-Speed 8987.00 samples/sec Loss 6.4585 LearningRate 0.0390 Epoch: 7 Global Step: 125380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:10,244-Speed 9344.66 samples/sec Loss 6.5074 LearningRate 0.0390 Epoch: 7 Global Step: 125390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:11,295-Speed 9750.98 samples/sec Loss 6.4375 LearningRate 0.0390 Epoch: 7 Global Step: 125400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:12,437-Speed 8965.01 samples/sec Loss 6.4880 LearningRate 0.0390 Epoch: 7 Global Step: 125410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:13,543-Speed 9267.40 samples/sec Loss 6.5794 LearningRate 0.0390 Epoch: 7 Global Step: 
125420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:14,598-Speed 9708.46 samples/sec Loss 6.6542 LearningRate 0.0390 Epoch: 7 Global Step: 125430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:15,695-Speed 9344.21 samples/sec Loss 6.4847 LearningRate 0.0390 Epoch: 7 Global Step: 125440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:16,807-Speed 9212.17 samples/sec Loss 6.4493 LearningRate 0.0390 Epoch: 7 Global Step: 125450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:35:17,904-Speed 9338.57 samples/sec Loss 6.5130 LearningRate 0.0390 Epoch: 7 Global Step: 125460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:19,021-Speed 9171.14 samples/sec Loss 6.5321 LearningRate 0.0390 Epoch: 7 Global Step: 125470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:20,071-Speed 9754.10 samples/sec Loss 6.4493 LearningRate 0.0390 Epoch: 7 Global Step: 125480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:21,140-Speed 9587.30 samples/sec Loss 6.5234 LearningRate 0.0389 Epoch: 7 Global Step: 125490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:22,283-Speed 8967.88 samples/sec Loss 6.6182 LearningRate 0.0389 Epoch: 7 Global Step: 125500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:23,403-Speed 9150.24 samples/sec Loss 6.5614 LearningRate 0.0389 Epoch: 7 Global Step: 125510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:24,496-Speed 9376.99 samples/sec Loss 6.5064 LearningRate 0.0389 Epoch: 7 Global Step: 125520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:25,590-Speed 9367.26 samples/sec Loss 6.5198 LearningRate 0.0389 Epoch: 7 Global Step: 125530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:26,670-Speed 9486.32 samples/sec Loss 6.6562 LearningRate 0.0389 Epoch: 7 Global Step: 125540 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 16:35:27,756-Speed 9437.76 samples/sec Loss 6.5103 LearningRate 0.0389 Epoch: 7 Global Step: 125550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:28,852-Speed 9352.79 samples/sec Loss 6.4393 LearningRate 0.0389 Epoch: 7 Global Step: 125560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:35:29,949-Speed 9337.53 samples/sec Loss 6.5297 LearningRate 0.0389 Epoch: 7 Global Step: 125570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:31,040-Speed 9390.56 samples/sec Loss 6.5807 LearningRate 0.0389 Epoch: 7 Global Step: 125580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:32,155-Speed 9185.75 samples/sec Loss 6.6465 LearningRate 0.0389 Epoch: 7 Global Step: 125590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:33,314-Speed 8845.52 samples/sec Loss 6.4647 LearningRate 0.0389 Epoch: 7 Global Step: 125600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:34,390-Speed 9521.95 samples/sec Loss 6.4707 LearningRate 0.0389 Epoch: 7 Global Step: 125610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:35,468-Speed 9499.71 samples/sec Loss 6.5615 LearningRate 0.0389 Epoch: 7 Global Step: 125620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:36,571-Speed 9285.44 samples/sec Loss 6.5995 LearningRate 0.0389 Epoch: 7 Global Step: 125630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:37,659-Speed 9423.27 samples/sec Loss 6.6776 LearningRate 0.0389 Epoch: 7 Global Step: 125640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:38,742-Speed 9459.71 samples/sec Loss 6.5597 LearningRate 0.0389 Epoch: 7 Global Step: 125650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:39,824-Speed 9473.94 samples/sec Loss 6.4246 LearningRate 0.0389 Epoch: 7 Global Step: 125660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:35:40,902-Speed 9501.80 samples/sec Loss 6.6743 LearningRate 0.0389 Epoch: 7 Global Step: 125670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:42,032-Speed 9069.97 samples/sec Loss 6.4467 LearningRate 0.0389 Epoch: 7 Global Step: 125680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:43,133-Speed 9308.67 samples/sec Loss 6.5354 LearningRate 0.0389 Epoch: 7 Global Step: 125690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:44,211-Speed 9502.23 samples/sec Loss 6.6062 LearningRate 0.0389 Epoch: 7 Global Step: 125700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:45,263-Speed 9744.63 samples/sec Loss 6.5300 LearningRate 0.0389 Epoch: 7 Global Step: 125710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:46,296-Speed 9919.06 samples/sec Loss 6.7759 LearningRate 0.0389 Epoch: 7 Global Step: 125720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:47,358-Speed 9646.71 samples/sec Loss 6.4963 LearningRate 0.0389 Epoch: 7 Global Step: 125730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:48,463-Speed 9269.98 samples/sec Loss 6.4224 LearningRate 0.0389 Epoch: 7 Global Step: 125740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:49,571-Speed 9248.33 samples/sec Loss 6.4809 LearningRate 0.0389 Epoch: 7 Global Step: 125750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:50,625-Speed 9721.77 samples/sec Loss 6.5049 LearningRate 0.0388 Epoch: 7 Global Step: 125760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:51,691-Speed 9614.68 samples/sec Loss 6.6453 LearningRate 0.0388 Epoch: 7 Global Step: 125770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:35:52,759-Speed 9593.51 samples/sec Loss 6.4979 LearningRate 0.0388 Epoch: 7 Global Step: 125780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:35:53,807-Speed 9775.16 samples/sec Loss 
6.5111 LearningRate 0.0388 Epoch: 7 Global Step: 125790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:54,886-Speed 9493.46 samples/sec Loss 6.4911 LearningRate 0.0388 Epoch: 7 Global Step: 125800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:55,977-Speed 9387.70 samples/sec Loss 6.6027 LearningRate 0.0388 Epoch: 7 Global Step: 125810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:57,072-Speed 9358.93 samples/sec Loss 6.5122 LearningRate 0.0388 Epoch: 7 Global Step: 125820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:58,160-Speed 9419.49 samples/sec Loss 6.5557 LearningRate 0.0388 Epoch: 7 Global Step: 125830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:35:59,240-Speed 9484.92 samples/sec Loss 6.4024 LearningRate 0.0388 Epoch: 7 Global Step: 125840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:00,338-Speed 9334.25 samples/sec Loss 6.5083 LearningRate 0.0388 Epoch: 7 Global Step: 125850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:01,403-Speed 9622.93 samples/sec Loss 6.6061 LearningRate 0.0388 Epoch: 7 Global Step: 125860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:02,495-Speed 9383.41 samples/sec Loss 6.6169 LearningRate 0.0388 Epoch: 7 Global Step: 125870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:03,602-Speed 9253.70 samples/sec Loss 6.5603 LearningRate 0.0388 Epoch: 7 Global Step: 125880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:04,627-Speed 9992.93 samples/sec Loss 6.6449 LearningRate 0.0388 Epoch: 7 Global Step: 125890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:36:05,691-Speed 9631.81 samples/sec Loss 6.5228 LearningRate 0.0388 Epoch: 7 Global Step: 125900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:36:06,769-Speed 9506.58 samples/sec Loss 6.6197 LearningRate 0.0388 Epoch: 7 Global 
Step: 125910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:36:07,847-Speed 9505.43 samples/sec Loss 6.6807 LearningRate 0.0388 Epoch: 7 Global Step: 125920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:36:08,898-Speed 9742.76 samples/sec Loss 6.5277 LearningRate 0.0388 Epoch: 7 Global Step: 125930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:09,957-Speed 9673.37 samples/sec Loss 6.5453 LearningRate 0.0388 Epoch: 7 Global Step: 125940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:11,057-Speed 9317.70 samples/sec Loss 6.6372 LearningRate 0.0388 Epoch: 7 Global Step: 125950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:12,155-Speed 9331.93 samples/sec Loss 6.5138 LearningRate 0.0388 Epoch: 7 Global Step: 125960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:13,239-Speed 9451.63 samples/sec Loss 6.6669 LearningRate 0.0388 Epoch: 7 Global Step: 125970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:14,307-Speed 9591.91 samples/sec Loss 6.6445 LearningRate 0.0388 Epoch: 7 Global Step: 125980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:15,376-Speed 9585.54 samples/sec Loss 6.6620 LearningRate 0.0388 Epoch: 7 Global Step: 125990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:16,467-Speed 9388.78 samples/sec Loss 6.5726 LearningRate 0.0388 Epoch: 7 Global Step: 126000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:36:38,383-[lfw][126000]XNorm: 10.732953 Training: 2022-04-11 16:36:38,384-[lfw][126000]Accuracy-Flip: 0.99650+-0.00252 Training: 2022-04-11 16:36:38,384-[lfw][126000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:37:03,668-[cfp_fp][126000]XNorm: 9.168110 Training: 2022-04-11 16:37:03,669-[cfp_fp][126000]Accuracy-Flip: 0.95843+-0.01266 Training: 2022-04-11 16:37:03,669-[cfp_fp][126000]Accuracy-Highest: 0.96157 Training: 2022-04-11 
16:37:25,454-[agedb_30][126000]XNorm: 10.388509 Training: 2022-04-11 16:37:25,455-[agedb_30][126000]Accuracy-Flip: 0.96400+-0.00898 Training: 2022-04-11 16:37:25,456-[agedb_30][126000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:37:26,524-Speed 146.17 samples/sec Loss 6.5337 LearningRate 0.0388 Epoch: 7 Global Step: 126010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:27,613-Speed 9406.28 samples/sec Loss 6.5580 LearningRate 0.0387 Epoch: 7 Global Step: 126020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:28,742-Speed 9076.05 samples/sec Loss 6.6052 LearningRate 0.0387 Epoch: 7 Global Step: 126030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:37:29,842-Speed 9312.99 samples/sec Loss 6.5852 LearningRate 0.0387 Epoch: 7 Global Step: 126040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:37:30,927-Speed 9442.94 samples/sec Loss 6.4840 LearningRate 0.0387 Epoch: 7 Global Step: 126050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:32,017-Speed 9404.79 samples/sec Loss 6.5589 LearningRate 0.0387 Epoch: 7 Global Step: 126060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:33,124-Speed 9257.99 samples/sec Loss 6.5510 LearningRate 0.0387 Epoch: 7 Global Step: 126070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:34,199-Speed 9526.75 samples/sec Loss 6.4284 LearningRate 0.0387 Epoch: 7 Global Step: 126080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:35,252-Speed 9735.86 samples/sec Loss 6.5783 LearningRate 0.0387 Epoch: 7 Global Step: 126090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:36,377-Speed 9101.50 samples/sec Loss 6.6423 LearningRate 0.0387 Epoch: 7 Global Step: 126100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:37,479-Speed 9301.60 samples/sec Loss 6.4940 LearningRate 0.0387 Epoch: 7 Global Step: 126110 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 16:37:38,567-Speed 9415.30 samples/sec Loss 6.5086 LearningRate 0.0387 Epoch: 7 Global Step: 126120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:39,696-Speed 9072.23 samples/sec Loss 6.5221 LearningRate 0.0387 Epoch: 7 Global Step: 126130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:40,770-Speed 9545.72 samples/sec Loss 6.6035 LearningRate 0.0387 Epoch: 7 Global Step: 126140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:41,852-Speed 9469.84 samples/sec Loss 6.5230 LearningRate 0.0387 Epoch: 7 Global Step: 126150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:37:42,956-Speed 9278.62 samples/sec Loss 6.5246 LearningRate 0.0387 Epoch: 7 Global Step: 126160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:44,055-Speed 9328.35 samples/sec Loss 6.6063 LearningRate 0.0387 Epoch: 7 Global Step: 126170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:45,147-Speed 9380.30 samples/sec Loss 6.5584 LearningRate 0.0387 Epoch: 7 Global Step: 126180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:46,213-Speed 9605.45 samples/sec Loss 6.5901 LearningRate 0.0387 Epoch: 7 Global Step: 126190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:47,304-Speed 9398.52 samples/sec Loss 6.6187 LearningRate 0.0387 Epoch: 7 Global Step: 126200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:48,406-Speed 9296.43 samples/sec Loss 6.6171 LearningRate 0.0387 Epoch: 7 Global Step: 126210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:49,490-Speed 9463.21 samples/sec Loss 6.6118 LearningRate 0.0387 Epoch: 7 Global Step: 126220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:50,585-Speed 9351.19 samples/sec Loss 6.6456 LearningRate 0.0387 Epoch: 7 Global Step: 126230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:37:51,663-Speed 9505.49 samples/sec Loss 6.5609 LearningRate 0.0387 Epoch: 7 Global Step: 126240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:52,800-Speed 9013.70 samples/sec Loss 6.5161 LearningRate 0.0387 Epoch: 7 Global Step: 126250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:53,912-Speed 9215.30 samples/sec Loss 6.5704 LearningRate 0.0387 Epoch: 7 Global Step: 126260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:37:54,962-Speed 9760.39 samples/sec Loss 6.6548 LearningRate 0.0387 Epoch: 7 Global Step: 126270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:37:56,048-Speed 9428.69 samples/sec Loss 6.5088 LearningRate 0.0387 Epoch: 7 Global Step: 126280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:57,143-Speed 9357.96 samples/sec Loss 6.5906 LearningRate 0.0386 Epoch: 7 Global Step: 126290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:58,260-Speed 9175.29 samples/sec Loss 6.6026 LearningRate 0.0386 Epoch: 7 Global Step: 126300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:37:59,328-Speed 9591.32 samples/sec Loss 6.6255 LearningRate 0.0386 Epoch: 7 Global Step: 126310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:00,460-Speed 9054.63 samples/sec Loss 6.4440 LearningRate 0.0386 Epoch: 7 Global Step: 126320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:01,537-Speed 9515.67 samples/sec Loss 6.5597 LearningRate 0.0386 Epoch: 7 Global Step: 126330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:02,672-Speed 9025.69 samples/sec Loss 6.5668 LearningRate 0.0386 Epoch: 7 Global Step: 126340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:03,741-Speed 9582.00 samples/sec Loss 6.5885 LearningRate 0.0386 Epoch: 7 Global Step: 126350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:04,830-Speed 9410.55 samples/sec Loss 
6.6444 LearningRate 0.0386 Epoch: 7 Global Step: 126360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:05,910-Speed 9483.03 samples/sec Loss 6.4897 LearningRate 0.0386 Epoch: 7 Global Step: 126370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:06,984-Speed 9546.49 samples/sec Loss 6.5225 LearningRate 0.0386 Epoch: 7 Global Step: 126380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:08,106-Speed 9130.88 samples/sec Loss 6.5753 LearningRate 0.0386 Epoch: 7 Global Step: 126390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:09,176-Speed 9574.23 samples/sec Loss 6.5978 LearningRate 0.0386 Epoch: 7 Global Step: 126400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:10,261-Speed 9442.98 samples/sec Loss 6.5088 LearningRate 0.0386 Epoch: 7 Global Step: 126410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:11,373-Speed 9214.03 samples/sec Loss 6.4960 LearningRate 0.0386 Epoch: 7 Global Step: 126420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:12,441-Speed 9593.10 samples/sec Loss 6.5631 LearningRate 0.0386 Epoch: 7 Global Step: 126430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:13,498-Speed 9702.75 samples/sec Loss 6.5903 LearningRate 0.0386 Epoch: 7 Global Step: 126440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:14,539-Speed 9840.33 samples/sec Loss 6.4755 LearningRate 0.0386 Epoch: 7 Global Step: 126450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:15,616-Speed 9514.20 samples/sec Loss 6.5461 LearningRate 0.0386 Epoch: 7 Global Step: 126460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:16,688-Speed 9554.33 samples/sec Loss 6.6268 LearningRate 0.0386 Epoch: 7 Global Step: 126470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:17,769-Speed 9483.88 samples/sec Loss 6.5964 LearningRate 0.0386 Epoch: 7 Global Step: 
126480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:18,813-Speed 9806.88 samples/sec Loss 6.4693 LearningRate 0.0386 Epoch: 7 Global Step: 126490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:19,924-Speed 9227.60 samples/sec Loss 6.6512 LearningRate 0.0386 Epoch: 7 Global Step: 126500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:20,984-Speed 9662.99 samples/sec Loss 6.4699 LearningRate 0.0386 Epoch: 7 Global Step: 126510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:22,063-Speed 9498.98 samples/sec Loss 6.5174 LearningRate 0.0386 Epoch: 7 Global Step: 126520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:23,214-Speed 8898.17 samples/sec Loss 6.4909 LearningRate 0.0386 Epoch: 7 Global Step: 126530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:24,314-Speed 9317.01 samples/sec Loss 6.6554 LearningRate 0.0386 Epoch: 7 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:25,442-Speed 9081.94 samples/sec Loss 6.4402 LearningRate 0.0386 Epoch: 7 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:26,519-Speed 9516.12 samples/sec Loss 6.4767 LearningRate 0.0385 Epoch: 7 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:27,617-Speed 9328.09 samples/sec Loss 6.6195 LearningRate 0.0385 Epoch: 7 Global Step: 126570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:28,708-Speed 9391.78 samples/sec Loss 6.6249 LearningRate 0.0385 Epoch: 7 Global Step: 126580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:29,806-Speed 9331.19 samples/sec Loss 6.5809 LearningRate 0.0385 Epoch: 7 Global Step: 126590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:30,912-Speed 9263.17 samples/sec Loss 6.5340 LearningRate 0.0385 Epoch: 7 Global Step: 126600 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 16:38:32,008-Speed 9354.33 samples/sec Loss 6.5455 LearningRate 0.0385 Epoch: 7 Global Step: 126610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:33,066-Speed 9679.59 samples/sec Loss 6.5812 LearningRate 0.0385 Epoch: 7 Global Step: 126620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:34,174-Speed 9250.10 samples/sec Loss 6.5760 LearningRate 0.0385 Epoch: 7 Global Step: 126630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:35,288-Speed 9199.57 samples/sec Loss 6.5185 LearningRate 0.0385 Epoch: 7 Global Step: 126640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:36,351-Speed 9638.52 samples/sec Loss 6.6439 LearningRate 0.0385 Epoch: 7 Global Step: 126650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:37,448-Speed 9335.28 samples/sec Loss 6.4652 LearningRate 0.0385 Epoch: 7 Global Step: 126660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:38,542-Speed 9367.54 samples/sec Loss 6.5467 LearningRate 0.0385 Epoch: 7 Global Step: 126670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:39,639-Speed 9338.56 samples/sec Loss 6.5281 LearningRate 0.0385 Epoch: 7 Global Step: 126680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:40,730-Speed 9394.76 samples/sec Loss 6.5561 LearningRate 0.0385 Epoch: 7 Global Step: 126690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:41,828-Speed 9324.50 samples/sec Loss 6.6436 LearningRate 0.0385 Epoch: 7 Global Step: 126700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:42,933-Speed 9275.95 samples/sec Loss 6.6199 LearningRate 0.0385 Epoch: 7 Global Step: 126710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:44,021-Speed 9426.52 samples/sec Loss 6.5642 LearningRate 0.0385 Epoch: 7 Global Step: 126720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:45,098-Speed 9508.66 
samples/sec Loss 6.4961 LearningRate 0.0385 Epoch: 7 Global Step: 126730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:46,164-Speed 9609.79 samples/sec Loss 6.5896 LearningRate 0.0385 Epoch: 7 Global Step: 126740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:47,240-Speed 9528.25 samples/sec Loss 6.6047 LearningRate 0.0385 Epoch: 7 Global Step: 126750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:38:48,341-Speed 9304.86 samples/sec Loss 6.5367 LearningRate 0.0385 Epoch: 7 Global Step: 126760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:49,448-Speed 9254.12 samples/sec Loss 6.6772 LearningRate 0.0385 Epoch: 7 Global Step: 126770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:50,531-Speed 9466.88 samples/sec Loss 6.6050 LearningRate 0.0385 Epoch: 7 Global Step: 126780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:51,626-Speed 9350.10 samples/sec Loss 6.4954 LearningRate 0.0385 Epoch: 7 Global Step: 126790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:52,695-Speed 9583.49 samples/sec Loss 6.5120 LearningRate 0.0385 Epoch: 7 Global Step: 126800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:53,754-Speed 9678.80 samples/sec Loss 6.5168 LearningRate 0.0385 Epoch: 7 Global Step: 126810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:54,832-Speed 9501.93 samples/sec Loss 6.6696 LearningRate 0.0385 Epoch: 7 Global Step: 126820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:55,940-Speed 9251.00 samples/sec Loss 6.5317 LearningRate 0.0384 Epoch: 7 Global Step: 126830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:57,025-Speed 9442.78 samples/sec Loss 6.5763 LearningRate 0.0384 Epoch: 7 Global Step: 126840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:58,083-Speed 9682.01 samples/sec Loss 6.7260 LearningRate 0.0384 
Epoch: 7 Global Step: 126850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:38:59,128-Speed 9805.12 samples/sec Loss 6.5581 LearningRate 0.0384 Epoch: 7 Global Step: 126860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:00,217-Speed 9408.09 samples/sec Loss 6.5786 LearningRate 0.0384 Epoch: 7 Global Step: 126870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:01,305-Speed 9417.78 samples/sec Loss 6.5063 LearningRate 0.0384 Epoch: 7 Global Step: 126880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:02,391-Speed 9435.73 samples/sec Loss 6.5371 LearningRate 0.0384 Epoch: 7 Global Step: 126890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:03,449-Speed 9687.82 samples/sec Loss 6.4711 LearningRate 0.0384 Epoch: 7 Global Step: 126900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:04,487-Speed 9867.40 samples/sec Loss 6.5689 LearningRate 0.0384 Epoch: 7 Global Step: 126910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:05,575-Speed 9420.27 samples/sec Loss 6.5863 LearningRate 0.0384 Epoch: 7 Global Step: 126920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:06,671-Speed 9350.28 samples/sec Loss 6.4743 LearningRate 0.0384 Epoch: 7 Global Step: 126930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:07,729-Speed 9683.29 samples/sec Loss 6.5272 LearningRate 0.0384 Epoch: 7 Global Step: 126940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:08,785-Speed 9702.90 samples/sec Loss 6.4617 LearningRate 0.0384 Epoch: 7 Global Step: 126950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:09,893-Speed 9244.00 samples/sec Loss 6.6686 LearningRate 0.0384 Epoch: 7 Global Step: 126960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:10,981-Speed 9413.97 samples/sec Loss 6.4577 LearningRate 0.0384 Epoch: 7 Global Step: 126970 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 16:39:12,065-Speed 9451.61 samples/sec Loss 6.5034 LearningRate 0.0384 Epoch: 7 Global Step: 126980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:13,157-Speed 9387.39 samples/sec Loss 6.4695 LearningRate 0.0384 Epoch: 7 Global Step: 126990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:14,202-Speed 9807.31 samples/sec Loss 6.4985 LearningRate 0.0384 Epoch: 7 Global Step: 127000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:15,265-Speed 9635.63 samples/sec Loss 6.5410 LearningRate 0.0384 Epoch: 7 Global Step: 127010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:16,359-Speed 9364.21 samples/sec Loss 6.4978 LearningRate 0.0384 Epoch: 7 Global Step: 127020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:17,478-Speed 9158.37 samples/sec Loss 6.4687 LearningRate 0.0384 Epoch: 7 Global Step: 127030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:18,594-Speed 9183.16 samples/sec Loss 6.5348 LearningRate 0.0384 Epoch: 7 Global Step: 127040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:19,673-Speed 9492.58 samples/sec Loss 6.5750 LearningRate 0.0384 Epoch: 7 Global Step: 127050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:20,762-Speed 9413.60 samples/sec Loss 6.6047 LearningRate 0.0384 Epoch: 7 Global Step: 127060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:21,884-Speed 9135.02 samples/sec Loss 6.5329 LearningRate 0.0384 Epoch: 7 Global Step: 127070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:22,975-Speed 9397.49 samples/sec Loss 6.5885 LearningRate 0.0384 Epoch: 7 Global Step: 127080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:39:24,066-Speed 9389.92 samples/sec Loss 6.6379 LearningRate 0.0384 Epoch: 7 Global Step: 127090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
16:39:25,194-Speed 9076.64 samples/sec Loss 6.6059 LearningRate 0.0383 Epoch: 7 Global Step: 127100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:26,275-Speed 9481.97 samples/sec Loss 6.6119 LearningRate 0.0383 Epoch: 7 Global Step: 127110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:27,339-Speed 9624.54 samples/sec Loss 6.5373 LearningRate 0.0383 Epoch: 7 Global Step: 127120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:28,442-Speed 9293.64 samples/sec Loss 6.6250 LearningRate 0.0383 Epoch: 7 Global Step: 127130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:29,599-Speed 8850.36 samples/sec Loss 6.4072 LearningRate 0.0383 Epoch: 7 Global Step: 127140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:30,695-Speed 9354.94 samples/sec Loss 6.6018 LearningRate 0.0383 Epoch: 7 Global Step: 127150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:31,764-Speed 9583.45 samples/sec Loss 6.6265 LearningRate 0.0383 Epoch: 7 Global Step: 127160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:32,872-Speed 9244.73 samples/sec Loss 6.5980 LearningRate 0.0383 Epoch: 7 Global Step: 127170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:33,944-Speed 9565.66 samples/sec Loss 6.6010 LearningRate 0.0383 Epoch: 7 Global Step: 127180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:35,036-Speed 9376.42 samples/sec Loss 6.6201 LearningRate 0.0383 Epoch: 7 Global Step: 127190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:36,119-Speed 9464.05 samples/sec Loss 6.5793 LearningRate 0.0383 Epoch: 7 Global Step: 127200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:37,166-Speed 9784.50 samples/sec Loss 6.4922 LearningRate 0.0383 Epoch: 7 Global Step: 127210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:38,259-Speed 9376.27 samples/sec Loss 6.6388 
LearningRate 0.0383 Epoch: 7 Global Step: 127220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:39,375-Speed 9187.69 samples/sec Loss 6.5018 LearningRate 0.0383 Epoch: 7 Global Step: 127230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:40,466-Speed 9391.89 samples/sec Loss 6.5627 LearningRate 0.0383 Epoch: 7 Global Step: 127240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:41,516-Speed 9761.42 samples/sec Loss 6.5227 LearningRate 0.0383 Epoch: 7 Global Step: 127250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:42,636-Speed 9147.12 samples/sec Loss 6.6119 LearningRate 0.0383 Epoch: 7 Global Step: 127260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:43,734-Speed 9332.13 samples/sec Loss 6.5652 LearningRate 0.0383 Epoch: 7 Global Step: 127270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:44,848-Speed 9199.70 samples/sec Loss 6.5799 LearningRate 0.0383 Epoch: 7 Global Step: 127280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:45,969-Speed 9140.30 samples/sec Loss 6.4276 LearningRate 0.0383 Epoch: 7 Global Step: 127290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:47,069-Speed 9317.72 samples/sec Loss 6.4400 LearningRate 0.0383 Epoch: 7 Global Step: 127300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:39:48,167-Speed 9328.21 samples/sec Loss 6.6043 LearningRate 0.0383 Epoch: 7 Global Step: 127310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:49,285-Speed 9164.74 samples/sec Loss 6.6853 LearningRate 0.0383 Epoch: 7 Global Step: 127320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:50,392-Speed 9254.91 samples/sec Loss 6.6610 LearningRate 0.0383 Epoch: 7 Global Step: 127330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:51,484-Speed 9387.02 samples/sec Loss 6.5686 LearningRate 0.0383 Epoch: 7 Global Step: 127340 
Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:52,555-Speed 9568.64 samples/sec Loss 6.5345 LearningRate 0.0383 Epoch: 7 Global Step: 127350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:53,646-Speed 9388.85 samples/sec Loss 6.6421 LearningRate 0.0383 Epoch: 7 Global Step: 127360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:54,740-Speed 9358.90 samples/sec Loss 6.4401 LearningRate 0.0382 Epoch: 7 Global Step: 127370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:55,843-Speed 9290.73 samples/sec Loss 6.5805 LearningRate 0.0382 Epoch: 7 Global Step: 127380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:56,945-Speed 9300.21 samples/sec Loss 6.5383 LearningRate 0.0382 Epoch: 7 Global Step: 127390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:58,013-Speed 9591.81 samples/sec Loss 6.5166 LearningRate 0.0382 Epoch: 7 Global Step: 127400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:39:59,077-Speed 9631.33 samples/sec Loss 6.5596 LearningRate 0.0382 Epoch: 7 Global Step: 127410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:40:00,182-Speed 9277.24 samples/sec Loss 6.4060 LearningRate 0.0382 Epoch: 7 Global Step: 127420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:01,302-Speed 9143.48 samples/sec Loss 6.6195 LearningRate 0.0382 Epoch: 7 Global Step: 127430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:02,381-Speed 9496.79 samples/sec Loss 6.5447 LearningRate 0.0382 Epoch: 7 Global Step: 127440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:03,507-Speed 9100.79 samples/sec Loss 6.4184 LearningRate 0.0382 Epoch: 7 Global Step: 127450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:04,602-Speed 9355.28 samples/sec Loss 6.6283 LearningRate 0.0382 Epoch: 7 Global Step: 127460 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 16:40:05,672-Speed 9579.99 samples/sec Loss 6.5672 LearningRate 0.0382 Epoch: 7 Global Step: 127470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:06,736-Speed 9626.11 samples/sec Loss 6.6759 LearningRate 0.0382 Epoch: 7 Global Step: 127480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:07,822-Speed 9433.82 samples/sec Loss 6.5251 LearningRate 0.0382 Epoch: 7 Global Step: 127490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:08,901-Speed 9497.79 samples/sec Loss 6.5475 LearningRate 0.0382 Epoch: 7 Global Step: 127500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:10,020-Speed 9153.19 samples/sec Loss 6.5647 LearningRate 0.0382 Epoch: 7 Global Step: 127510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:11,095-Speed 9536.15 samples/sec Loss 6.5183 LearningRate 0.0382 Epoch: 7 Global Step: 127520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:40:12,207-Speed 9214.36 samples/sec Loss 6.5446 LearningRate 0.0382 Epoch: 7 Global Step: 127530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:13,348-Speed 8979.56 samples/sec Loss 6.5349 LearningRate 0.0382 Epoch: 7 Global Step: 127540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:14,478-Speed 9066.70 samples/sec Loss 6.3772 LearningRate 0.0382 Epoch: 7 Global Step: 127550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:15,577-Speed 9327.95 samples/sec Loss 6.5811 LearningRate 0.0382 Epoch: 7 Global Step: 127560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:16,645-Speed 9589.95 samples/sec Loss 6.5909 LearningRate 0.0382 Epoch: 7 Global Step: 127570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:17,739-Speed 9361.78 samples/sec Loss 6.5326 LearningRate 0.0382 Epoch: 7 Global Step: 127580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:18,825-Speed 
9436.78 samples/sec Loss 6.4215 LearningRate 0.0382 Epoch: 7 Global Step: 127590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:19,881-Speed 9704.49 samples/sec Loss 6.5883 LearningRate 0.0382 Epoch: 7 Global Step: 127600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:20,962-Speed 9486.73 samples/sec Loss 6.6007 LearningRate 0.0382 Epoch: 7 Global Step: 127610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:22,064-Speed 9291.03 samples/sec Loss 6.6223 LearningRate 0.0382 Epoch: 7 Global Step: 127620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:23,137-Speed 9546.61 samples/sec Loss 6.6658 LearningRate 0.0382 Epoch: 7 Global Step: 127630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:40:24,199-Speed 9654.86 samples/sec Loss 6.5726 LearningRate 0.0381 Epoch: 7 Global Step: 127640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:40:25,259-Speed 9658.04 samples/sec Loss 6.5825 LearningRate 0.0381 Epoch: 7 Global Step: 127650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:26,322-Speed 9646.66 samples/sec Loss 6.6643 LearningRate 0.0381 Epoch: 7 Global Step: 127660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:27,469-Speed 8931.20 samples/sec Loss 6.6651 LearningRate 0.0381 Epoch: 7 Global Step: 127670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:28,595-Speed 9092.75 samples/sec Loss 6.5553 LearningRate 0.0381 Epoch: 7 Global Step: 127680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:29,698-Speed 9290.47 samples/sec Loss 6.5436 LearningRate 0.0381 Epoch: 7 Global Step: 127690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:30,772-Speed 9542.06 samples/sec Loss 6.4742 LearningRate 0.0381 Epoch: 7 Global Step: 127700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:31,880-Speed 9249.16 samples/sec Loss 6.5990 
LearningRate 0.0381 Epoch: 7 Global Step: 127710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:33,026-Speed 8942.36 samples/sec Loss 6.5453 LearningRate 0.0381 Epoch: 7 Global Step: 127720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:34,134-Speed 9245.01 samples/sec Loss 6.5433 LearningRate 0.0381 Epoch: 7 Global Step: 127730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:35,220-Speed 9438.80 samples/sec Loss 6.5960 LearningRate 0.0381 Epoch: 7 Global Step: 127740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:36,303-Speed 9462.03 samples/sec Loss 6.5235 LearningRate 0.0381 Epoch: 7 Global Step: 127750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:37,353-Speed 9759.91 samples/sec Loss 6.5687 LearningRate 0.0381 Epoch: 7 Global Step: 127760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:38,433-Speed 9486.51 samples/sec Loss 6.6054 LearningRate 0.0381 Epoch: 7 Global Step: 127770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:39,497-Speed 9629.71 samples/sec Loss 6.5453 LearningRate 0.0381 Epoch: 7 Global Step: 127780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:40,612-Speed 9189.15 samples/sec Loss 6.5687 LearningRate 0.0381 Epoch: 7 Global Step: 127790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:41,716-Speed 9276.33 samples/sec Loss 6.5126 LearningRate 0.0381 Epoch: 7 Global Step: 127800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:42,820-Speed 9283.87 samples/sec Loss 6.6287 LearningRate 0.0381 Epoch: 7 Global Step: 127810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:43,958-Speed 9005.04 samples/sec Loss 6.4932 LearningRate 0.0381 Epoch: 7 Global Step: 127820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:45,045-Speed 9427.63 samples/sec Loss 6.5589 LearningRate 0.0381 Epoch: 7 Global Step: 
127830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:46,106-Speed 9653.61 samples/sec Loss 6.5218 LearningRate 0.0381 Epoch: 7 Global Step: 127840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:40:47,194-Speed 9417.44 samples/sec Loss 6.6078 LearningRate 0.0381 Epoch: 7 Global Step: 127850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:48,309-Speed 9188.88 samples/sec Loss 6.6190 LearningRate 0.0381 Epoch: 7 Global Step: 127860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:49,393-Speed 9454.43 samples/sec Loss 6.6143 LearningRate 0.0381 Epoch: 7 Global Step: 127870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:50,479-Speed 9437.74 samples/sec Loss 6.5487 LearningRate 0.0381 Epoch: 7 Global Step: 127880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:51,556-Speed 9507.99 samples/sec Loss 6.5401 LearningRate 0.0381 Epoch: 7 Global Step: 127890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:52,638-Speed 9473.76 samples/sec Loss 6.6286 LearningRate 0.0381 Epoch: 7 Global Step: 127900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:53,701-Speed 9635.57 samples/sec Loss 6.5259 LearningRate 0.0380 Epoch: 7 Global Step: 127910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:54,850-Speed 8919.94 samples/sec Loss 6.5456 LearningRate 0.0380 Epoch: 7 Global Step: 127920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:55,930-Speed 9496.37 samples/sec Loss 6.5259 LearningRate 0.0380 Epoch: 7 Global Step: 127930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:57,037-Speed 9254.13 samples/sec Loss 6.6119 LearningRate 0.0380 Epoch: 7 Global Step: 127940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:40:58,148-Speed 9224.36 samples/sec Loss 6.5838 LearningRate 0.0380 Epoch: 7 Global Step: 127950 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 16:40:59,251-Speed 9285.96 samples/sec Loss 6.5434 LearningRate 0.0380 Epoch: 7 Global Step: 127960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:41:00,321-Speed 9577.18 samples/sec Loss 6.3960 LearningRate 0.0380 Epoch: 7 Global Step: 127970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:41:01,385-Speed 9629.84 samples/sec Loss 6.4520 LearningRate 0.0380 Epoch: 7 Global Step: 127980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:41:02,471-Speed 9435.86 samples/sec Loss 6.5674 LearningRate 0.0380 Epoch: 7 Global Step: 127990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:41:03,539-Speed 9591.07 samples/sec Loss 6.5755 LearningRate 0.0380 Epoch: 7 Global Step: 128000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:41:25,452-[lfw][128000]XNorm: 11.146215 Training: 2022-04-11 16:41:25,453-[lfw][128000]Accuracy-Flip: 0.99533+-0.00332 Training: 2022-04-11 16:41:25,453-[lfw][128000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:41:50,773-[cfp_fp][128000]XNorm: 9.536976 Training: 2022-04-11 16:41:50,774-[cfp_fp][128000]Accuracy-Flip: 0.95943+-0.00821 Training: 2022-04-11 16:41:50,774-[cfp_fp][128000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:42:12,621-[agedb_30][128000]XNorm: 10.868312 Training: 2022-04-11 16:42:12,622-[agedb_30][128000]Accuracy-Flip: 0.96283+-0.01135 Training: 2022-04-11 16:42:12,622-[agedb_30][128000]Accuracy-Highest: 0.96483 Training: 2022-04-11 16:42:13,708-Speed 145.94 samples/sec Loss 6.4688 LearningRate 0.0380 Epoch: 7 Global Step: 128010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:14,848-Speed 8985.73 samples/sec Loss 6.5609 LearningRate 0.0380 Epoch: 7 Global Step: 128020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:15,955-Speed 9256.13 samples/sec Loss 6.5735 LearningRate 0.0380 Epoch: 7 Global Step: 128030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:42:17,074-Speed 9157.16 samples/sec Loss 6.5321 LearningRate 0.0380 Epoch: 7 Global Step: 128040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:18,196-Speed 9131.93 samples/sec Loss 6.4494 LearningRate 0.0380 Epoch: 7 Global Step: 128050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:42:19,260-Speed 9633.70 samples/sec Loss 6.4519 LearningRate 0.0380 Epoch: 7 Global Step: 128060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:42:20,332-Speed 9549.52 samples/sec Loss 6.4460 LearningRate 0.0380 Epoch: 7 Global Step: 128070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:42:21,420-Speed 9422.14 samples/sec Loss 6.4918 LearningRate 0.0380 Epoch: 7 Global Step: 128080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:42:22,496-Speed 9516.17 samples/sec Loss 6.4911 LearningRate 0.0380 Epoch: 7 Global Step: 128090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:23,578-Speed 9469.73 samples/sec Loss 6.6149 LearningRate 0.0380 Epoch: 7 Global Step: 128100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:24,644-Speed 9614.65 samples/sec Loss 6.4362 LearningRate 0.0380 Epoch: 7 Global Step: 128110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:25,703-Speed 9681.55 samples/sec Loss 6.6211 LearningRate 0.0380 Epoch: 7 Global Step: 128120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:26,793-Speed 9400.63 samples/sec Loss 6.6172 LearningRate 0.0380 Epoch: 7 Global Step: 128130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:27,866-Speed 9550.28 samples/sec Loss 6.4228 LearningRate 0.0380 Epoch: 7 Global Step: 128140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:28,961-Speed 9355.92 samples/sec Loss 6.5710 LearningRate 0.0380 Epoch: 7 Global Step: 128150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:30,060-Speed 9320.02 samples/sec Loss 
6.4513 LearningRate 0.0380 Epoch: 7 Global Step: 128160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:31,122-Speed 9643.89 samples/sec Loss 6.6564 LearningRate 0.0380 Epoch: 7 Global Step: 128170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:32,163-Speed 9846.53 samples/sec Loss 6.5387 LearningRate 0.0379 Epoch: 7 Global Step: 128180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:33,209-Speed 9796.29 samples/sec Loss 6.4929 LearningRate 0.0379 Epoch: 7 Global Step: 128190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:34,288-Speed 9498.69 samples/sec Loss 6.4575 LearningRate 0.0379 Epoch: 7 Global Step: 128200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:35,359-Speed 9561.71 samples/sec Loss 6.6700 LearningRate 0.0379 Epoch: 7 Global Step: 128210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:36,476-Speed 9175.30 samples/sec Loss 6.5853 LearningRate 0.0379 Epoch: 7 Global Step: 128220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:37,542-Speed 9613.76 samples/sec Loss 6.7208 LearningRate 0.0379 Epoch: 7 Global Step: 128230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:38,617-Speed 9525.11 samples/sec Loss 6.5006 LearningRate 0.0379 Epoch: 7 Global Step: 128240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:39,734-Speed 9176.98 samples/sec Loss 6.6093 LearningRate 0.0379 Epoch: 7 Global Step: 128250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:40,776-Speed 9830.92 samples/sec Loss 6.6498 LearningRate 0.0379 Epoch: 7 Global Step: 128260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:41,845-Speed 9585.01 samples/sec Loss 6.5488 LearningRate 0.0379 Epoch: 7 Global Step: 128270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:42,913-Speed 9589.66 samples/sec Loss 6.4865 LearningRate 0.0379 Epoch: 7 Global 
Step: 128280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:44,012-Speed 9326.16 samples/sec Loss 6.6229 LearningRate 0.0379 Epoch: 7 Global Step: 128290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:42:45,055-Speed 9835.34 samples/sec Loss 6.4660 LearningRate 0.0379 Epoch: 7 Global Step: 128300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:46,117-Speed 9642.69 samples/sec Loss 6.4108 LearningRate 0.0379 Epoch: 7 Global Step: 128310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:47,226-Speed 9237.04 samples/sec Loss 6.5003 LearningRate 0.0379 Epoch: 7 Global Step: 128320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:48,322-Speed 9353.36 samples/sec Loss 6.5895 LearningRate 0.0379 Epoch: 7 Global Step: 128330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:49,412-Speed 9400.95 samples/sec Loss 6.6066 LearningRate 0.0379 Epoch: 7 Global Step: 128340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:50,475-Speed 9636.77 samples/sec Loss 6.5899 LearningRate 0.0379 Epoch: 7 Global Step: 128350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:51,591-Speed 9180.40 samples/sec Loss 6.6697 LearningRate 0.0379 Epoch: 7 Global Step: 128360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:52,709-Speed 9164.93 samples/sec Loss 6.6609 LearningRate 0.0379 Epoch: 7 Global Step: 128370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:53,837-Speed 9080.18 samples/sec Loss 6.4847 LearningRate 0.0379 Epoch: 7 Global Step: 128380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:54,909-Speed 9557.89 samples/sec Loss 6.6180 LearningRate 0.0379 Epoch: 7 Global Step: 128390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:42:56,007-Speed 9339.49 samples/sec Loss 6.4945 LearningRate 0.0379 Epoch: 7 Global Step: 128400 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-04-11 16:42:57,090-Speed 9456.64 samples/sec Loss 6.5313 LearningRate 0.0379 Epoch: 7 Global Step: 128410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:42:58,204-Speed 9197.83 samples/sec Loss 6.5728 LearningRate 0.0379 Epoch: 7 Global Step: 128420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:42:59,302-Speed 9338.70 samples/sec Loss 6.5559 LearningRate 0.0379 Epoch: 7 Global Step: 128430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:00,385-Speed 9455.91 samples/sec Loss 6.5284 LearningRate 0.0379 Epoch: 7 Global Step: 128440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:01,450-Speed 9620.41 samples/sec Loss 6.5222 LearningRate 0.0378 Epoch: 7 Global Step: 128450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:02,568-Speed 9167.39 samples/sec Loss 6.5177 LearningRate 0.0378 Epoch: 7 Global Step: 128460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:03,666-Speed 9330.46 samples/sec Loss 6.5406 LearningRate 0.0378 Epoch: 7 Global Step: 128470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:04,741-Speed 9533.28 samples/sec Loss 6.5391 LearningRate 0.0378 Epoch: 7 Global Step: 128480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:05,818-Speed 9513.63 samples/sec Loss 6.5752 LearningRate 0.0378 Epoch: 7 Global Step: 128490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:06,914-Speed 9348.90 samples/sec Loss 6.5578 LearningRate 0.0378 Epoch: 7 Global Step: 128500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:08,006-Speed 9384.12 samples/sec Loss 6.5790 LearningRate 0.0378 Epoch: 7 Global Step: 128510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:09,131-Speed 9108.71 samples/sec Loss 6.5280 LearningRate 0.0378 Epoch: 7 Global Step: 128520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:43:10,237-Speed 9259.11 samples/sec Loss 6.4984 LearningRate 0.0378 Epoch: 7 Global Step: 128530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:43:11,329-Speed 9381.60 samples/sec Loss 6.4185 LearningRate 0.0378 Epoch: 7 Global Step: 128540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:12,411-Speed 9471.37 samples/sec Loss 6.5159 LearningRate 0.0378 Epoch: 7 Global Step: 128550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:13,513-Speed 9299.58 samples/sec Loss 6.5487 LearningRate 0.0378 Epoch: 7 Global Step: 128560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:14,615-Speed 9302.07 samples/sec Loss 6.5560 LearningRate 0.0378 Epoch: 7 Global Step: 128570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:15,704-Speed 9403.75 samples/sec Loss 6.6194 LearningRate 0.0378 Epoch: 7 Global Step: 128580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:16,819-Speed 9205.76 samples/sec Loss 6.5618 LearningRate 0.0378 Epoch: 7 Global Step: 128590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:17,926-Speed 9254.81 samples/sec Loss 6.5699 LearningRate 0.0378 Epoch: 7 Global Step: 128600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:19,048-Speed 9125.45 samples/sec Loss 6.4296 LearningRate 0.0378 Epoch: 7 Global Step: 128610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:20,121-Speed 9550.38 samples/sec Loss 6.5360 LearningRate 0.0378 Epoch: 7 Global Step: 128620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:21,226-Speed 9272.12 samples/sec Loss 6.5889 LearningRate 0.0378 Epoch: 7 Global Step: 128630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:22,314-Speed 9422.03 samples/sec Loss 6.6109 LearningRate 0.0378 Epoch: 7 Global Step: 128640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:23,394-Speed 9484.19 samples/sec Loss 
6.4650 LearningRate 0.0378 Epoch: 7 Global Step: 128650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:24,472-Speed 9508.02 samples/sec Loss 6.6388 LearningRate 0.0378 Epoch: 7 Global Step: 128660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:25,583-Speed 9227.36 samples/sec Loss 6.5263 LearningRate 0.0378 Epoch: 7 Global Step: 128670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:26,656-Speed 9545.86 samples/sec Loss 6.5011 LearningRate 0.0378 Epoch: 7 Global Step: 128680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:27,743-Speed 9427.04 samples/sec Loss 6.5797 LearningRate 0.0378 Epoch: 7 Global Step: 128690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:28,869-Speed 9094.72 samples/sec Loss 6.4950 LearningRate 0.0378 Epoch: 7 Global Step: 128700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:29,942-Speed 9549.99 samples/sec Loss 6.6290 LearningRate 0.0378 Epoch: 7 Global Step: 128710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:30,979-Speed 9885.24 samples/sec Loss 6.5998 LearningRate 0.0377 Epoch: 7 Global Step: 128720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:32,101-Speed 9132.26 samples/sec Loss 6.5151 LearningRate 0.0377 Epoch: 7 Global Step: 128730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:33,267-Speed 8784.20 samples/sec Loss 6.5531 LearningRate 0.0377 Epoch: 7 Global Step: 128740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:34,358-Speed 9395.65 samples/sec Loss 6.5175 LearningRate 0.0377 Epoch: 7 Global Step: 128750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:35,467-Speed 9232.92 samples/sec Loss 6.5211 LearningRate 0.0377 Epoch: 7 Global Step: 128760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:36,548-Speed 9483.61 samples/sec Loss 6.4313 LearningRate 0.0377 Epoch: 7 Global Step: 
128770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:37,638-Speed 9397.54 samples/sec Loss 6.5452 LearningRate 0.0377 Epoch: 7 Global Step: 128780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:38,700-Speed 9645.11 samples/sec Loss 6.5128 LearningRate 0.0377 Epoch: 7 Global Step: 128790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:39,766-Speed 9619.56 samples/sec Loss 6.5834 LearningRate 0.0377 Epoch: 7 Global Step: 128800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:40,829-Speed 9634.27 samples/sec Loss 6.5815 LearningRate 0.0377 Epoch: 7 Global Step: 128810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:41,945-Speed 9186.35 samples/sec Loss 6.5879 LearningRate 0.0377 Epoch: 7 Global Step: 128820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:43,021-Speed 9516.94 samples/sec Loss 6.4800 LearningRate 0.0377 Epoch: 7 Global Step: 128830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:44,082-Speed 9664.36 samples/sec Loss 6.4124 LearningRate 0.0377 Epoch: 7 Global Step: 128840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:45,168-Speed 9435.42 samples/sec Loss 6.5071 LearningRate 0.0377 Epoch: 7 Global Step: 128850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:46,299-Speed 9057.91 samples/sec Loss 6.5293 LearningRate 0.0377 Epoch: 7 Global Step: 128860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:47,388-Speed 9408.51 samples/sec Loss 6.5527 LearningRate 0.0377 Epoch: 7 Global Step: 128870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:48,462-Speed 9536.36 samples/sec Loss 6.5666 LearningRate 0.0377 Epoch: 7 Global Step: 128880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:49,545-Speed 9463.67 samples/sec Loss 6.5585 LearningRate 0.0377 Epoch: 7 Global Step: 128890 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 16:43:50,618-Speed 9551.62 samples/sec Loss 6.6703 LearningRate 0.0377 Epoch: 7 Global Step: 128900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:43:51,671-Speed 9723.36 samples/sec Loss 6.6838 LearningRate 0.0377 Epoch: 7 Global Step: 128910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:52,747-Speed 9526.67 samples/sec Loss 6.5731 LearningRate 0.0377 Epoch: 7 Global Step: 128920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:53,841-Speed 9363.57 samples/sec Loss 6.5416 LearningRate 0.0377 Epoch: 7 Global Step: 128930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:54,948-Speed 9257.32 samples/sec Loss 6.5391 LearningRate 0.0377 Epoch: 7 Global Step: 128940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:56,012-Speed 9627.40 samples/sec Loss 6.4641 LearningRate 0.0377 Epoch: 7 Global Step: 128950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:57,098-Speed 9434.06 samples/sec Loss 6.5541 LearningRate 0.0377 Epoch: 7 Global Step: 128960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:58,187-Speed 9413.23 samples/sec Loss 6.4619 LearningRate 0.0377 Epoch: 7 Global Step: 128970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:43:59,270-Speed 9464.06 samples/sec Loss 6.6102 LearningRate 0.0377 Epoch: 7 Global Step: 128980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:00,356-Speed 9437.36 samples/sec Loss 6.4287 LearningRate 0.0376 Epoch: 7 Global Step: 128990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:01,454-Speed 9327.20 samples/sec Loss 6.5574 LearningRate 0.0376 Epoch: 7 Global Step: 129000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:02,580-Speed 9104.59 samples/sec Loss 6.5266 LearningRate 0.0376 Epoch: 7 Global Step: 129010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:44:03,646-Speed 
9612.94 samples/sec Loss 6.5666 LearningRate 0.0376 Epoch: 7 Global Step: 129020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:04,757-Speed 9215.11 samples/sec Loss 6.5505 LearningRate 0.0376 Epoch: 7 Global Step: 129030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:05,844-Speed 9425.69 samples/sec Loss 6.4687 LearningRate 0.0376 Epoch: 7 Global Step: 129040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:06,929-Speed 9448.27 samples/sec Loss 6.5865 LearningRate 0.0376 Epoch: 7 Global Step: 129050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:08,033-Speed 9274.12 samples/sec Loss 6.6293 LearningRate 0.0376 Epoch: 7 Global Step: 129060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:09,121-Speed 9422.69 samples/sec Loss 6.4515 LearningRate 0.0376 Epoch: 7 Global Step: 129070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:10,236-Speed 9187.50 samples/sec Loss 6.6672 LearningRate 0.0376 Epoch: 7 Global Step: 129080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:11,321-Speed 9444.93 samples/sec Loss 6.4253 LearningRate 0.0376 Epoch: 7 Global Step: 129090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:12,371-Speed 9756.65 samples/sec Loss 6.5175 LearningRate 0.0376 Epoch: 7 Global Step: 129100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:13,478-Speed 9250.38 samples/sec Loss 6.4987 LearningRate 0.0376 Epoch: 7 Global Step: 129110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:14,563-Speed 9447.46 samples/sec Loss 6.5476 LearningRate 0.0376 Epoch: 7 Global Step: 129120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:15,701-Speed 9003.20 samples/sec Loss 6.5652 LearningRate 0.0376 Epoch: 7 Global Step: 129130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:16,810-Speed 9240.63 samples/sec Loss 6.6157 
LearningRate 0.0376 Epoch: 7 Global Step: 129140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:17,895-Speed 9444.74 samples/sec Loss 6.5842 LearningRate 0.0376 Epoch: 7 Global Step: 129150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:18,997-Speed 9297.07 samples/sec Loss 6.4870 LearningRate 0.0376 Epoch: 7 Global Step: 129160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:20,077-Speed 9488.06 samples/sec Loss 6.5237 LearningRate 0.0376 Epoch: 7 Global Step: 129170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:21,202-Speed 9105.99 samples/sec Loss 6.4587 LearningRate 0.0376 Epoch: 7 Global Step: 129180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:22,322-Speed 9147.00 samples/sec Loss 6.5385 LearningRate 0.0376 Epoch: 7 Global Step: 129190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:23,359-Speed 9880.27 samples/sec Loss 6.5510 LearningRate 0.0376 Epoch: 7 Global Step: 129200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:24,450-Speed 9396.96 samples/sec Loss 6.6234 LearningRate 0.0376 Epoch: 7 Global Step: 129210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:25,511-Speed 9656.38 samples/sec Loss 6.5953 LearningRate 0.0376 Epoch: 7 Global Step: 129220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:44:26,591-Speed 9490.91 samples/sec Loss 6.6117 LearningRate 0.0376 Epoch: 7 Global Step: 129230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:27,651-Speed 9660.98 samples/sec Loss 6.5196 LearningRate 0.0376 Epoch: 7 Global Step: 129240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:28,708-Speed 9694.83 samples/sec Loss 6.6541 LearningRate 0.0376 Epoch: 7 Global Step: 129250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:29,762-Speed 9717.35 samples/sec Loss 6.5744 LearningRate 0.0376 Epoch: 7 Global Step: 
129260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:30,850-Speed 9418.31 samples/sec Loss 6.4496 LearningRate 0.0375 Epoch: 7 Global Step: 129270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:31,938-Speed 9424.19 samples/sec Loss 6.4989 LearningRate 0.0375 Epoch: 7 Global Step: 129280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:33,003-Speed 9616.77 samples/sec Loss 6.5423 LearningRate 0.0375 Epoch: 7 Global Step: 129290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:34,050-Speed 9788.66 samples/sec Loss 6.5584 LearningRate 0.0375 Epoch: 7 Global Step: 129300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:35,103-Speed 9729.26 samples/sec Loss 6.5238 LearningRate 0.0375 Epoch: 7 Global Step: 129310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:36,201-Speed 9336.18 samples/sec Loss 6.5210 LearningRate 0.0375 Epoch: 7 Global Step: 129320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:37,249-Speed 9768.93 samples/sec Loss 6.4542 LearningRate 0.0375 Epoch: 7 Global Step: 129330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:38,412-Speed 8809.53 samples/sec Loss 6.6745 LearningRate 0.0375 Epoch: 7 Global Step: 129340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:44:39,554-Speed 8977.03 samples/sec Loss 6.4720 LearningRate 0.0375 Epoch: 7 Global Step: 129350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:40,683-Speed 9073.86 samples/sec Loss 6.4770 LearningRate 0.0375 Epoch: 7 Global Step: 129360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:41,789-Speed 9262.60 samples/sec Loss 6.6085 LearningRate 0.0375 Epoch: 7 Global Step: 129370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:42,909-Speed 9145.38 samples/sec Loss 6.6198 LearningRate 0.0375 Epoch: 7 Global Step: 129380 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 16:44:43,990-Speed 9480.82 samples/sec Loss 6.4653 LearningRate 0.0375 Epoch: 7 Global Step: 129390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:45,068-Speed 9507.32 samples/sec Loss 6.5410 LearningRate 0.0375 Epoch: 7 Global Step: 129400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:46,139-Speed 9562.95 samples/sec Loss 6.4703 LearningRate 0.0375 Epoch: 7 Global Step: 129410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:47,254-Speed 9190.61 samples/sec Loss 6.5365 LearningRate 0.0375 Epoch: 7 Global Step: 129420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:48,360-Speed 9261.75 samples/sec Loss 6.6024 LearningRate 0.0375 Epoch: 7 Global Step: 129430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:49,440-Speed 9488.90 samples/sec Loss 6.5589 LearningRate 0.0375 Epoch: 7 Global Step: 129440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:50,546-Speed 9267.79 samples/sec Loss 6.5413 LearningRate 0.0375 Epoch: 7 Global Step: 129450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:44:51,604-Speed 9681.31 samples/sec Loss 6.5910 LearningRate 0.0375 Epoch: 7 Global Step: 129460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:44:52,706-Speed 9291.73 samples/sec Loss 6.4964 LearningRate 0.0375 Epoch: 7 Global Step: 129470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:53,755-Speed 9767.52 samples/sec Loss 6.4864 LearningRate 0.0375 Epoch: 7 Global Step: 129480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:54,808-Speed 9734.28 samples/sec Loss 6.5541 LearningRate 0.0375 Epoch: 7 Global Step: 129490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:55,891-Speed 9465.89 samples/sec Loss 6.5502 LearningRate 0.0375 Epoch: 7 Global Step: 129500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:56,974-Speed 
9457.42 samples/sec Loss 6.5527 LearningRate 0.0375 Epoch: 7 Global Step: 129510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:58,074-Speed 9315.40 samples/sec Loss 6.4597 LearningRate 0.0375 Epoch: 7 Global Step: 129520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:44:59,210-Speed 9021.73 samples/sec Loss 6.5180 LearningRate 0.0375 Epoch: 7 Global Step: 129530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:00,348-Speed 9006.47 samples/sec Loss 6.4811 LearningRate 0.0374 Epoch: 7 Global Step: 129540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:01,433-Speed 9440.31 samples/sec Loss 6.6235 LearningRate 0.0374 Epoch: 7 Global Step: 129550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:02,539-Speed 9268.79 samples/sec Loss 6.4931 LearningRate 0.0374 Epoch: 7 Global Step: 129560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:03,618-Speed 9491.19 samples/sec Loss 6.4625 LearningRate 0.0374 Epoch: 7 Global Step: 129570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:04,707-Speed 9412.17 samples/sec Loss 6.5964 LearningRate 0.0374 Epoch: 7 Global Step: 129580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:05,797-Speed 9394.62 samples/sec Loss 6.5430 LearningRate 0.0374 Epoch: 7 Global Step: 129590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:06,859-Speed 9649.44 samples/sec Loss 6.5654 LearningRate 0.0374 Epoch: 7 Global Step: 129600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:07,941-Speed 9465.34 samples/sec Loss 6.4472 LearningRate 0.0374 Epoch: 7 Global Step: 129610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:09,041-Speed 9315.62 samples/sec Loss 6.4495 LearningRate 0.0374 Epoch: 7 Global Step: 129620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:10,114-Speed 9554.25 samples/sec Loss 6.4493 
LearningRate 0.0374 Epoch: 7 Global Step: 129630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:11,177-Speed 9636.57 samples/sec Loss 6.6304 LearningRate 0.0374 Epoch: 7 Global Step: 129640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:12,275-Speed 9325.72 samples/sec Loss 6.5525 LearningRate 0.0374 Epoch: 7 Global Step: 129650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:13,335-Speed 9671.14 samples/sec Loss 6.5364 LearningRate 0.0374 Epoch: 7 Global Step: 129660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:14,449-Speed 9199.38 samples/sec Loss 6.5325 LearningRate 0.0374 Epoch: 7 Global Step: 129670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:15,491-Speed 9836.59 samples/sec Loss 6.4775 LearningRate 0.0374 Epoch: 7 Global Step: 129680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:16,548-Speed 9689.20 samples/sec Loss 6.5381 LearningRate 0.0374 Epoch: 7 Global Step: 129690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:17,692-Speed 8956.33 samples/sec Loss 6.5779 LearningRate 0.0374 Epoch: 7 Global Step: 129700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:18,753-Speed 9661.91 samples/sec Loss 6.5257 LearningRate 0.0374 Epoch: 7 Global Step: 129710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:19,893-Speed 8984.14 samples/sec Loss 6.6120 LearningRate 0.0374 Epoch: 7 Global Step: 129720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:21,007-Speed 9201.84 samples/sec Loss 6.5802 LearningRate 0.0374 Epoch: 7 Global Step: 129730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:22,048-Speed 9834.61 samples/sec Loss 6.5113 LearningRate 0.0374 Epoch: 7 Global Step: 129740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:23,122-Speed 9547.29 samples/sec Loss 6.5482 LearningRate 0.0374 Epoch: 7 Global Step: 
129750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:24,195-Speed 9546.25 samples/sec Loss 6.5119 LearningRate 0.0374 Epoch: 7 Global Step: 129760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:25,237-Speed 9833.36 samples/sec Loss 6.5500 LearningRate 0.0374 Epoch: 7 Global Step: 129770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:26,301-Speed 9633.78 samples/sec Loss 6.5401 LearningRate 0.0374 Epoch: 7 Global Step: 129780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:27,395-Speed 9366.64 samples/sec Loss 6.5446 LearningRate 0.0374 Epoch: 7 Global Step: 129790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:28,484-Speed 9406.81 samples/sec Loss 6.5663 LearningRate 0.0374 Epoch: 7 Global Step: 129800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:29,564-Speed 9481.00 samples/sec Loss 6.5282 LearningRate 0.0373 Epoch: 7 Global Step: 129810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:30,620-Speed 9705.96 samples/sec Loss 6.4871 LearningRate 0.0373 Epoch: 7 Global Step: 129820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:31,670-Speed 9766.01 samples/sec Loss 6.5009 LearningRate 0.0373 Epoch: 7 Global Step: 129830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:32,754-Speed 9451.90 samples/sec Loss 6.5005 LearningRate 0.0373 Epoch: 7 Global Step: 129840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:33,824-Speed 9576.80 samples/sec Loss 6.5481 LearningRate 0.0373 Epoch: 7 Global Step: 129850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:34,918-Speed 9365.22 samples/sec Loss 6.5100 LearningRate 0.0373 Epoch: 7 Global Step: 129860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:45:35,987-Speed 9585.54 samples/sec Loss 6.5543 LearningRate 0.0373 Epoch: 7 Global Step: 129870 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 16:45:37,087-Speed 9311.83 samples/sec Loss 6.5313 LearningRate 0.0373 Epoch: 7 Global Step: 129880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:38,164-Speed 9516.31 samples/sec Loss 6.5771 LearningRate 0.0373 Epoch: 7 Global Step: 129890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:39,261-Speed 9337.65 samples/sec Loss 6.5560 LearningRate 0.0373 Epoch: 7 Global Step: 129900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:40,375-Speed 9200.48 samples/sec Loss 6.5505 LearningRate 0.0373 Epoch: 7 Global Step: 129910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:41,483-Speed 9247.74 samples/sec Loss 6.5249 LearningRate 0.0373 Epoch: 7 Global Step: 129920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:42,600-Speed 9171.12 samples/sec Loss 6.5019 LearningRate 0.0373 Epoch: 7 Global Step: 129930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:43,700-Speed 9316.46 samples/sec Loss 6.4648 LearningRate 0.0373 Epoch: 7 Global Step: 129940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:44,785-Speed 9445.20 samples/sec Loss 6.4929 LearningRate 0.0373 Epoch: 7 Global Step: 129950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:45,864-Speed 9498.57 samples/sec Loss 6.5669 LearningRate 0.0373 Epoch: 7 Global Step: 129960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:46,968-Speed 9277.73 samples/sec Loss 6.5193 LearningRate 0.0373 Epoch: 7 Global Step: 129970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:45:48,100-Speed 9055.75 samples/sec Loss 6.3604 LearningRate 0.0373 Epoch: 7 Global Step: 129980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:45:49,164-Speed 9629.13 samples/sec Loss 6.4980 LearningRate 0.0373 Epoch: 7 Global Step: 129990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:45:50,236-Speed 
9552.75 samples/sec Loss 6.4821 LearningRate 0.0373 Epoch: 7 Global Step: 130000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:46:12,255-[lfw][130000]XNorm: 10.771086 Training: 2022-04-11 16:46:12,255-[lfw][130000]Accuracy-Flip: 0.99567+-0.00249 Training: 2022-04-11 16:46:12,256-[lfw][130000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:46:37,666-[cfp_fp][130000]XNorm: 9.138050 Training: 2022-04-11 16:46:37,667-[cfp_fp][130000]Accuracy-Flip: 0.95757+-0.01034 Training: 2022-04-11 16:46:37,667-[cfp_fp][130000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:46:59,637-[agedb_30][130000]XNorm: 10.391237 Training: 2022-04-11 16:46:59,638-[agedb_30][130000]Accuracy-Flip: 0.96650+-0.01045 Training: 2022-04-11 16:46:59,638-[agedb_30][130000]Accuracy-Highest: 0.96650 Training: 2022-04-11 16:47:00,716-Speed 145.29 samples/sec Loss 6.4578 LearningRate 0.0373 Epoch: 7 Global Step: 130010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:01,794-Speed 9511.42 samples/sec Loss 6.4471 LearningRate 0.0373 Epoch: 7 Global Step: 130020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:02,883-Speed 9409.16 samples/sec Loss 6.5216 LearningRate 0.0373 Epoch: 7 Global Step: 130030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:03,964-Speed 9473.23 samples/sec Loss 6.5258 LearningRate 0.0373 Epoch: 7 Global Step: 130040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:05,035-Speed 9566.57 samples/sec Loss 6.4306 LearningRate 0.0373 Epoch: 7 Global Step: 130050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:06,112-Speed 9513.85 samples/sec Loss 6.5677 LearningRate 0.0373 Epoch: 7 Global Step: 130060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:07,179-Speed 9605.22 samples/sec Loss 6.6483 LearningRate 0.0373 Epoch: 7 Global Step: 130070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:08,270-Speed 9386.64 samples/sec Loss 
6.5282 LearningRate 0.0373 Epoch: 7 Global Step: 130080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:09,370-Speed 9319.66 samples/sec Loss 6.5044 LearningRate 0.0372 Epoch: 7 Global Step: 130090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:10,461-Speed 9388.30 samples/sec Loss 6.3950 LearningRate 0.0372 Epoch: 7 Global Step: 130100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:11,536-Speed 9525.76 samples/sec Loss 6.5637 LearningRate 0.0372 Epoch: 7 Global Step: 130110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:12,613-Speed 9519.76 samples/sec Loss 6.4238 LearningRate 0.0372 Epoch: 7 Global Step: 130120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:13,694-Speed 9479.31 samples/sec Loss 6.5249 LearningRate 0.0372 Epoch: 7 Global Step: 130130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:14,789-Speed 9354.86 samples/sec Loss 6.4981 LearningRate 0.0372 Epoch: 7 Global Step: 130140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:15,901-Speed 9213.98 samples/sec Loss 6.4364 LearningRate 0.0372 Epoch: 7 Global Step: 130150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:16,982-Speed 9482.91 samples/sec Loss 6.5644 LearningRate 0.0372 Epoch: 7 Global Step: 130160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:18,021-Speed 9861.98 samples/sec Loss 6.4935 LearningRate 0.0372 Epoch: 7 Global Step: 130170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:19,122-Speed 9312.08 samples/sec Loss 6.4528 LearningRate 0.0372 Epoch: 7 Global Step: 130180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:20,185-Speed 9638.19 samples/sec Loss 6.4757 LearningRate 0.0372 Epoch: 7 Global Step: 130190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:21,265-Speed 9484.62 samples/sec Loss 6.5143 LearningRate 0.0372 Epoch: 7 Global 
Step: 130200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:22,356-Speed 9391.92 samples/sec Loss 6.4858 LearningRate 0.0372 Epoch: 7 Global Step: 130210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:23,459-Speed 9282.47 samples/sec Loss 6.4837 LearningRate 0.0372 Epoch: 7 Global Step: 130220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:24,531-Speed 9562.89 samples/sec Loss 6.5107 LearningRate 0.0372 Epoch: 7 Global Step: 130230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:25,580-Speed 9767.35 samples/sec Loss 6.5438 LearningRate 0.0372 Epoch: 7 Global Step: 130240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:26,639-Speed 9670.76 samples/sec Loss 6.4934 LearningRate 0.0372 Epoch: 7 Global Step: 130250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:27,738-Speed 9327.26 samples/sec Loss 6.4897 LearningRate 0.0372 Epoch: 7 Global Step: 130260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:28,845-Speed 9253.66 samples/sec Loss 6.4632 LearningRate 0.0372 Epoch: 7 Global Step: 130270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:29,947-Speed 9297.53 samples/sec Loss 6.5203 LearningRate 0.0372 Epoch: 7 Global Step: 130280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:31,033-Speed 9432.42 samples/sec Loss 6.5245 LearningRate 0.0372 Epoch: 7 Global Step: 130290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:32,166-Speed 9042.62 samples/sec Loss 6.4299 LearningRate 0.0372 Epoch: 7 Global Step: 130300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:33,267-Speed 9304.57 samples/sec Loss 6.5645 LearningRate 0.0372 Epoch: 7 Global Step: 130310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:34,392-Speed 9106.92 samples/sec Loss 6.4744 LearningRate 0.0372 Epoch: 7 Global Step: 130320 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-11 16:47:35,497-Speed 9280.19 samples/sec Loss 6.5019 LearningRate 0.0372 Epoch: 7 Global Step: 130330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:36,596-Speed 9317.97 samples/sec Loss 6.5588 LearningRate 0.0372 Epoch: 7 Global Step: 130340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:37,715-Speed 9160.71 samples/sec Loss 6.4478 LearningRate 0.0372 Epoch: 7 Global Step: 130350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:38,774-Speed 9676.37 samples/sec Loss 6.4816 LearningRate 0.0371 Epoch: 7 Global Step: 130360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:39,845-Speed 9559.54 samples/sec Loss 6.4750 LearningRate 0.0371 Epoch: 7 Global Step: 130370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:40,988-Speed 8963.24 samples/sec Loss 6.4985 LearningRate 0.0371 Epoch: 7 Global Step: 130380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:42,094-Speed 9265.34 samples/sec Loss 6.4082 LearningRate 0.0371 Epoch: 7 Global Step: 130390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:43,176-Speed 9469.13 samples/sec Loss 6.4311 LearningRate 0.0371 Epoch: 7 Global Step: 130400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:44,226-Speed 9762.50 samples/sec Loss 6.4662 LearningRate 0.0371 Epoch: 7 Global Step: 130410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:45,293-Speed 9601.53 samples/sec Loss 6.5452 LearningRate 0.0371 Epoch: 7 Global Step: 130420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:46,369-Speed 9519.00 samples/sec Loss 6.4445 LearningRate 0.0371 Epoch: 7 Global Step: 130430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:47,445-Speed 9527.50 samples/sec Loss 6.4566 LearningRate 0.0371 Epoch: 7 Global Step: 130440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:48,524-Speed 
9499.55 samples/sec Loss 6.5420 LearningRate 0.0371 Epoch: 7 Global Step: 130450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:49,581-Speed 9695.51 samples/sec Loss 6.6589 LearningRate 0.0371 Epoch: 7 Global Step: 130460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:50,665-Speed 9451.63 samples/sec Loss 6.5395 LearningRate 0.0371 Epoch: 7 Global Step: 130470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:47:51,794-Speed 9072.48 samples/sec Loss 6.5318 LearningRate 0.0371 Epoch: 7 Global Step: 130480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:52,907-Speed 9207.72 samples/sec Loss 6.6530 LearningRate 0.0371 Epoch: 7 Global Step: 130490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:53,980-Speed 9544.27 samples/sec Loss 6.5199 LearningRate 0.0371 Epoch: 7 Global Step: 130500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:55,059-Speed 9500.44 samples/sec Loss 6.4442 LearningRate 0.0371 Epoch: 7 Global Step: 130510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:56,172-Speed 9206.98 samples/sec Loss 6.4187 LearningRate 0.0371 Epoch: 7 Global Step: 130520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:57,289-Speed 9168.95 samples/sec Loss 6.5594 LearningRate 0.0371 Epoch: 7 Global Step: 130530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:58,380-Speed 9389.62 samples/sec Loss 6.5222 LearningRate 0.0371 Epoch: 7 Global Step: 130540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:47:59,472-Speed 9382.42 samples/sec Loss 6.4827 LearningRate 0.0371 Epoch: 7 Global Step: 130550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:00,554-Speed 9477.26 samples/sec Loss 6.6069 LearningRate 0.0371 Epoch: 7 Global Step: 130560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:01,619-Speed 9613.64 samples/sec Loss 6.5832 LearningRate 
0.0371 Epoch: 7 Global Step: 130570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:02,724-Speed 9275.60 samples/sec Loss 6.5849 LearningRate 0.0371 Epoch: 7 Global Step: 130580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:03,807-Speed 9461.17 samples/sec Loss 6.5683 LearningRate 0.0371 Epoch: 7 Global Step: 130590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:04,890-Speed 9454.86 samples/sec Loss 6.5088 LearningRate 0.0371 Epoch: 7 Global Step: 130600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:05,994-Speed 9282.31 samples/sec Loss 6.5544 LearningRate 0.0371 Epoch: 7 Global Step: 130610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:07,091-Speed 9338.78 samples/sec Loss 6.4839 LearningRate 0.0371 Epoch: 7 Global Step: 130620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:08,189-Speed 9335.56 samples/sec Loss 6.5887 LearningRate 0.0370 Epoch: 7 Global Step: 130630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:09,230-Speed 9838.32 samples/sec Loss 6.5134 LearningRate 0.0370 Epoch: 7 Global Step: 130640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:10,349-Speed 9157.56 samples/sec Loss 6.4616 LearningRate 0.0370 Epoch: 7 Global Step: 130650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:11,480-Speed 9059.83 samples/sec Loss 6.4673 LearningRate 0.0370 Epoch: 7 Global Step: 130660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:12,605-Speed 9107.91 samples/sec Loss 6.5015 LearningRate 0.0370 Epoch: 7 Global Step: 130670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:13,654-Speed 9774.85 samples/sec Loss 6.4736 LearningRate 0.0370 Epoch: 7 Global Step: 130680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:14,704-Speed 9754.84 samples/sec Loss 6.4796 LearningRate 0.0370 Epoch: 7 Global Step: 130690 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-11 16:48:15,739-Speed 9900.60 samples/sec Loss 6.6195 LearningRate 0.0370 Epoch: 7 Global Step: 130700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:16,804-Speed 9617.17 samples/sec Loss 6.5168 LearningRate 0.0370 Epoch: 7 Global Step: 130710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:17,913-Speed 9241.23 samples/sec Loss 6.4463 LearningRate 0.0370 Epoch: 7 Global Step: 130720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:19,015-Speed 9294.20 samples/sec Loss 6.5025 LearningRate 0.0370 Epoch: 7 Global Step: 130730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:20,105-Speed 9402.00 samples/sec Loss 6.4439 LearningRate 0.0370 Epoch: 7 Global Step: 130740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:21,154-Speed 9765.58 samples/sec Loss 6.5227 LearningRate 0.0370 Epoch: 7 Global Step: 130750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:22,210-Speed 9703.78 samples/sec Loss 6.5925 LearningRate 0.0370 Epoch: 7 Global Step: 130760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:23,301-Speed 9389.52 samples/sec Loss 6.4778 LearningRate 0.0370 Epoch: 7 Global Step: 130770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:24,424-Speed 9128.36 samples/sec Loss 6.5338 LearningRate 0.0370 Epoch: 7 Global Step: 130780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:25,502-Speed 9504.31 samples/sec Loss 6.5401 LearningRate 0.0370 Epoch: 7 Global Step: 130790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:48:26,566-Speed 9629.58 samples/sec Loss 6.4924 LearningRate 0.0370 Epoch: 7 Global Step: 130800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:27,640-Speed 9535.59 samples/sec Loss 6.4834 LearningRate 0.0370 Epoch: 7 Global Step: 130810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:48:28,743-Speed 9289.91 samples/sec Loss 6.5086 LearningRate 0.0370 Epoch: 7 Global Step: 130820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:29,805-Speed 9645.39 samples/sec Loss 6.4952 LearningRate 0.0370 Epoch: 7 Global Step: 130830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:30,920-Speed 9188.81 samples/sec Loss 6.5412 LearningRate 0.0370 Epoch: 7 Global Step: 130840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:32,026-Speed 9268.89 samples/sec Loss 6.5825 LearningRate 0.0370 Epoch: 7 Global Step: 130850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:33,085-Speed 9677.69 samples/sec Loss 6.4214 LearningRate 0.0370 Epoch: 7 Global Step: 130860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:34,146-Speed 9657.65 samples/sec Loss 6.4045 LearningRate 0.0370 Epoch: 7 Global Step: 130870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:35,234-Speed 9418.15 samples/sec Loss 6.4970 LearningRate 0.0370 Epoch: 7 Global Step: 130880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:36,321-Speed 9422.84 samples/sec Loss 6.4713 LearningRate 0.0370 Epoch: 7 Global Step: 130890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:37,442-Speed 9136.29 samples/sec Loss 6.5511 LearningRate 0.0370 Epoch: 7 Global Step: 130900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:48:38,542-Speed 9321.75 samples/sec Loss 6.3697 LearningRate 0.0369 Epoch: 7 Global Step: 130910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:48:39,608-Speed 9606.72 samples/sec Loss 6.4592 LearningRate 0.0369 Epoch: 7 Global Step: 130920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:40,734-Speed 9096.01 samples/sec Loss 6.5055 LearningRate 0.0369 Epoch: 7 Global Step: 130930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:41,802-Speed 9597.07 samples/sec Loss 
6.5201 LearningRate 0.0369 Epoch: 7 Global Step: 130940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:42,907-Speed 9274.46 samples/sec Loss 6.6090 LearningRate 0.0369 Epoch: 7 Global Step: 130950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:44,015-Speed 9247.51 samples/sec Loss 6.5592 LearningRate 0.0369 Epoch: 7 Global Step: 130960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:45,055-Speed 9853.97 samples/sec Loss 6.4878 LearningRate 0.0369 Epoch: 7 Global Step: 130970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:46,152-Speed 9340.27 samples/sec Loss 6.5064 LearningRate 0.0369 Epoch: 7 Global Step: 130980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:47,264-Speed 9215.05 samples/sec Loss 6.4502 LearningRate 0.0369 Epoch: 7 Global Step: 130990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:48,379-Speed 9183.75 samples/sec Loss 6.5819 LearningRate 0.0369 Epoch: 7 Global Step: 131000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:49,478-Speed 9331.54 samples/sec Loss 6.4379 LearningRate 0.0369 Epoch: 7 Global Step: 131010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:50,551-Speed 9551.63 samples/sec Loss 6.5051 LearningRate 0.0369 Epoch: 7 Global Step: 131020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:51,582-Speed 9931.55 samples/sec Loss 6.4379 LearningRate 0.0369 Epoch: 7 Global Step: 131030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:52,686-Speed 9287.05 samples/sec Loss 6.4670 LearningRate 0.0369 Epoch: 7 Global Step: 131040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:53,848-Speed 8810.10 samples/sec Loss 6.5883 LearningRate 0.0369 Epoch: 7 Global Step: 131050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:54,957-Speed 9245.18 samples/sec Loss 6.5650 LearningRate 0.0369 Epoch: 7 Global 
Step: 131060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:56,028-Speed 9566.15 samples/sec Loss 6.4676 LearningRate 0.0369 Epoch: 7 Global Step: 131070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:57,117-Speed 9404.42 samples/sec Loss 6.5272 LearningRate 0.0369 Epoch: 7 Global Step: 131080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:58,204-Speed 9421.67 samples/sec Loss 6.4633 LearningRate 0.0369 Epoch: 7 Global Step: 131090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:48:59,223-Speed 10055.15 samples/sec Loss 6.4341 LearningRate 0.0369 Epoch: 7 Global Step: 131100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:00,295-Speed 9557.59 samples/sec Loss 6.4785 LearningRate 0.0369 Epoch: 7 Global Step: 131110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:01,391-Speed 9355.21 samples/sec Loss 6.5595 LearningRate 0.0369 Epoch: 7 Global Step: 131120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:02,493-Speed 9299.30 samples/sec Loss 6.4921 LearningRate 0.0369 Epoch: 7 Global Step: 131130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:03,621-Speed 9083.51 samples/sec Loss 6.4707 LearningRate 0.0369 Epoch: 7 Global Step: 131140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:04,722-Speed 9301.59 samples/sec Loss 6.4000 LearningRate 0.0369 Epoch: 7 Global Step: 131150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:05,796-Speed 9544.98 samples/sec Loss 6.4855 LearningRate 0.0369 Epoch: 7 Global Step: 131160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:06,861-Speed 9619.35 samples/sec Loss 6.4144 LearningRate 0.0369 Epoch: 7 Global Step: 131170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:07,934-Speed 9544.35 samples/sec Loss 6.5535 LearningRate 0.0368 Epoch: 7 Global Step: 131180 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 16:49:09,041-Speed 9260.40 samples/sec Loss 6.4473 LearningRate 0.0368 Epoch: 7 Global Step: 131190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:10,103-Speed 9646.52 samples/sec Loss 6.5743 LearningRate 0.0368 Epoch: 7 Global Step: 131200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:11,149-Speed 9795.51 samples/sec Loss 6.5284 LearningRate 0.0368 Epoch: 7 Global Step: 131210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:12,232-Speed 9469.77 samples/sec Loss 6.3428 LearningRate 0.0368 Epoch: 7 Global Step: 131220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:13,312-Speed 9482.60 samples/sec Loss 6.4942 LearningRate 0.0368 Epoch: 7 Global Step: 131230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:14,403-Speed 9391.50 samples/sec Loss 6.4951 LearningRate 0.0368 Epoch: 7 Global Step: 131240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:15,474-Speed 9562.70 samples/sec Loss 6.4550 LearningRate 0.0368 Epoch: 7 Global Step: 131250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:16,599-Speed 9110.01 samples/sec Loss 6.6039 LearningRate 0.0368 Epoch: 7 Global Step: 131260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:17,676-Speed 9514.88 samples/sec Loss 6.4404 LearningRate 0.0368 Epoch: 7 Global Step: 131270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:18,715-Speed 9868.16 samples/sec Loss 6.6926 LearningRate 0.0368 Epoch: 7 Global Step: 131280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:19,778-Speed 9633.56 samples/sec Loss 6.5009 LearningRate 0.0368 Epoch: 7 Global Step: 131290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:20,814-Speed 9891.63 samples/sec Loss 6.4813 LearningRate 0.0368 Epoch: 7 Global Step: 131300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:49:21,879-Speed 9617.71 samples/sec Loss 6.4661 LearningRate 0.0368 Epoch: 7 Global Step: 131310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:22,960-Speed 9479.92 samples/sec Loss 6.6163 LearningRate 0.0368 Epoch: 7 Global Step: 131320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:24,067-Speed 9256.52 samples/sec Loss 6.4557 LearningRate 0.0368 Epoch: 7 Global Step: 131330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:25,128-Speed 9652.85 samples/sec Loss 6.5170 LearningRate 0.0368 Epoch: 7 Global Step: 131340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:26,190-Speed 9648.53 samples/sec Loss 6.4693 LearningRate 0.0368 Epoch: 7 Global Step: 131350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:27,260-Speed 9581.97 samples/sec Loss 6.4953 LearningRate 0.0368 Epoch: 7 Global Step: 131360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:28,309-Speed 9764.01 samples/sec Loss 6.5607 LearningRate 0.0368 Epoch: 7 Global Step: 131370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:29,378-Speed 9586.58 samples/sec Loss 6.4913 LearningRate 0.0368 Epoch: 7 Global Step: 131380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:30,481-Speed 9285.44 samples/sec Loss 6.4274 LearningRate 0.0368 Epoch: 7 Global Step: 131390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:31,565-Speed 9458.33 samples/sec Loss 6.4500 LearningRate 0.0368 Epoch: 7 Global Step: 131400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:32,620-Speed 9711.05 samples/sec Loss 6.5706 LearningRate 0.0368 Epoch: 7 Global Step: 131410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:33,683-Speed 9637.56 samples/sec Loss 6.4645 LearningRate 0.0368 Epoch: 7 Global Step: 131420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:34,758-Speed 9532.75 samples/sec Loss 
6.5271 LearningRate 0.0368 Epoch: 7 Global Step: 131430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:35,801-Speed 9823.88 samples/sec Loss 6.4503 LearningRate 0.0368 Epoch: 7 Global Step: 131440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:36,877-Speed 9521.90 samples/sec Loss 6.5414 LearningRate 0.0368 Epoch: 7 Global Step: 131450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:37,917-Speed 9857.68 samples/sec Loss 6.5349 LearningRate 0.0367 Epoch: 7 Global Step: 131460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:38,985-Speed 9594.33 samples/sec Loss 6.4698 LearningRate 0.0367 Epoch: 7 Global Step: 131470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:40,050-Speed 9616.10 samples/sec Loss 6.5620 LearningRate 0.0367 Epoch: 7 Global Step: 131480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:41,122-Speed 9557.25 samples/sec Loss 6.4711 LearningRate 0.0367 Epoch: 7 Global Step: 131490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:42,198-Speed 9526.63 samples/sec Loss 6.4723 LearningRate 0.0367 Epoch: 7 Global Step: 131500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:43,267-Speed 9582.77 samples/sec Loss 6.6408 LearningRate 0.0367 Epoch: 7 Global Step: 131510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:44,340-Speed 9547.50 samples/sec Loss 6.5260 LearningRate 0.0367 Epoch: 7 Global Step: 131520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:45,448-Speed 9248.35 samples/sec Loss 6.4569 LearningRate 0.0367 Epoch: 7 Global Step: 131530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:46,512-Speed 9629.54 samples/sec Loss 6.5644 LearningRate 0.0367 Epoch: 7 Global Step: 131540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:47,607-Speed 9358.76 samples/sec Loss 6.5073 LearningRate 0.0367 Epoch: 7 Global 
Step: 131550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:48,648-Speed 9850.27 samples/sec Loss 6.6139 LearningRate 0.0367 Epoch: 7 Global Step: 131560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:49,696-Speed 9776.40 samples/sec Loss 6.5525 LearningRate 0.0367 Epoch: 7 Global Step: 131570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:50,779-Speed 9459.31 samples/sec Loss 6.6343 LearningRate 0.0367 Epoch: 7 Global Step: 131580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:51,894-Speed 9183.97 samples/sec Loss 6.5965 LearningRate 0.0367 Epoch: 7 Global Step: 131590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:53,007-Speed 9209.12 samples/sec Loss 6.4692 LearningRate 0.0367 Epoch: 7 Global Step: 131600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:54,044-Speed 9879.97 samples/sec Loss 6.5980 LearningRate 0.0367 Epoch: 7 Global Step: 131610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:55,101-Speed 9689.97 samples/sec Loss 6.4929 LearningRate 0.0367 Epoch: 7 Global Step: 131620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:49:56,165-Speed 9635.19 samples/sec Loss 6.5745 LearningRate 0.0367 Epoch: 7 Global Step: 131630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:57,269-Speed 9276.90 samples/sec Loss 6.5518 LearningRate 0.0367 Epoch: 7 Global Step: 131640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:58,398-Speed 9077.80 samples/sec Loss 6.4364 LearningRate 0.0367 Epoch: 7 Global Step: 131650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:49:59,469-Speed 9568.58 samples/sec Loss 6.4299 LearningRate 0.0367 Epoch: 7 Global Step: 131660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:00,554-Speed 9435.74 samples/sec Loss 6.6481 LearningRate 0.0367 Epoch: 7 Global Step: 131670 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 16:50:01,644-Speed 9411.01 samples/sec Loss 6.5415 LearningRate 0.0367 Epoch: 7 Global Step: 131680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:02,727-Speed 9456.16 samples/sec Loss 6.5175 LearningRate 0.0367 Epoch: 7 Global Step: 131690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:03,829-Speed 9294.14 samples/sec Loss 6.5379 LearningRate 0.0367 Epoch: 7 Global Step: 131700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:04,896-Speed 9611.54 samples/sec Loss 6.4916 LearningRate 0.0367 Epoch: 7 Global Step: 131710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:05,970-Speed 9534.54 samples/sec Loss 6.4562 LearningRate 0.0367 Epoch: 7 Global Step: 131720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:07,045-Speed 9542.08 samples/sec Loss 6.4687 LearningRate 0.0366 Epoch: 7 Global Step: 131730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:08,149-Speed 9288.22 samples/sec Loss 6.5682 LearningRate 0.0366 Epoch: 7 Global Step: 131740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:09,261-Speed 9217.22 samples/sec Loss 6.4926 LearningRate 0.0366 Epoch: 7 Global Step: 131750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:10,312-Speed 9746.47 samples/sec Loss 6.5471 LearningRate 0.0366 Epoch: 7 Global Step: 131760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:11,424-Speed 9215.98 samples/sec Loss 6.5516 LearningRate 0.0366 Epoch: 7 Global Step: 131770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:12,533-Speed 9239.40 samples/sec Loss 6.4935 LearningRate 0.0366 Epoch: 7 Global Step: 131780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:13,621-Speed 9415.94 samples/sec Loss 6.4157 LearningRate 0.0366 Epoch: 7 Global Step: 131790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:50:14,696-Speed 9532.47 samples/sec Loss 6.5149 LearningRate 0.0366 Epoch: 7 Global Step: 131800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:15,806-Speed 9231.24 samples/sec Loss 6.5134 LearningRate 0.0366 Epoch: 7 Global Step: 131810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:16,900-Speed 9369.06 samples/sec Loss 6.3897 LearningRate 0.0366 Epoch: 7 Global Step: 131820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:17,981-Speed 9477.06 samples/sec Loss 6.6106 LearningRate 0.0366 Epoch: 7 Global Step: 131830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:50:19,041-Speed 9664.75 samples/sec Loss 6.4825 LearningRate 0.0366 Epoch: 7 Global Step: 131840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:20,195-Speed 8881.95 samples/sec Loss 6.4438 LearningRate 0.0366 Epoch: 7 Global Step: 131850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:21,327-Speed 9055.00 samples/sec Loss 6.5283 LearningRate 0.0366 Epoch: 7 Global Step: 131860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:22,428-Speed 9307.29 samples/sec Loss 6.4369 LearningRate 0.0366 Epoch: 7 Global Step: 131870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:23,540-Speed 9209.10 samples/sec Loss 6.4267 LearningRate 0.0366 Epoch: 7 Global Step: 131880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:24,625-Speed 9451.20 samples/sec Loss 6.4510 LearningRate 0.0366 Epoch: 7 Global Step: 131890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:25,696-Speed 9561.31 samples/sec Loss 6.5304 LearningRate 0.0366 Epoch: 7 Global Step: 131900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:26,767-Speed 9571.09 samples/sec Loss 6.4148 LearningRate 0.0366 Epoch: 7 Global Step: 131910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:27,844-Speed 9515.63 samples/sec Loss 
6.4914 LearningRate 0.0366 Epoch: 7 Global Step: 131920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:28,975-Speed 9058.13 samples/sec Loss 6.5550 LearningRate 0.0366 Epoch: 7 Global Step: 131930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:30,064-Speed 9404.52 samples/sec Loss 6.6100 LearningRate 0.0366 Epoch: 7 Global Step: 131940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:50:31,158-Speed 9371.46 samples/sec Loss 6.6019 LearningRate 0.0366 Epoch: 7 Global Step: 131950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:50:32,237-Speed 9490.22 samples/sec Loss 6.5015 LearningRate 0.0366 Epoch: 7 Global Step: 131960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:50:33,286-Speed 9767.83 samples/sec Loss 6.4916 LearningRate 0.0366 Epoch: 7 Global Step: 131970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:34,363-Speed 9516.00 samples/sec Loss 6.4740 LearningRate 0.0366 Epoch: 7 Global Step: 131980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:35,422-Speed 9675.93 samples/sec Loss 6.4956 LearningRate 0.0366 Epoch: 7 Global Step: 131990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:36,507-Speed 9439.91 samples/sec Loss 6.6133 LearningRate 0.0366 Epoch: 7 Global Step: 132000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:50:58,523-[lfw][132000]XNorm: 10.575938 Training: 2022-04-11 16:50:58,524-[lfw][132000]Accuracy-Flip: 0.99583+-0.00300 Training: 2022-04-11 16:50:58,525-[lfw][132000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:51:23,976-[cfp_fp][132000]XNorm: 8.998566 Training: 2022-04-11 16:51:23,977-[cfp_fp][132000]Accuracy-Flip: 0.95543+-0.00941 Training: 2022-04-11 16:51:23,978-[cfp_fp][132000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:51:45,924-[agedb_30][132000]XNorm: 10.248483 Training: 2022-04-11 16:51:45,925-[agedb_30][132000]Accuracy-Flip: 0.96183+-0.00831 
Training: 2022-04-11 16:51:45,925-[agedb_30][132000]Accuracy-Highest: 0.96650 Training: 2022-04-11 16:51:47,028-Speed 145.21 samples/sec Loss 6.4539 LearningRate 0.0365 Epoch: 7 Global Step: 132010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:48,068-Speed 9856.05 samples/sec Loss 6.5066 LearningRate 0.0365 Epoch: 7 Global Step: 132020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:49,158-Speed 9398.07 samples/sec Loss 6.5052 LearningRate 0.0365 Epoch: 7 Global Step: 132030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:50,219-Speed 9655.30 samples/sec Loss 6.4070 LearningRate 0.0365 Epoch: 7 Global Step: 132040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:51,253-Speed 9905.71 samples/sec Loss 6.6050 LearningRate 0.0365 Epoch: 7 Global Step: 132050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:52,353-Speed 9321.09 samples/sec Loss 6.6488 LearningRate 0.0365 Epoch: 7 Global Step: 132060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:53,439-Speed 9434.64 samples/sec Loss 6.4935 LearningRate 0.0365 Epoch: 7 Global Step: 132070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:54,528-Speed 9408.14 samples/sec Loss 6.5583 LearningRate 0.0365 Epoch: 7 Global Step: 132080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:55,586-Speed 9680.82 samples/sec Loss 6.5410 LearningRate 0.0365 Epoch: 7 Global Step: 132090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:56,657-Speed 9569.55 samples/sec Loss 6.5651 LearningRate 0.0365 Epoch: 7 Global Step: 132100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:57,765-Speed 9243.70 samples/sec Loss 6.5613 LearningRate 0.0365 Epoch: 7 Global Step: 132110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:58,868-Speed 9288.41 samples/sec Loss 6.4462 LearningRate 0.0365 Epoch: 7 Global Step: 132120 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:51:59,935-Speed 9604.00 samples/sec Loss 6.5425 LearningRate 0.0365 Epoch: 7 Global Step: 132130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:00,993-Speed 9684.65 samples/sec Loss 6.4869 LearningRate 0.0365 Epoch: 7 Global Step: 132140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:02,040-Speed 9786.65 samples/sec Loss 6.4499 LearningRate 0.0365 Epoch: 7 Global Step: 132150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:03,139-Speed 9321.52 samples/sec Loss 6.4806 LearningRate 0.0365 Epoch: 7 Global Step: 132160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:04,198-Speed 9676.71 samples/sec Loss 6.5735 LearningRate 0.0365 Epoch: 7 Global Step: 132170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:05,255-Speed 9695.30 samples/sec Loss 6.4820 LearningRate 0.0365 Epoch: 7 Global Step: 132180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:06,354-Speed 9321.79 samples/sec Loss 6.4804 LearningRate 0.0365 Epoch: 7 Global Step: 132190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:07,417-Speed 9641.21 samples/sec Loss 6.5069 LearningRate 0.0365 Epoch: 7 Global Step: 132200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:08,486-Speed 9577.75 samples/sec Loss 6.4848 LearningRate 0.0365 Epoch: 7 Global Step: 132210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:09,549-Speed 9642.26 samples/sec Loss 6.4286 LearningRate 0.0365 Epoch: 7 Global Step: 132220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:10,601-Speed 9740.82 samples/sec Loss 6.3883 LearningRate 0.0365 Epoch: 7 Global Step: 132230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:11,700-Speed 9322.61 samples/sec Loss 6.4032 LearningRate 0.0365 Epoch: 7 Global Step: 132240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 16:52:12,800-Speed 9316.20 samples/sec Loss 6.4609 LearningRate 0.0365 Epoch: 7 Global Step: 132250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:13,920-Speed 9152.60 samples/sec Loss 6.5269 LearningRate 0.0365 Epoch: 7 Global Step: 132260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:14,998-Speed 9504.64 samples/sec Loss 6.4495 LearningRate 0.0365 Epoch: 7 Global Step: 132270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:16,104-Speed 9263.68 samples/sec Loss 6.4550 LearningRate 0.0365 Epoch: 7 Global Step: 132280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:17,246-Speed 8972.62 samples/sec Loss 6.4731 LearningRate 0.0364 Epoch: 7 Global Step: 132290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:18,299-Speed 9724.94 samples/sec Loss 6.5788 LearningRate 0.0364 Epoch: 7 Global Step: 132300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:19,389-Speed 9403.26 samples/sec Loss 6.5154 LearningRate 0.0364 Epoch: 7 Global Step: 132310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:20,497-Speed 9239.18 samples/sec Loss 6.5118 LearningRate 0.0364 Epoch: 7 Global Step: 132320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:21,600-Speed 9293.00 samples/sec Loss 6.4667 LearningRate 0.0364 Epoch: 7 Global Step: 132330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:22,651-Speed 9748.00 samples/sec Loss 6.5810 LearningRate 0.0364 Epoch: 7 Global Step: 132340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:23,703-Speed 9747.61 samples/sec Loss 6.4817 LearningRate 0.0364 Epoch: 7 Global Step: 132350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:24,757-Speed 9716.03 samples/sec Loss 6.6032 LearningRate 0.0364 Epoch: 7 Global Step: 132360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:25,880-Speed 9127.36 
samples/sec Loss 6.4676 LearningRate 0.0364 Epoch: 7 Global Step: 132370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:27,016-Speed 9018.84 samples/sec Loss 6.4885 LearningRate 0.0364 Epoch: 7 Global Step: 132380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:52:28,059-Speed 9820.75 samples/sec Loss 6.3501 LearningRate 0.0364 Epoch: 7 Global Step: 132390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:52:29,156-Speed 9336.04 samples/sec Loss 6.6254 LearningRate 0.0364 Epoch: 7 Global Step: 132400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:30,247-Speed 9393.83 samples/sec Loss 6.4425 LearningRate 0.0364 Epoch: 7 Global Step: 132410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:31,340-Speed 9373.89 samples/sec Loss 6.4658 LearningRate 0.0364 Epoch: 7 Global Step: 132420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:32,415-Speed 9536.92 samples/sec Loss 6.4809 LearningRate 0.0364 Epoch: 7 Global Step: 132430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:33,496-Speed 9482.08 samples/sec Loss 6.5527 LearningRate 0.0364 Epoch: 7 Global Step: 132440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:34,577-Speed 9471.66 samples/sec Loss 6.5342 LearningRate 0.0364 Epoch: 7 Global Step: 132450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:35,679-Speed 9297.75 samples/sec Loss 6.4971 LearningRate 0.0364 Epoch: 7 Global Step: 132460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:36,741-Speed 9648.16 samples/sec Loss 6.4442 LearningRate 0.0364 Epoch: 7 Global Step: 132470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:37,830-Speed 9409.42 samples/sec Loss 6.4906 LearningRate 0.0364 Epoch: 7 Global Step: 132480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:38,956-Speed 9097.13 samples/sec Loss 6.4640 LearningRate 0.0364 Epoch: 
7 Global Step: 132490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:40,075-Speed 9157.56 samples/sec Loss 6.3881 LearningRate 0.0364 Epoch: 7 Global Step: 132500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:41,188-Speed 9212.10 samples/sec Loss 6.4950 LearningRate 0.0364 Epoch: 7 Global Step: 132510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:52:42,293-Speed 9266.00 samples/sec Loss 6.4477 LearningRate 0.0364 Epoch: 7 Global Step: 132520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:43,382-Speed 9410.34 samples/sec Loss 6.3972 LearningRate 0.0364 Epoch: 7 Global Step: 132530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:44,464-Speed 9469.40 samples/sec Loss 6.4432 LearningRate 0.0364 Epoch: 7 Global Step: 132540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:45,561-Speed 9339.67 samples/sec Loss 6.6135 LearningRate 0.0364 Epoch: 7 Global Step: 132550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:46,587-Speed 9982.63 samples/sec Loss 6.6114 LearningRate 0.0363 Epoch: 7 Global Step: 132560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:47,664-Speed 9513.08 samples/sec Loss 6.5319 LearningRate 0.0363 Epoch: 7 Global Step: 132570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:48,753-Speed 9415.40 samples/sec Loss 6.5196 LearningRate 0.0363 Epoch: 7 Global Step: 132580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:49,861-Speed 9244.61 samples/sec Loss 6.5108 LearningRate 0.0363 Epoch: 7 Global Step: 132590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:50,931-Speed 9576.98 samples/sec Loss 6.3852 LearningRate 0.0363 Epoch: 7 Global Step: 132600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:52,009-Speed 9504.09 samples/sec Loss 6.6009 LearningRate 0.0363 Epoch: 7 Global Step: 132610 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 16:52:53,073-Speed 9639.11 samples/sec Loss 6.4641 LearningRate 0.0363 Epoch: 7 Global Step: 132620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:54,151-Speed 9499.14 samples/sec Loss 6.4785 LearningRate 0.0363 Epoch: 7 Global Step: 132630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:55,263-Speed 9214.22 samples/sec Loss 6.4218 LearningRate 0.0363 Epoch: 7 Global Step: 132640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:56,336-Speed 9549.27 samples/sec Loss 6.5184 LearningRate 0.0363 Epoch: 7 Global Step: 132650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:57,400-Speed 9626.84 samples/sec Loss 6.4767 LearningRate 0.0363 Epoch: 7 Global Step: 132660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:58,469-Speed 9586.33 samples/sec Loss 6.4584 LearningRate 0.0363 Epoch: 7 Global Step: 132670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:52:59,574-Speed 9271.19 samples/sec Loss 6.4589 LearningRate 0.0363 Epoch: 7 Global Step: 132680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:00,650-Speed 9519.54 samples/sec Loss 6.4793 LearningRate 0.0363 Epoch: 7 Global Step: 132690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:01,738-Speed 9418.27 samples/sec Loss 6.4657 LearningRate 0.0363 Epoch: 7 Global Step: 132700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:02,838-Speed 9327.04 samples/sec Loss 6.4290 LearningRate 0.0363 Epoch: 7 Global Step: 132710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:03,900-Speed 9646.63 samples/sec Loss 6.5150 LearningRate 0.0363 Epoch: 7 Global Step: 132720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:04,999-Speed 9324.30 samples/sec Loss 6.5214 LearningRate 0.0363 Epoch: 7 Global Step: 132730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:53:06,100-Speed 9305.54 samples/sec Loss 6.4750 LearningRate 0.0363 Epoch: 7 Global Step: 132740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:07,191-Speed 9388.84 samples/sec Loss 6.5550 LearningRate 0.0363 Epoch: 7 Global Step: 132750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:08,243-Speed 9738.14 samples/sec Loss 6.3608 LearningRate 0.0363 Epoch: 7 Global Step: 132760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:09,337-Speed 9372.92 samples/sec Loss 6.5011 LearningRate 0.0363 Epoch: 7 Global Step: 132770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:10,419-Speed 9465.08 samples/sec Loss 6.4885 LearningRate 0.0363 Epoch: 7 Global Step: 132780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:11,482-Speed 9645.66 samples/sec Loss 6.4438 LearningRate 0.0363 Epoch: 7 Global Step: 132790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:12,509-Speed 9972.18 samples/sec Loss 6.4120 LearningRate 0.0363 Epoch: 7 Global Step: 132800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:13,580-Speed 9565.85 samples/sec Loss 6.4315 LearningRate 0.0363 Epoch: 7 Global Step: 132810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:14,643-Speed 9638.40 samples/sec Loss 6.4809 LearningRate 0.0363 Epoch: 7 Global Step: 132820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:53:15,720-Speed 9515.11 samples/sec Loss 6.5243 LearningRate 0.0363 Epoch: 7 Global Step: 132830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:53:16,810-Speed 9400.09 samples/sec Loss 6.3800 LearningRate 0.0362 Epoch: 7 Global Step: 132840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:17,893-Speed 9458.49 samples/sec Loss 6.5385 LearningRate 0.0362 Epoch: 7 Global Step: 132850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:18,966-Speed 9548.15 samples/sec Loss 
6.5118 LearningRate 0.0362 Epoch: 7 Global Step: 132860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:20,033-Speed 9602.10 samples/sec Loss 6.4457 LearningRate 0.0362 Epoch: 7 Global Step: 132870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:21,101-Speed 9592.53 samples/sec Loss 6.3567 LearningRate 0.0362 Epoch: 7 Global Step: 132880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:22,127-Speed 9986.79 samples/sec Loss 6.3412 LearningRate 0.0362 Epoch: 7 Global Step: 132890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:23,223-Speed 9355.62 samples/sec Loss 6.5353 LearningRate 0.0362 Epoch: 7 Global Step: 132900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:24,282-Speed 9671.61 samples/sec Loss 6.4885 LearningRate 0.0362 Epoch: 7 Global Step: 132910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:25,408-Speed 9097.90 samples/sec Loss 6.4238 LearningRate 0.0362 Epoch: 7 Global Step: 132920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:26,526-Speed 9162.94 samples/sec Loss 6.6124 LearningRate 0.0362 Epoch: 7 Global Step: 132930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:27,585-Speed 9677.03 samples/sec Loss 6.5818 LearningRate 0.0362 Epoch: 7 Global Step: 132940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:28,631-Speed 9804.77 samples/sec Loss 6.4265 LearningRate 0.0362 Epoch: 7 Global Step: 132950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:29,668-Speed 9873.24 samples/sec Loss 6.5030 LearningRate 0.0362 Epoch: 7 Global Step: 132960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:30,776-Speed 9250.42 samples/sec Loss 6.5019 LearningRate 0.0362 Epoch: 7 Global Step: 132970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:31,872-Speed 9347.07 samples/sec Loss 6.4470 LearningRate 0.0362 Epoch: 7 Global Step: 
132980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:32,987-Speed 9195.12 samples/sec Loss 6.5377 LearningRate 0.0362 Epoch: 7 Global Step: 132990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:34,062-Speed 9532.01 samples/sec Loss 6.4978 LearningRate 0.0362 Epoch: 7 Global Step: 133000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:35,154-Speed 9384.10 samples/sec Loss 6.4639 LearningRate 0.0362 Epoch: 7 Global Step: 133010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:36,216-Speed 9644.33 samples/sec Loss 6.3958 LearningRate 0.0362 Epoch: 7 Global Step: 133020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:37,269-Speed 9732.54 samples/sec Loss 6.4452 LearningRate 0.0362 Epoch: 7 Global Step: 133030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:38,349-Speed 9479.86 samples/sec Loss 6.4304 LearningRate 0.0362 Epoch: 7 Global Step: 133040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:39,463-Speed 9207.65 samples/sec Loss 6.5502 LearningRate 0.0362 Epoch: 7 Global Step: 133050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:53:40,565-Speed 9295.52 samples/sec Loss 6.4411 LearningRate 0.0362 Epoch: 7 Global Step: 133060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:41,631-Speed 9612.46 samples/sec Loss 6.5786 LearningRate 0.0362 Epoch: 7 Global Step: 133070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:42,730-Speed 9320.54 samples/sec Loss 6.3627 LearningRate 0.0362 Epoch: 7 Global Step: 133080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:43,843-Speed 9204.69 samples/sec Loss 6.5501 LearningRate 0.0362 Epoch: 7 Global Step: 133090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:44,973-Speed 9071.75 samples/sec Loss 6.4435 LearningRate 0.0362 Epoch: 7 Global Step: 133100 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 16:53:46,028-Speed 9711.66 samples/sec Loss 6.5504 LearningRate 0.0362 Epoch: 7 Global Step: 133110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:47,115-Speed 9427.01 samples/sec Loss 6.5521 LearningRate 0.0361 Epoch: 7 Global Step: 133120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:53:48,197-Speed 9469.79 samples/sec Loss 6.5289 LearningRate 0.0361 Epoch: 7 Global Step: 133130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:49,288-Speed 9387.35 samples/sec Loss 6.6217 LearningRate 0.0361 Epoch: 7 Global Step: 133140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:50,363-Speed 9534.27 samples/sec Loss 6.5010 LearningRate 0.0361 Epoch: 7 Global Step: 133150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:51,444-Speed 9480.29 samples/sec Loss 6.4419 LearningRate 0.0361 Epoch: 7 Global Step: 133160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:52,496-Speed 9735.02 samples/sec Loss 6.3918 LearningRate 0.0361 Epoch: 7 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:53,557-Speed 9659.29 samples/sec Loss 6.4472 LearningRate 0.0361 Epoch: 7 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:54,640-Speed 9463.06 samples/sec Loss 6.5241 LearningRate 0.0361 Epoch: 7 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:55,707-Speed 9597.43 samples/sec Loss 6.4735 LearningRate 0.0361 Epoch: 7 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:56,861-Speed 8881.31 samples/sec Loss 6.3962 LearningRate 0.0361 Epoch: 7 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:57,932-Speed 9568.23 samples/sec Loss 6.4627 LearningRate 0.0361 Epoch: 7 Global Step: 133220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:53:59,048-Speed 9173.77 
samples/sec Loss 6.4928 LearningRate 0.0361 Epoch: 7 Global Step: 133230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:00,127-Speed 9496.01 samples/sec Loss 6.4976 LearningRate 0.0361 Epoch: 7 Global Step: 133240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:01,208-Speed 9479.30 samples/sec Loss 6.5578 LearningRate 0.0361 Epoch: 7 Global Step: 133250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:02,344-Speed 9017.50 samples/sec Loss 6.4871 LearningRate 0.0361 Epoch: 7 Global Step: 133260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:03,482-Speed 9008.03 samples/sec Loss 6.3623 LearningRate 0.0361 Epoch: 7 Global Step: 133270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:04,553-Speed 9571.41 samples/sec Loss 6.4899 LearningRate 0.0361 Epoch: 7 Global Step: 133280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:05,623-Speed 9580.78 samples/sec Loss 6.4630 LearningRate 0.0361 Epoch: 7 Global Step: 133290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:06,745-Speed 9126.77 samples/sec Loss 6.3669 LearningRate 0.0361 Epoch: 7 Global Step: 133300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:07,808-Speed 9638.71 samples/sec Loss 6.4764 LearningRate 0.0361 Epoch: 7 Global Step: 133310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:08,908-Speed 9316.27 samples/sec Loss 6.4850 LearningRate 0.0361 Epoch: 7 Global Step: 133320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:09,979-Speed 9574.70 samples/sec Loss 6.4760 LearningRate 0.0361 Epoch: 7 Global Step: 133330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:11,051-Speed 9558.29 samples/sec Loss 6.4347 LearningRate 0.0361 Epoch: 7 Global Step: 133340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:12,103-Speed 9740.19 samples/sec Loss 6.4753 LearningRate 0.0361 Epoch: 7 
Global Step: 133350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:13,193-Speed 9400.22 samples/sec Loss 6.3191 LearningRate 0.0361 Epoch: 7 Global Step: 133360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:14,279-Speed 9430.36 samples/sec Loss 6.4986 LearningRate 0.0361 Epoch: 7 Global Step: 133370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:15,357-Speed 9500.21 samples/sec Loss 6.5219 LearningRate 0.0361 Epoch: 7 Global Step: 133380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:16,450-Speed 9375.64 samples/sec Loss 6.5047 LearningRate 0.0360 Epoch: 7 Global Step: 133390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:17,532-Speed 9466.09 samples/sec Loss 6.4182 LearningRate 0.0360 Epoch: 7 Global Step: 133400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:18,643-Speed 9227.87 samples/sec Loss 6.4758 LearningRate 0.0360 Epoch: 7 Global Step: 133410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:19,728-Speed 9441.25 samples/sec Loss 6.6120 LearningRate 0.0360 Epoch: 7 Global Step: 133420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:20,830-Speed 9300.91 samples/sec Loss 6.4803 LearningRate 0.0360 Epoch: 7 Global Step: 133430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:21,924-Speed 9367.34 samples/sec Loss 6.3399 LearningRate 0.0360 Epoch: 7 Global Step: 133440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:23,020-Speed 9347.78 samples/sec Loss 6.4492 LearningRate 0.0360 Epoch: 7 Global Step: 133450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:24,094-Speed 9548.79 samples/sec Loss 6.4832 LearningRate 0.0360 Epoch: 7 Global Step: 133460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:54:25,146-Speed 9738.57 samples/sec Loss 6.5695 LearningRate 0.0360 Epoch: 7 Global Step: 133470 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-04-11 16:54:26,206-Speed 9667.17 samples/sec Loss 6.4099 LearningRate 0.0360 Epoch: 7 Global Step: 133480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:54:27,229-Speed 10008.80 samples/sec Loss 6.4003 LearningRate 0.0360 Epoch: 7 Global Step: 133490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:28,293-Speed 9630.75 samples/sec Loss 6.4133 LearningRate 0.0360 Epoch: 7 Global Step: 133500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:29,416-Speed 9126.79 samples/sec Loss 6.5299 LearningRate 0.0360 Epoch: 7 Global Step: 133510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:30,804-Speed 7377.91 samples/sec Loss 6.4412 LearningRate 0.0360 Epoch: 7 Global Step: 133520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:54:59,977-Speed 351.04 samples/sec Loss 6.2787 LearningRate 0.0360 Epoch: 8 Global Step: 133530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:01,216-Speed 8274.77 samples/sec Loss 5.7452 LearningRate 0.0360 Epoch: 8 Global Step: 133540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:02,926-Speed 5990.68 samples/sec Loss 5.7061 LearningRate 0.0360 Epoch: 8 Global Step: 133550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:04,744-Speed 5634.74 samples/sec Loss 5.6522 LearningRate 0.0360 Epoch: 8 Global Step: 133560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:06,347-Speed 6392.35 samples/sec Loss 5.6688 LearningRate 0.0360 Epoch: 8 Global Step: 133570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:07,516-Speed 8768.47 samples/sec Loss 5.6419 LearningRate 0.0360 Epoch: 8 Global Step: 133580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:08,771-Speed 8160.24 samples/sec Loss 5.5655 LearningRate 0.0360 Epoch: 8 Global Step: 133590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
16:55:09,843-Speed 9559.07 samples/sec Loss 5.7168 LearningRate 0.0360 Epoch: 8 Global Step: 133600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:10,917-Speed 9540.75 samples/sec Loss 5.6078 LearningRate 0.0360 Epoch: 8 Global Step: 133610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:12,026-Speed 9243.98 samples/sec Loss 5.7123 LearningRate 0.0360 Epoch: 8 Global Step: 133620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:13,404-Speed 7434.64 samples/sec Loss 5.7700 LearningRate 0.0360 Epoch: 8 Global Step: 133630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:14,522-Speed 9158.21 samples/sec Loss 5.6549 LearningRate 0.0360 Epoch: 8 Global Step: 133640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:15,645-Speed 9126.16 samples/sec Loss 5.6491 LearningRate 0.0360 Epoch: 8 Global Step: 133650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:16,738-Speed 9378.88 samples/sec Loss 5.8074 LearningRate 0.0360 Epoch: 8 Global Step: 133660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:17,816-Speed 9504.89 samples/sec Loss 5.6989 LearningRate 0.0359 Epoch: 8 Global Step: 133670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:18,930-Speed 9192.66 samples/sec Loss 5.6363 LearningRate 0.0359 Epoch: 8 Global Step: 133680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:20,031-Speed 9309.71 samples/sec Loss 5.7916 LearningRate 0.0359 Epoch: 8 Global Step: 133690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:55:21,096-Speed 9623.10 samples/sec Loss 5.6810 LearningRate 0.0359 Epoch: 8 Global Step: 133700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:55:22,152-Speed 9706.40 samples/sec Loss 5.7197 LearningRate 0.0359 Epoch: 8 Global Step: 133710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:55:23,199-Speed 9782.95 samples/sec Loss 
5.7753 LearningRate 0.0359 Epoch: 8 Global Step: 133720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:24,255-Speed 9699.98 samples/sec Loss 5.6736 LearningRate 0.0359 Epoch: 8 Global Step: 133730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:25,333-Speed 9505.52 samples/sec Loss 5.7397 LearningRate 0.0359 Epoch: 8 Global Step: 133740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:26,415-Speed 9473.56 samples/sec Loss 5.7265 LearningRate 0.0359 Epoch: 8 Global Step: 133750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:27,482-Speed 9598.68 samples/sec Loss 5.5625 LearningRate 0.0359 Epoch: 8 Global Step: 133760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:28,540-Speed 9690.00 samples/sec Loss 5.6823 LearningRate 0.0359 Epoch: 8 Global Step: 133770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:29,896-Speed 7555.72 samples/sec Loss 5.7231 LearningRate 0.0359 Epoch: 8 Global Step: 133780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:31,004-Speed 9249.06 samples/sec Loss 5.7348 LearningRate 0.0359 Epoch: 8 Global Step: 133790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:32,117-Speed 9198.82 samples/sec Loss 5.7691 LearningRate 0.0359 Epoch: 8 Global Step: 133800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:33,228-Speed 9222.84 samples/sec Loss 5.6956 LearningRate 0.0359 Epoch: 8 Global Step: 133810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:34,300-Speed 9554.64 samples/sec Loss 5.7851 LearningRate 0.0359 Epoch: 8 Global Step: 133820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:35,370-Speed 9581.12 samples/sec Loss 5.6825 LearningRate 0.0359 Epoch: 8 Global Step: 133830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:36,410-Speed 9849.76 samples/sec Loss 5.6766 LearningRate 0.0359 Epoch: 8 Global Step: 
133840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:37,482-Speed 9561.78 samples/sec Loss 5.7435 LearningRate 0.0359 Epoch: 8 Global Step: 133850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:38,559-Speed 9510.69 samples/sec Loss 5.6844 LearningRate 0.0359 Epoch: 8 Global Step: 133860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:55:39,616-Speed 9698.76 samples/sec Loss 5.6794 LearningRate 0.0359 Epoch: 8 Global Step: 133870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:40,685-Speed 9580.04 samples/sec Loss 5.7011 LearningRate 0.0359 Epoch: 8 Global Step: 133880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:41,758-Speed 9552.32 samples/sec Loss 5.7557 LearningRate 0.0359 Epoch: 8 Global Step: 133890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:42,876-Speed 9168.45 samples/sec Loss 5.7714 LearningRate 0.0359 Epoch: 8 Global Step: 133900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:43,973-Speed 9341.17 samples/sec Loss 5.6567 LearningRate 0.0359 Epoch: 8 Global Step: 133910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:45,081-Speed 9245.92 samples/sec Loss 5.7114 LearningRate 0.0359 Epoch: 8 Global Step: 133920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:46,121-Speed 9855.46 samples/sec Loss 5.7356 LearningRate 0.0359 Epoch: 8 Global Step: 133930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:47,228-Speed 9254.27 samples/sec Loss 5.7527 LearningRate 0.0359 Epoch: 8 Global Step: 133940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:48,265-Speed 9873.91 samples/sec Loss 5.8555 LearningRate 0.0358 Epoch: 8 Global Step: 133950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:55:49,378-Speed 9209.53 samples/sec Loss 5.7181 LearningRate 0.0358 Epoch: 8 Global Step: 133960 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 16:55:50,660-Speed 7991.70 samples/sec Loss 5.7000 LearningRate 0.0358 Epoch: 8 Global Step: 133970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:55:51,776-Speed 9181.91 samples/sec Loss 5.6996 LearningRate 0.0358 Epoch: 8 Global Step: 133980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:55:52,847-Speed 9560.41 samples/sec Loss 5.7628 LearningRate 0.0358 Epoch: 8 Global Step: 133990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:55:54,109-Speed 8118.10 samples/sec Loss 5.8491 LearningRate 0.0358 Epoch: 8 Global Step: 134000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:56:16,118-[lfw][134000]XNorm: 11.110978 Training: 2022-04-11 16:56:16,118-[lfw][134000]Accuracy-Flip: 0.97767+-0.00611 Training: 2022-04-11 16:56:16,119-[lfw][134000]Accuracy-Highest: 0.99683 Training: 2022-04-11 16:56:41,439-[cfp_fp][134000]XNorm: 9.446907 Training: 2022-04-11 16:56:41,440-[cfp_fp][134000]Accuracy-Flip: 0.89029+-0.01258 Training: 2022-04-11 16:56:41,440-[cfp_fp][134000]Accuracy-Highest: 0.96157 Training: 2022-04-11 16:57:03,287-[agedb_30][134000]XNorm: 10.635493 Training: 2022-04-11 16:57:03,288-[agedb_30][134000]Accuracy-Flip: 0.91617+-0.01520 Training: 2022-04-11 16:57:03,288-[agedb_30][134000]Accuracy-Highest: 0.96650 Training: 2022-04-11 16:57:04,737-Speed 144.99 samples/sec Loss 5.8002 LearningRate 0.0358 Epoch: 8 Global Step: 134010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:06,043-Speed 7850.46 samples/sec Loss 5.7898 LearningRate 0.0358 Epoch: 8 Global Step: 134020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:07,308-Speed 8100.06 samples/sec Loss 5.6676 LearningRate 0.0358 Epoch: 8 Global Step: 134030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:08,582-Speed 8037.89 samples/sec Loss 5.6885 LearningRate 0.0358 Epoch: 8 Global Step: 134040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 16:57:09,662-Speed 9488.64 samples/sec Loss 5.7868 LearningRate 0.0358 Epoch: 8 Global Step: 134050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:10,757-Speed 9355.82 samples/sec Loss 5.6469 LearningRate 0.0358 Epoch: 8 Global Step: 134060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:11,984-Speed 8352.30 samples/sec Loss 5.8915 LearningRate 0.0358 Epoch: 8 Global Step: 134070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:13,132-Speed 8920.45 samples/sec Loss 5.7855 LearningRate 0.0358 Epoch: 8 Global Step: 134080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:14,233-Speed 9308.10 samples/sec Loss 5.7690 LearningRate 0.0358 Epoch: 8 Global Step: 134090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:15,312-Speed 9504.56 samples/sec Loss 5.7725 LearningRate 0.0358 Epoch: 8 Global Step: 134100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:16,367-Speed 9710.60 samples/sec Loss 5.8072 LearningRate 0.0358 Epoch: 8 Global Step: 134110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:17,492-Speed 9105.23 samples/sec Loss 5.8168 LearningRate 0.0358 Epoch: 8 Global Step: 134120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:18,591-Speed 9322.10 samples/sec Loss 5.7034 LearningRate 0.0358 Epoch: 8 Global Step: 134130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:19,656-Speed 9617.98 samples/sec Loss 5.8428 LearningRate 0.0358 Epoch: 8 Global Step: 134140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:20,708-Speed 9743.31 samples/sec Loss 5.8577 LearningRate 0.0358 Epoch: 8 Global Step: 134150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:21,765-Speed 9687.90 samples/sec Loss 5.6755 LearningRate 0.0358 Epoch: 8 Global Step: 134160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:22,800-Speed 9901.44 samples/sec 
Loss 5.8761 LearningRate 0.0358 Epoch: 8 Global Step: 134170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:23,854-Speed 9722.13 samples/sec Loss 5.8198 LearningRate 0.0358 Epoch: 8 Global Step: 134180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:24,920-Speed 9610.72 samples/sec Loss 5.8963 LearningRate 0.0358 Epoch: 8 Global Step: 134190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:25,982-Speed 9643.72 samples/sec Loss 5.7302 LearningRate 0.0358 Epoch: 8 Global Step: 134200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:27,038-Speed 9710.27 samples/sec Loss 5.8144 LearningRate 0.0358 Epoch: 8 Global Step: 134210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:28,119-Speed 9477.53 samples/sec Loss 5.7742 LearningRate 0.0358 Epoch: 8 Global Step: 134220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:29,201-Speed 9466.02 samples/sec Loss 5.8323 LearningRate 0.0357 Epoch: 8 Global Step: 134230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:30,252-Speed 9747.23 samples/sec Loss 5.7941 LearningRate 0.0357 Epoch: 8 Global Step: 134240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:31,286-Speed 9917.20 samples/sec Loss 5.8263 LearningRate 0.0357 Epoch: 8 Global Step: 134250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:32,364-Speed 9501.87 samples/sec Loss 5.7887 LearningRate 0.0357 Epoch: 8 Global Step: 134260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:33,462-Speed 9328.85 samples/sec Loss 5.7350 LearningRate 0.0357 Epoch: 8 Global Step: 134270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:34,535-Speed 9553.61 samples/sec Loss 5.8527 LearningRate 0.0357 Epoch: 8 Global Step: 134280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:35,647-Speed 9216.66 samples/sec Loss 5.8963 LearningRate 0.0357 Epoch: 8 
Global Step: 134290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:57:36,736-Speed 9407.85 samples/sec Loss 5.8666 LearningRate 0.0357 Epoch: 8 Global Step: 134300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:37,846-Speed 9231.75 samples/sec Loss 5.7828 LearningRate 0.0357 Epoch: 8 Global Step: 134310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:38,949-Speed 9285.88 samples/sec Loss 5.8622 LearningRate 0.0357 Epoch: 8 Global Step: 134320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:40,003-Speed 9727.20 samples/sec Loss 5.9011 LearningRate 0.0357 Epoch: 8 Global Step: 134330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:41,060-Speed 9693.28 samples/sec Loss 5.8523 LearningRate 0.0357 Epoch: 8 Global Step: 134340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:42,152-Speed 9376.14 samples/sec Loss 5.8762 LearningRate 0.0357 Epoch: 8 Global Step: 134350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:43,207-Speed 9718.73 samples/sec Loss 5.7977 LearningRate 0.0357 Epoch: 8 Global Step: 134360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:44,284-Speed 9509.75 samples/sec Loss 5.7271 LearningRate 0.0357 Epoch: 8 Global Step: 134370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:45,350-Speed 9616.12 samples/sec Loss 5.7249 LearningRate 0.0357 Epoch: 8 Global Step: 134380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:46,412-Speed 9647.54 samples/sec Loss 5.7863 LearningRate 0.0357 Epoch: 8 Global Step: 134390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:47,512-Speed 9315.07 samples/sec Loss 5.8311 LearningRate 0.0357 Epoch: 8 Global Step: 134400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:48,636-Speed 9116.93 samples/sec Loss 5.8355 LearningRate 0.0357 Epoch: 8 Global Step: 134410 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 16:57:49,705-Speed 9577.63 samples/sec Loss 5.8568 LearningRate 0.0357 Epoch: 8 Global Step: 134420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:57:50,784-Speed 9500.12 samples/sec Loss 5.8022 LearningRate 0.0357 Epoch: 8 Global Step: 134430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:51,857-Speed 9551.79 samples/sec Loss 5.7601 LearningRate 0.0357 Epoch: 8 Global Step: 134440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:52,934-Speed 9515.47 samples/sec Loss 5.8642 LearningRate 0.0357 Epoch: 8 Global Step: 134450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:54,044-Speed 9226.50 samples/sec Loss 5.7894 LearningRate 0.0357 Epoch: 8 Global Step: 134460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:55,138-Speed 9367.79 samples/sec Loss 5.8305 LearningRate 0.0357 Epoch: 8 Global Step: 134470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:56,176-Speed 9870.01 samples/sec Loss 5.8220 LearningRate 0.0357 Epoch: 8 Global Step: 134480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:57,251-Speed 9530.99 samples/sec Loss 5.9715 LearningRate 0.0357 Epoch: 8 Global Step: 134490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:58,336-Speed 9448.37 samples/sec Loss 5.8855 LearningRate 0.0357 Epoch: 8 Global Step: 134500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:57:59,389-Speed 9726.69 samples/sec Loss 5.9273 LearningRate 0.0356 Epoch: 8 Global Step: 134510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:00,483-Speed 9368.89 samples/sec Loss 5.7903 LearningRate 0.0356 Epoch: 8 Global Step: 134520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:01,565-Speed 9464.43 samples/sec Loss 5.8805 LearningRate 0.0356 Epoch: 8 Global Step: 134530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 
16:58:02,616-Speed 9748.68 samples/sec Loss 5.9151 LearningRate 0.0356 Epoch: 8 Global Step: 134540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:03,729-Speed 9202.39 samples/sec Loss 5.9196 LearningRate 0.0356 Epoch: 8 Global Step: 134550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:04,844-Speed 9195.70 samples/sec Loss 5.9684 LearningRate 0.0356 Epoch: 8 Global Step: 134560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:05,902-Speed 9682.11 samples/sec Loss 5.8429 LearningRate 0.0356 Epoch: 8 Global Step: 134570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:06,982-Speed 9481.63 samples/sec Loss 5.9154 LearningRate 0.0356 Epoch: 8 Global Step: 134580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:08,050-Speed 9596.57 samples/sec Loss 5.8728 LearningRate 0.0356 Epoch: 8 Global Step: 134590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:09,089-Speed 9864.46 samples/sec Loss 5.8353 LearningRate 0.0356 Epoch: 8 Global Step: 134600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:10,134-Speed 9802.10 samples/sec Loss 5.8411 LearningRate 0.0356 Epoch: 8 Global Step: 134610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:11,203-Speed 9584.91 samples/sec Loss 5.8441 LearningRate 0.0356 Epoch: 8 Global Step: 134620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:12,277-Speed 9542.75 samples/sec Loss 5.8547 LearningRate 0.0356 Epoch: 8 Global Step: 134630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:13,392-Speed 9194.93 samples/sec Loss 5.9465 LearningRate 0.0356 Epoch: 8 Global Step: 134640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:14,534-Speed 8967.34 samples/sec Loss 5.8800 LearningRate 0.0356 Epoch: 8 Global Step: 134650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:15,645-Speed 9228.86 samples/sec Loss 5.8303 
LearningRate 0.0356 Epoch: 8 Global Step: 134660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:16,694-Speed 9768.17 samples/sec Loss 5.9455 LearningRate 0.0356 Epoch: 8 Global Step: 134670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:17,801-Speed 9252.87 samples/sec Loss 5.8267 LearningRate 0.0356 Epoch: 8 Global Step: 134680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:18,867-Speed 9612.12 samples/sec Loss 5.9111 LearningRate 0.0356 Epoch: 8 Global Step: 134690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:19,940-Speed 9552.33 samples/sec Loss 5.9315 LearningRate 0.0356 Epoch: 8 Global Step: 134700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:21,059-Speed 9153.94 samples/sec Loss 5.9259 LearningRate 0.0356 Epoch: 8 Global Step: 134710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:22,139-Speed 9484.71 samples/sec Loss 5.9406 LearningRate 0.0356 Epoch: 8 Global Step: 134720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:23,244-Speed 9274.13 samples/sec Loss 5.7995 LearningRate 0.0356 Epoch: 8 Global Step: 134730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:24,348-Speed 9279.60 samples/sec Loss 5.8351 LearningRate 0.0356 Epoch: 8 Global Step: 134740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:25,453-Speed 9272.24 samples/sec Loss 5.8596 LearningRate 0.0356 Epoch: 8 Global Step: 134750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:26,557-Speed 9281.84 samples/sec Loss 5.9263 LearningRate 0.0356 Epoch: 8 Global Step: 134760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:27,671-Speed 9200.37 samples/sec Loss 5.8765 LearningRate 0.0356 Epoch: 8 Global Step: 134770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:58:28,758-Speed 9428.18 samples/sec Loss 5.9792 LearningRate 0.0356 Epoch: 8 Global Step: 
134780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:29,829-Speed 9565.82 samples/sec Loss 5.9993 LearningRate 0.0355 Epoch: 8 Global Step: 134790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:30,899-Speed 9578.05 samples/sec Loss 5.8664 LearningRate 0.0355 Epoch: 8 Global Step: 134800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:31,963-Speed 9633.51 samples/sec Loss 5.9494 LearningRate 0.0355 Epoch: 8 Global Step: 134810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:33,037-Speed 9540.06 samples/sec Loss 5.9110 LearningRate 0.0355 Epoch: 8 Global Step: 134820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:34,144-Speed 9254.92 samples/sec Loss 5.9474 LearningRate 0.0355 Epoch: 8 Global Step: 134830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:35,277-Speed 9038.62 samples/sec Loss 5.9001 LearningRate 0.0355 Epoch: 8 Global Step: 134840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:36,355-Speed 9505.46 samples/sec Loss 6.0242 LearningRate 0.0355 Epoch: 8 Global Step: 134850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:37,490-Speed 9022.88 samples/sec Loss 5.8527 LearningRate 0.0355 Epoch: 8 Global Step: 134860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:38,532-Speed 9836.83 samples/sec Loss 5.8996 LearningRate 0.0355 Epoch: 8 Global Step: 134870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:39,646-Speed 9199.88 samples/sec Loss 5.9326 LearningRate 0.0355 Epoch: 8 Global Step: 134880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:40,726-Speed 9489.80 samples/sec Loss 5.7742 LearningRate 0.0355 Epoch: 8 Global Step: 134890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:41,828-Speed 9299.07 samples/sec Loss 5.9753 LearningRate 0.0355 Epoch: 8 Global Step: 134900 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 16:58:42,924-Speed 9341.32 samples/sec Loss 5.8356 LearningRate 0.0355 Epoch: 8 Global Step: 134910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:44,018-Speed 9370.65 samples/sec Loss 5.7935 LearningRate 0.0355 Epoch: 8 Global Step: 134920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:45,125-Speed 9259.93 samples/sec Loss 5.9840 LearningRate 0.0355 Epoch: 8 Global Step: 134930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:46,209-Speed 9446.73 samples/sec Loss 5.8380 LearningRate 0.0355 Epoch: 8 Global Step: 134940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:47,295-Speed 9437.19 samples/sec Loss 5.9611 LearningRate 0.0355 Epoch: 8 Global Step: 134950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:48,400-Speed 9275.91 samples/sec Loss 6.0487 LearningRate 0.0355 Epoch: 8 Global Step: 134960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:49,482-Speed 9464.38 samples/sec Loss 5.7926 LearningRate 0.0355 Epoch: 8 Global Step: 134970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:50,522-Speed 9861.12 samples/sec Loss 5.9113 LearningRate 0.0355 Epoch: 8 Global Step: 134980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:58:51,613-Speed 9388.63 samples/sec Loss 5.9651 LearningRate 0.0355 Epoch: 8 Global Step: 134990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:52,661-Speed 9779.83 samples/sec Loss 5.8617 LearningRate 0.0355 Epoch: 8 Global Step: 135000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:53,735-Speed 9536.65 samples/sec Loss 5.9869 LearningRate 0.0355 Epoch: 8 Global Step: 135010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:58:54,754-Speed 10053.86 samples/sec Loss 5.8662 LearningRate 0.0355 Epoch: 8 Global Step: 135020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
16:58:55,851-Speed 9342.06 samples/sec Loss 5.8968 LearningRate 0.0355 Epoch: 8 Global Step: 135030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:56,948-Speed 9339.32 samples/sec Loss 5.8256 LearningRate 0.0355 Epoch: 8 Global Step: 135040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:58,012-Speed 9632.30 samples/sec Loss 5.9482 LearningRate 0.0355 Epoch: 8 Global Step: 135050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:58:59,099-Speed 9427.66 samples/sec Loss 5.8771 LearningRate 0.0355 Epoch: 8 Global Step: 135060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:00,165-Speed 9607.03 samples/sec Loss 5.9194 LearningRate 0.0354 Epoch: 8 Global Step: 135070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:01,252-Speed 9424.94 samples/sec Loss 5.9826 LearningRate 0.0354 Epoch: 8 Global Step: 135080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:02,322-Speed 9577.88 samples/sec Loss 5.9552 LearningRate 0.0354 Epoch: 8 Global Step: 135090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:03,371-Speed 9772.73 samples/sec Loss 5.8697 LearningRate 0.0354 Epoch: 8 Global Step: 135100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:04,475-Speed 9271.99 samples/sec Loss 5.9296 LearningRate 0.0354 Epoch: 8 Global Step: 135110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:05,529-Speed 9722.06 samples/sec Loss 5.9353 LearningRate 0.0354 Epoch: 8 Global Step: 135120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:06,627-Speed 9337.53 samples/sec Loss 6.0282 LearningRate 0.0354 Epoch: 8 Global Step: 135130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:07,683-Speed 9695.39 samples/sec Loss 5.9071 LearningRate 0.0354 Epoch: 8 Global Step: 135140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:08,771-Speed 9424.27 samples/sec Loss 5.9299 
LearningRate 0.0354 Epoch: 8 Global Step: 135150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:09,811-Speed 9854.48 samples/sec Loss 5.9350 LearningRate 0.0354 Epoch: 8 Global Step: 135160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:10,852-Speed 9840.82 samples/sec Loss 6.0223 LearningRate 0.0354 Epoch: 8 Global Step: 135170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:11,926-Speed 9541.31 samples/sec Loss 5.9283 LearningRate 0.0354 Epoch: 8 Global Step: 135180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:12,977-Speed 9753.48 samples/sec Loss 6.0663 LearningRate 0.0354 Epoch: 8 Global Step: 135190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:14,068-Speed 9388.69 samples/sec Loss 6.0477 LearningRate 0.0354 Epoch: 8 Global Step: 135200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:15,139-Speed 9570.21 samples/sec Loss 5.9908 LearningRate 0.0354 Epoch: 8 Global Step: 135210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:16,219-Speed 9486.59 samples/sec Loss 5.9698 LearningRate 0.0354 Epoch: 8 Global Step: 135220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:59:17,280-Speed 9656.99 samples/sec Loss 6.0046 LearningRate 0.0354 Epoch: 8 Global Step: 135230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:59:18,320-Speed 9855.25 samples/sec Loss 5.9475 LearningRate 0.0354 Epoch: 8 Global Step: 135240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:59:19,422-Speed 9294.64 samples/sec Loss 6.0660 LearningRate 0.0354 Epoch: 8 Global Step: 135250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:20,476-Speed 9718.63 samples/sec Loss 5.9995 LearningRate 0.0354 Epoch: 8 Global Step: 135260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:21,547-Speed 9563.97 samples/sec Loss 6.0346 LearningRate 0.0354 Epoch: 8 Global Step: 
135270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:22,600-Speed 9737.20 samples/sec Loss 6.0529 LearningRate 0.0354 Epoch: 8 Global Step: 135280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:23,695-Speed 9350.37 samples/sec Loss 6.0807 LearningRate 0.0354 Epoch: 8 Global Step: 135290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:24,776-Speed 9480.59 samples/sec Loss 6.0070 LearningRate 0.0354 Epoch: 8 Global Step: 135300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:25,875-Speed 9326.05 samples/sec Loss 6.0005 LearningRate 0.0354 Epoch: 8 Global Step: 135310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:26,963-Speed 9413.23 samples/sec Loss 5.9215 LearningRate 0.0354 Epoch: 8 Global Step: 135320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:28,034-Speed 9571.52 samples/sec Loss 6.0157 LearningRate 0.0354 Epoch: 8 Global Step: 135330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:29,115-Speed 9478.38 samples/sec Loss 5.9353 LearningRate 0.0354 Epoch: 8 Global Step: 135340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:30,194-Speed 9500.92 samples/sec Loss 5.9995 LearningRate 0.0353 Epoch: 8 Global Step: 135350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:59:31,268-Speed 9534.43 samples/sec Loss 6.0342 LearningRate 0.0353 Epoch: 8 Global Step: 135360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:32,307-Speed 9867.98 samples/sec Loss 5.9963 LearningRate 0.0353 Epoch: 8 Global Step: 135370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:33,425-Speed 9162.18 samples/sec Loss 6.0308 LearningRate 0.0353 Epoch: 8 Global Step: 135380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:34,509-Speed 9449.39 samples/sec Loss 5.9757 LearningRate 0.0353 Epoch: 8 Global Step: 135390 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-11 16:59:35,615-Speed 9264.41 samples/sec Loss 5.8845 LearningRate 0.0353 Epoch: 8 Global Step: 135400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:36,712-Speed 9334.50 samples/sec Loss 5.9945 LearningRate 0.0353 Epoch: 8 Global Step: 135410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:37,815-Speed 9296.11 samples/sec Loss 5.9632 LearningRate 0.0353 Epoch: 8 Global Step: 135420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:38,898-Speed 9464.49 samples/sec Loss 5.9811 LearningRate 0.0353 Epoch: 8 Global Step: 135430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:40,007-Speed 9233.58 samples/sec Loss 5.9686 LearningRate 0.0353 Epoch: 8 Global Step: 135440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:41,108-Speed 9307.95 samples/sec Loss 6.0352 LearningRate 0.0353 Epoch: 8 Global Step: 135450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 16:59:42,215-Speed 9250.96 samples/sec Loss 6.1260 LearningRate 0.0353 Epoch: 8 Global Step: 135460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:43,277-Speed 9651.94 samples/sec Loss 5.9333 LearningRate 0.0353 Epoch: 8 Global Step: 135470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:44,378-Speed 9301.07 samples/sec Loss 6.0558 LearningRate 0.0353 Epoch: 8 Global Step: 135480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:45,515-Speed 9019.26 samples/sec Loss 6.0569 LearningRate 0.0353 Epoch: 8 Global Step: 135490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:46,628-Speed 9201.39 samples/sec Loss 5.8791 LearningRate 0.0353 Epoch: 8 Global Step: 135500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:47,720-Speed 9386.03 samples/sec Loss 6.0216 LearningRate 0.0353 Epoch: 8 Global Step: 135510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:48,752-Speed 
9933.64 samples/sec Loss 5.8813 LearningRate 0.0353 Epoch: 8 Global Step: 135520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:49,820-Speed 9605.70 samples/sec Loss 5.9588 LearningRate 0.0353 Epoch: 8 Global Step: 135530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:50,905-Speed 9443.28 samples/sec Loss 6.0664 LearningRate 0.0353 Epoch: 8 Global Step: 135540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:51,975-Speed 9574.77 samples/sec Loss 5.9390 LearningRate 0.0353 Epoch: 8 Global Step: 135550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:53,080-Speed 9274.74 samples/sec Loss 5.8819 LearningRate 0.0353 Epoch: 8 Global Step: 135560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 16:59:54,145-Speed 9619.34 samples/sec Loss 6.0380 LearningRate 0.0353 Epoch: 8 Global Step: 135570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:55,226-Speed 9477.15 samples/sec Loss 6.0944 LearningRate 0.0353 Epoch: 8 Global Step: 135580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:56,273-Speed 9782.87 samples/sec Loss 5.9649 LearningRate 0.0353 Epoch: 8 Global Step: 135590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:57,294-Speed 10032.63 samples/sec Loss 6.0295 LearningRate 0.0353 Epoch: 8 Global Step: 135600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:58,384-Speed 9409.06 samples/sec Loss 6.0503 LearningRate 0.0353 Epoch: 8 Global Step: 135610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 16:59:59,453-Speed 9581.71 samples/sec Loss 5.9706 LearningRate 0.0353 Epoch: 8 Global Step: 135620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:00,521-Speed 9595.28 samples/sec Loss 6.0926 LearningRate 0.0352 Epoch: 8 Global Step: 135630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:01,640-Speed 9150.90 samples/sec Loss 6.1090 
LearningRate 0.0352 Epoch: 8 Global Step: 135640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:02,736-Speed 9356.06 samples/sec Loss 6.0254 LearningRate 0.0352 Epoch: 8 Global Step: 135650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:03,798-Speed 9646.80 samples/sec Loss 6.0218 LearningRate 0.0352 Epoch: 8 Global Step: 135660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:04,836-Speed 9864.36 samples/sec Loss 6.1289 LearningRate 0.0352 Epoch: 8 Global Step: 135670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:00:05,917-Speed 9475.31 samples/sec Loss 6.0067 LearningRate 0.0352 Epoch: 8 Global Step: 135680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:07,025-Speed 9249.67 samples/sec Loss 6.0767 LearningRate 0.0352 Epoch: 8 Global Step: 135690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:08,085-Speed 9671.96 samples/sec Loss 6.0675 LearningRate 0.0352 Epoch: 8 Global Step: 135700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:09,150-Speed 9620.36 samples/sec Loss 5.9034 LearningRate 0.0352 Epoch: 8 Global Step: 135710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:10,208-Speed 9690.47 samples/sec Loss 6.0252 LearningRate 0.0352 Epoch: 8 Global Step: 135720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:11,255-Speed 9781.41 samples/sec Loss 6.0451 LearningRate 0.0352 Epoch: 8 Global Step: 135730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:12,307-Speed 9740.35 samples/sec Loss 6.0922 LearningRate 0.0352 Epoch: 8 Global Step: 135740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:13,425-Speed 9165.28 samples/sec Loss 6.0157 LearningRate 0.0352 Epoch: 8 Global Step: 135750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:14,489-Speed 9627.99 samples/sec Loss 6.0042 LearningRate 0.0352 Epoch: 8 Global Step: 
135760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:15,539-Speed 9759.20 samples/sec Loss 5.8953 LearningRate 0.0352 Epoch: 8 Global Step: 135770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:16,588-Speed 9771.79 samples/sec Loss 5.9689 LearningRate 0.0352 Epoch: 8 Global Step: 135780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:00:17,656-Speed 9593.85 samples/sec Loss 6.0752 LearningRate 0.0352 Epoch: 8 Global Step: 135790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:18,783-Speed 9090.34 samples/sec Loss 5.9759 LearningRate 0.0352 Epoch: 8 Global Step: 135800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:19,858-Speed 9528.00 samples/sec Loss 6.0970 LearningRate 0.0352 Epoch: 8 Global Step: 135810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:20,928-Speed 9583.30 samples/sec Loss 6.0516 LearningRate 0.0352 Epoch: 8 Global Step: 135820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:22,049-Speed 9132.48 samples/sec Loss 5.9436 LearningRate 0.0352 Epoch: 8 Global Step: 135830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:23,127-Speed 9504.38 samples/sec Loss 6.0113 LearningRate 0.0352 Epoch: 8 Global Step: 135840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:24,236-Speed 9243.50 samples/sec Loss 5.9836 LearningRate 0.0352 Epoch: 8 Global Step: 135850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:25,366-Speed 9070.05 samples/sec Loss 5.9767 LearningRate 0.0352 Epoch: 8 Global Step: 135860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:26,449-Speed 9455.35 samples/sec Loss 5.9905 LearningRate 0.0352 Epoch: 8 Global Step: 135870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:27,538-Speed 9408.64 samples/sec Loss 6.0319 LearningRate 0.0352 Epoch: 8 Global Step: 135880 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 17:00:28,611-Speed 9560.50 samples/sec Loss 6.0254 LearningRate 0.0352 Epoch: 8 Global Step: 135890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:29,693-Speed 9464.55 samples/sec Loss 6.0724 LearningRate 0.0352 Epoch: 8 Global Step: 135900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:30,758-Speed 9622.60 samples/sec Loss 6.2302 LearningRate 0.0351 Epoch: 8 Global Step: 135910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:31,868-Speed 9234.20 samples/sec Loss 6.0990 LearningRate 0.0351 Epoch: 8 Global Step: 135920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:32,932-Speed 9622.20 samples/sec Loss 6.0747 LearningRate 0.0351 Epoch: 8 Global Step: 135930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:34,008-Speed 9524.67 samples/sec Loss 6.0519 LearningRate 0.0351 Epoch: 8 Global Step: 135940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:35,132-Speed 9115.04 samples/sec Loss 6.0617 LearningRate 0.0351 Epoch: 8 Global Step: 135950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:36,187-Speed 9722.02 samples/sec Loss 6.0732 LearningRate 0.0351 Epoch: 8 Global Step: 135960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:37,254-Speed 9600.49 samples/sec Loss 6.0185 LearningRate 0.0351 Epoch: 8 Global Step: 135970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:38,364-Speed 9229.97 samples/sec Loss 6.0292 LearningRate 0.0351 Epoch: 8 Global Step: 135980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:39,428-Speed 9632.50 samples/sec Loss 6.0128 LearningRate 0.0351 Epoch: 8 Global Step: 135990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:00:40,503-Speed 9531.73 samples/sec Loss 6.0701 LearningRate 0.0351 Epoch: 8 Global Step: 136000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:01:02,246-[lfw][136000]XNorm: 10.414027 Training: 2022-04-11 17:01:02,247-[lfw][136000]Accuracy-Flip: 0.99617+-0.00248 Training: 2022-04-11 17:01:02,247-[lfw][136000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:01:27,384-[cfp_fp][136000]XNorm: 8.860287 Training: 2022-04-11 17:01:27,385-[cfp_fp][136000]Accuracy-Flip: 0.95943+-0.01254 Training: 2022-04-11 17:01:27,385-[cfp_fp][136000]Accuracy-Highest: 0.96157 Training: 2022-04-11 17:01:49,069-[agedb_30][136000]XNorm: 10.062916 Training: 2022-04-11 17:01:49,069-[agedb_30][136000]Accuracy-Flip: 0.96500+-0.01003 Training: 2022-04-11 17:01:49,070-[agedb_30][136000]Accuracy-Highest: 0.96650 Training: 2022-04-11 17:01:50,158-Speed 147.01 samples/sec Loss 6.0386 LearningRate 0.0351 Epoch: 8 Global Step: 136010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:51,261-Speed 9284.61 samples/sec Loss 6.1490 LearningRate 0.0351 Epoch: 8 Global Step: 136020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:52,381-Speed 9143.64 samples/sec Loss 5.9747 LearningRate 0.0351 Epoch: 8 Global Step: 136030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:53,496-Speed 9191.76 samples/sec Loss 5.9745 LearningRate 0.0351 Epoch: 8 Global Step: 136040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:54,633-Speed 9016.52 samples/sec Loss 6.0460 LearningRate 0.0351 Epoch: 8 Global Step: 136050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:55,709-Speed 9520.63 samples/sec Loss 5.9804 LearningRate 0.0351 Epoch: 8 Global Step: 136060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:56,785-Speed 9518.48 samples/sec Loss 6.1235 LearningRate 0.0351 Epoch: 8 Global Step: 136070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:57,862-Speed 9514.82 samples/sec Loss 6.0584 LearningRate 0.0351 Epoch: 8 Global Step: 136080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:01:58,903-Speed 
9843.99 samples/sec Loss 6.0276 LearningRate 0.0351 Epoch: 8 Global Step: 136090 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:02:00,011-Speed 9252.57 samples/sec Loss 5.9950 LearningRate 0.0351 Epoch: 8 Global Step: 136100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:01,114-Speed 9288.13 samples/sec Loss 6.0086 LearningRate 0.0351 Epoch: 8 Global Step: 136110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:02,200-Speed 9433.67 samples/sec Loss 6.0446 LearningRate 0.0351 Epoch: 8 Global Step: 136120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:03,267-Speed 9605.07 samples/sec Loss 6.1094 LearningRate 0.0351 Epoch: 8 Global Step: 136130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:04,313-Speed 9790.13 samples/sec Loss 6.1079 LearningRate 0.0351 Epoch: 8 Global Step: 136140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:05,453-Speed 8986.46 samples/sec Loss 6.0443 LearningRate 0.0351 Epoch: 8 Global Step: 136150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:06,536-Speed 9462.83 samples/sec Loss 5.9349 LearningRate 0.0351 Epoch: 8 Global Step: 136160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:07,608-Speed 9559.75 samples/sec Loss 5.9949 LearningRate 0.0351 Epoch: 8 Global Step: 136170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:08,707-Speed 9328.25 samples/sec Loss 6.0808 LearningRate 0.0351 Epoch: 8 Global Step: 136180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:09,776-Speed 9580.10 samples/sec Loss 6.0008 LearningRate 0.0350 Epoch: 8 Global Step: 136190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:10,852-Speed 9521.86 samples/sec Loss 6.0632 LearningRate 0.0350 Epoch: 8 Global Step: 136200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:11,942-Speed 9398.90 samples/sec Loss 6.0261 LearningRate 0.0350 
Epoch: 8 Global Step: 136210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:13,040-Speed 9336.25 samples/sec Loss 6.0275 LearningRate 0.0350 Epoch: 8 Global Step: 136220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:14,130-Speed 9400.96 samples/sec Loss 6.1430 LearningRate 0.0350 Epoch: 8 Global Step: 136230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:15,175-Speed 9803.51 samples/sec Loss 6.0355 LearningRate 0.0350 Epoch: 8 Global Step: 136240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:16,240-Speed 9621.58 samples/sec Loss 6.0701 LearningRate 0.0350 Epoch: 8 Global Step: 136250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:17,316-Speed 9522.35 samples/sec Loss 6.1121 LearningRate 0.0350 Epoch: 8 Global Step: 136260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:18,423-Speed 9259.24 samples/sec Loss 6.0742 LearningRate 0.0350 Epoch: 8 Global Step: 136270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:19,488-Speed 9616.82 samples/sec Loss 5.9903 LearningRate 0.0350 Epoch: 8 Global Step: 136280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:20,564-Speed 9520.25 samples/sec Loss 6.1075 LearningRate 0.0350 Epoch: 8 Global Step: 136290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:21,623-Speed 9676.44 samples/sec Loss 6.0239 LearningRate 0.0350 Epoch: 8 Global Step: 136300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:02:22,712-Speed 9405.82 samples/sec Loss 6.1007 LearningRate 0.0350 Epoch: 8 Global Step: 136310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:23,795-Speed 9467.20 samples/sec Loss 6.0143 LearningRate 0.0350 Epoch: 8 Global Step: 136320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:24,882-Speed 9426.28 samples/sec Loss 6.0426 LearningRate 0.0350 Epoch: 8 Global Step: 136330 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:26,001-Speed 9153.18 samples/sec Loss 5.8701 LearningRate 0.0350 Epoch: 8 Global Step: 136340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:27,097-Speed 9344.84 samples/sec Loss 6.0150 LearningRate 0.0350 Epoch: 8 Global Step: 136350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:28,134-Speed 9887.88 samples/sec Loss 5.9417 LearningRate 0.0350 Epoch: 8 Global Step: 136360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:29,181-Speed 9784.71 samples/sec Loss 5.9999 LearningRate 0.0350 Epoch: 8 Global Step: 136370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:30,236-Speed 9706.41 samples/sec Loss 5.9955 LearningRate 0.0350 Epoch: 8 Global Step: 136380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:31,319-Speed 9465.15 samples/sec Loss 6.0884 LearningRate 0.0350 Epoch: 8 Global Step: 136390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:32,429-Speed 9235.63 samples/sec Loss 6.0623 LearningRate 0.0350 Epoch: 8 Global Step: 136400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:33,535-Speed 9264.14 samples/sec Loss 6.1963 LearningRate 0.0350 Epoch: 8 Global Step: 136410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:02:34,612-Speed 9509.29 samples/sec Loss 6.1062 LearningRate 0.0350 Epoch: 8 Global Step: 136420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:35,649-Speed 9884.63 samples/sec Loss 6.0293 LearningRate 0.0350 Epoch: 8 Global Step: 136430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:36,739-Speed 9395.95 samples/sec Loss 6.0123 LearningRate 0.0350 Epoch: 8 Global Step: 136440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:37,852-Speed 9204.42 samples/sec Loss 6.0029 LearningRate 0.0350 Epoch: 8 Global Step: 136450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 17:02:38,941-Speed 9408.88 samples/sec Loss 6.0101 LearningRate 0.0350 Epoch: 8 Global Step: 136460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:40,036-Speed 9357.42 samples/sec Loss 5.9740 LearningRate 0.0350 Epoch: 8 Global Step: 136470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:41,091-Speed 9714.64 samples/sec Loss 6.0571 LearningRate 0.0349 Epoch: 8 Global Step: 136480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:42,210-Speed 9158.16 samples/sec Loss 6.1399 LearningRate 0.0349 Epoch: 8 Global Step: 136490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:43,326-Speed 9176.91 samples/sec Loss 6.0723 LearningRate 0.0349 Epoch: 8 Global Step: 136500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:44,393-Speed 9605.60 samples/sec Loss 6.0268 LearningRate 0.0349 Epoch: 8 Global Step: 136510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:45,456-Speed 9639.78 samples/sec Loss 6.0567 LearningRate 0.0349 Epoch: 8 Global Step: 136520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:02:46,537-Speed 9482.71 samples/sec Loss 6.1823 LearningRate 0.0349 Epoch: 8 Global Step: 136530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:47,605-Speed 9591.37 samples/sec Loss 6.0007 LearningRate 0.0349 Epoch: 8 Global Step: 136540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:02:48,677-Speed 9557.07 samples/sec Loss 6.0384 LearningRate 0.0349 Epoch: 8 Global Step: 136550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:49,761-Speed 9449.73 samples/sec Loss 6.0989 LearningRate 0.0349 Epoch: 8 Global Step: 136560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:50,861-Speed 9324.02 samples/sec Loss 6.0172 LearningRate 0.0349 Epoch: 8 Global Step: 136570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:51,953-Speed 9378.55 
samples/sec Loss 6.2316 LearningRate 0.0349 Epoch: 8 Global Step: 136580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:53,051-Speed 9334.11 samples/sec Loss 6.0674 LearningRate 0.0349 Epoch: 8 Global Step: 136590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:54,115-Speed 9625.59 samples/sec Loss 6.0130 LearningRate 0.0349 Epoch: 8 Global Step: 136600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:55,191-Speed 9521.40 samples/sec Loss 6.0960 LearningRate 0.0349 Epoch: 8 Global Step: 136610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:56,243-Speed 9738.15 samples/sec Loss 6.1652 LearningRate 0.0349 Epoch: 8 Global Step: 136620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:57,326-Speed 9461.21 samples/sec Loss 6.0737 LearningRate 0.0349 Epoch: 8 Global Step: 136630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:58,380-Speed 9720.36 samples/sec Loss 6.1429 LearningRate 0.0349 Epoch: 8 Global Step: 136640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:02:59,420-Speed 9849.85 samples/sec Loss 6.0902 LearningRate 0.0349 Epoch: 8 Global Step: 136650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:00,482-Speed 9658.34 samples/sec Loss 6.0067 LearningRate 0.0349 Epoch: 8 Global Step: 136660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:01,521-Speed 9855.11 samples/sec Loss 6.1647 LearningRate 0.0349 Epoch: 8 Global Step: 136670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:02,593-Speed 9558.41 samples/sec Loss 6.1673 LearningRate 0.0349 Epoch: 8 Global Step: 136680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:03,670-Speed 9523.46 samples/sec Loss 6.2104 LearningRate 0.0349 Epoch: 8 Global Step: 136690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:04,761-Speed 9388.50 samples/sec Loss 6.1220 LearningRate 0.0349 Epoch: 
8 Global Step: 136700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:05,830-Speed 9587.34 samples/sec Loss 6.1172 LearningRate 0.0349 Epoch: 8 Global Step: 136710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:06,863-Speed 9914.42 samples/sec Loss 6.0295 LearningRate 0.0349 Epoch: 8 Global Step: 136720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:07,926-Speed 9643.70 samples/sec Loss 6.0673 LearningRate 0.0349 Epoch: 8 Global Step: 136730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:08,981-Speed 9704.08 samples/sec Loss 6.0468 LearningRate 0.0349 Epoch: 8 Global Step: 136740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:10,030-Speed 9776.48 samples/sec Loss 5.9918 LearningRate 0.0349 Epoch: 8 Global Step: 136750 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:11,104-Speed 9540.93 samples/sec Loss 5.9507 LearningRate 0.0348 Epoch: 8 Global Step: 136760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:12,190-Speed 9430.20 samples/sec Loss 6.1072 LearningRate 0.0348 Epoch: 8 Global Step: 136770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:13,280-Speed 9403.84 samples/sec Loss 6.1652 LearningRate 0.0348 Epoch: 8 Global Step: 136780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:14,310-Speed 9951.32 samples/sec Loss 6.0792 LearningRate 0.0348 Epoch: 8 Global Step: 136790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:15,386-Speed 9519.41 samples/sec Loss 6.0376 LearningRate 0.0348 Epoch: 8 Global Step: 136800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:16,463-Speed 9515.15 samples/sec Loss 6.1390 LearningRate 0.0348 Epoch: 8 Global Step: 136810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:17,565-Speed 9298.12 samples/sec Loss 6.1306 LearningRate 0.0348 Epoch: 8 Global Step: 136820 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-11 17:03:18,642-Speed 9510.15 samples/sec Loss 6.0538 LearningRate 0.0348 Epoch: 8 Global Step: 136830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:19,725-Speed 9463.04 samples/sec Loss 6.1440 LearningRate 0.0348 Epoch: 8 Global Step: 136840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:20,772-Speed 9798.39 samples/sec Loss 6.0760 LearningRate 0.0348 Epoch: 8 Global Step: 136850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:21,846-Speed 9534.31 samples/sec Loss 6.0837 LearningRate 0.0348 Epoch: 8 Global Step: 136860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:22,949-Speed 9293.54 samples/sec Loss 6.0647 LearningRate 0.0348 Epoch: 8 Global Step: 136870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:24,037-Speed 9415.72 samples/sec Loss 6.1517 LearningRate 0.0348 Epoch: 8 Global Step: 136880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:25,114-Speed 9512.66 samples/sec Loss 6.0851 LearningRate 0.0348 Epoch: 8 Global Step: 136890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:26,194-Speed 9484.77 samples/sec Loss 6.1061 LearningRate 0.0348 Epoch: 8 Global Step: 136900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:27,278-Speed 9451.38 samples/sec Loss 6.1557 LearningRate 0.0348 Epoch: 8 Global Step: 136910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:28,344-Speed 9617.32 samples/sec Loss 6.2953 LearningRate 0.0348 Epoch: 8 Global Step: 136920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:29,422-Speed 9507.62 samples/sec Loss 6.1098 LearningRate 0.0348 Epoch: 8 Global Step: 136930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:30,473-Speed 9746.29 samples/sec Loss 6.1224 LearningRate 0.0348 Epoch: 8 Global Step: 136940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 17:03:31,562-Speed 9407.88 samples/sec Loss 6.1180 LearningRate 0.0348 Epoch: 8 Global Step: 136950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:32,648-Speed 9436.85 samples/sec Loss 6.1730 LearningRate 0.0348 Epoch: 8 Global Step: 136960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:33,765-Speed 9172.15 samples/sec Loss 6.2489 LearningRate 0.0348 Epoch: 8 Global Step: 136970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:34,815-Speed 9756.43 samples/sec Loss 6.0189 LearningRate 0.0348 Epoch: 8 Global Step: 136980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:35,862-Speed 9791.63 samples/sec Loss 6.0130 LearningRate 0.0348 Epoch: 8 Global Step: 136990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:36,975-Speed 9199.54 samples/sec Loss 6.1665 LearningRate 0.0348 Epoch: 8 Global Step: 137000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:38,035-Speed 9667.26 samples/sec Loss 6.2678 LearningRate 0.0348 Epoch: 8 Global Step: 137010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:39,173-Speed 9006.99 samples/sec Loss 6.2697 LearningRate 0.0348 Epoch: 8 Global Step: 137020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:40,287-Speed 9194.24 samples/sec Loss 6.1570 LearningRate 0.0348 Epoch: 8 Global Step: 137030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:41,373-Speed 9432.65 samples/sec Loss 6.0436 LearningRate 0.0347 Epoch: 8 Global Step: 137040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:42,482-Speed 9242.94 samples/sec Loss 6.1671 LearningRate 0.0347 Epoch: 8 Global Step: 137050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:43,595-Speed 9207.21 samples/sec Loss 6.0274 LearningRate 0.0347 Epoch: 8 Global Step: 137060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:44,674-Speed 9492.16 
samples/sec Loss 6.1665 LearningRate 0.0347 Epoch: 8 Global Step: 137070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:45,745-Speed 9572.34 samples/sec Loss 6.0877 LearningRate 0.0347 Epoch: 8 Global Step: 137080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:46,852-Speed 9257.11 samples/sec Loss 6.1468 LearningRate 0.0347 Epoch: 8 Global Step: 137090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:47,936-Speed 9451.40 samples/sec Loss 6.1129 LearningRate 0.0347 Epoch: 8 Global Step: 137100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:49,028-Speed 9381.30 samples/sec Loss 6.1071 LearningRate 0.0347 Epoch: 8 Global Step: 137110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:50,143-Speed 9191.39 samples/sec Loss 6.1348 LearningRate 0.0347 Epoch: 8 Global Step: 137120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:03:51,208-Speed 9627.04 samples/sec Loss 6.2486 LearningRate 0.0347 Epoch: 8 Global Step: 137130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:52,298-Speed 9394.47 samples/sec Loss 6.1408 LearningRate 0.0347 Epoch: 8 Global Step: 137140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:53,427-Speed 9076.73 samples/sec Loss 6.2426 LearningRate 0.0347 Epoch: 8 Global Step: 137150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:54,507-Speed 9486.11 samples/sec Loss 6.2051 LearningRate 0.0347 Epoch: 8 Global Step: 137160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:55,585-Speed 9503.51 samples/sec Loss 6.1374 LearningRate 0.0347 Epoch: 8 Global Step: 137170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:56,667-Speed 9470.19 samples/sec Loss 6.1847 LearningRate 0.0347 Epoch: 8 Global Step: 137180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:57,724-Speed 9691.71 samples/sec Loss 6.0369 LearningRate 0.0347 
Epoch: 8 Global Step: 137190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:58,828-Speed 9278.56 samples/sec Loss 6.1205 LearningRate 0.0347 Epoch: 8 Global Step: 137200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:03:59,910-Speed 9473.19 samples/sec Loss 6.1171 LearningRate 0.0347 Epoch: 8 Global Step: 137210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:00,973-Speed 9651.54 samples/sec Loss 6.0906 LearningRate 0.0347 Epoch: 8 Global Step: 137220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:02,044-Speed 9559.90 samples/sec Loss 6.2198 LearningRate 0.0347 Epoch: 8 Global Step: 137230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:03,136-Speed 9383.36 samples/sec Loss 6.1489 LearningRate 0.0347 Epoch: 8 Global Step: 137240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:04,250-Speed 9202.52 samples/sec Loss 6.1611 LearningRate 0.0347 Epoch: 8 Global Step: 137250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:05,339-Speed 9407.99 samples/sec Loss 6.0573 LearningRate 0.0347 Epoch: 8 Global Step: 137260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:06,454-Speed 9197.68 samples/sec Loss 6.1067 LearningRate 0.0347 Epoch: 8 Global Step: 137270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:07,504-Speed 9760.59 samples/sec Loss 6.1505 LearningRate 0.0347 Epoch: 8 Global Step: 137280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:08,569-Speed 9620.06 samples/sec Loss 6.2302 LearningRate 0.0347 Epoch: 8 Global Step: 137290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:09,662-Speed 9367.97 samples/sec Loss 6.1274 LearningRate 0.0347 Epoch: 8 Global Step: 137300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:10,741-Speed 9497.86 samples/sec Loss 6.1225 LearningRate 0.0347 Epoch: 8 Global Step: 137310 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:11,811-Speed 9580.15 samples/sec Loss 6.0935 LearningRate 0.0346 Epoch: 8 Global Step: 137320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:12,876-Speed 9619.55 samples/sec Loss 6.1916 LearningRate 0.0346 Epoch: 8 Global Step: 137330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:04:13,947-Speed 9572.09 samples/sec Loss 6.1238 LearningRate 0.0346 Epoch: 8 Global Step: 137340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:15,010-Speed 9630.95 samples/sec Loss 6.2440 LearningRate 0.0346 Epoch: 8 Global Step: 137350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:16,092-Speed 9472.49 samples/sec Loss 6.2002 LearningRate 0.0346 Epoch: 8 Global Step: 137360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:17,130-Speed 9865.09 samples/sec Loss 6.0921 LearningRate 0.0346 Epoch: 8 Global Step: 137370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:18,225-Speed 9357.93 samples/sec Loss 6.1238 LearningRate 0.0346 Epoch: 8 Global Step: 137380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:19,320-Speed 9357.71 samples/sec Loss 6.1432 LearningRate 0.0346 Epoch: 8 Global Step: 137390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:20,381-Speed 9665.58 samples/sec Loss 6.2252 LearningRate 0.0346 Epoch: 8 Global Step: 137400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:21,462-Speed 9476.22 samples/sec Loss 6.1349 LearningRate 0.0346 Epoch: 8 Global Step: 137410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:22,551-Speed 9406.10 samples/sec Loss 6.2413 LearningRate 0.0346 Epoch: 8 Global Step: 137420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:23,636-Speed 9444.05 samples/sec Loss 6.0554 LearningRate 0.0346 Epoch: 8 Global Step: 137430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 17:04:24,737-Speed 9303.44 samples/sec Loss 6.1056 LearningRate 0.0346 Epoch: 8 Global Step: 137440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:04:25,803-Speed 9613.96 samples/sec Loss 6.0868 LearningRate 0.0346 Epoch: 8 Global Step: 137450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:26,867-Speed 9635.76 samples/sec Loss 6.1102 LearningRate 0.0346 Epoch: 8 Global Step: 137460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:27,943-Speed 9520.14 samples/sec Loss 6.0660 LearningRate 0.0346 Epoch: 8 Global Step: 137470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:29,006-Speed 9635.88 samples/sec Loss 6.1637 LearningRate 0.0346 Epoch: 8 Global Step: 137480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:30,076-Speed 9573.31 samples/sec Loss 6.1247 LearningRate 0.0346 Epoch: 8 Global Step: 137490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:31,198-Speed 9131.72 samples/sec Loss 6.1767 LearningRate 0.0346 Epoch: 8 Global Step: 137500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:32,292-Speed 9369.45 samples/sec Loss 6.1754 LearningRate 0.0346 Epoch: 8 Global Step: 137510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:33,392-Speed 9318.94 samples/sec Loss 6.1641 LearningRate 0.0346 Epoch: 8 Global Step: 137520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:34,437-Speed 9801.88 samples/sec Loss 6.1231 LearningRate 0.0346 Epoch: 8 Global Step: 137530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:35,554-Speed 9175.02 samples/sec Loss 6.1489 LearningRate 0.0346 Epoch: 8 Global Step: 137540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:36,668-Speed 9196.22 samples/sec Loss 6.1792 LearningRate 0.0346 Epoch: 8 Global Step: 137550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:37,721-Speed 9724.61 
samples/sec Loss 6.2035 LearningRate 0.0346 Epoch: 8 Global Step: 137560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:38,847-Speed 9099.96 samples/sec Loss 6.1318 LearningRate 0.0346 Epoch: 8 Global Step: 137570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:39,949-Speed 9298.13 samples/sec Loss 6.0980 LearningRate 0.0346 Epoch: 8 Global Step: 137580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:41,022-Speed 9548.91 samples/sec Loss 6.0537 LearningRate 0.0346 Epoch: 8 Global Step: 137590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:42,112-Speed 9395.96 samples/sec Loss 6.1828 LearningRate 0.0346 Epoch: 8 Global Step: 137600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:43,191-Speed 9505.99 samples/sec Loss 6.0999 LearningRate 0.0345 Epoch: 8 Global Step: 137610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:44,288-Speed 9342.58 samples/sec Loss 6.0641 LearningRate 0.0345 Epoch: 8 Global Step: 137620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:45,370-Speed 9474.90 samples/sec Loss 6.1854 LearningRate 0.0345 Epoch: 8 Global Step: 137630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:46,438-Speed 9587.75 samples/sec Loss 6.1390 LearningRate 0.0345 Epoch: 8 Global Step: 137640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:47,507-Speed 9583.11 samples/sec Loss 6.0102 LearningRate 0.0345 Epoch: 8 Global Step: 137650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:48,593-Speed 9440.19 samples/sec Loss 6.0516 LearningRate 0.0345 Epoch: 8 Global Step: 137660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:49,692-Speed 9318.40 samples/sec Loss 6.1228 LearningRate 0.0345 Epoch: 8 Global Step: 137670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:50,762-Speed 9576.81 samples/sec Loss 6.1245 LearningRate 0.0345 
Epoch: 8 Global Step: 137680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:51,907-Speed 8947.81 samples/sec Loss 6.2951 LearningRate 0.0345 Epoch: 8 Global Step: 137690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:53,012-Speed 9273.70 samples/sec Loss 6.1327 LearningRate 0.0345 Epoch: 8 Global Step: 137700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:54,070-Speed 9688.86 samples/sec Loss 6.1382 LearningRate 0.0345 Epoch: 8 Global Step: 137710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:55,128-Speed 9678.81 samples/sec Loss 6.1741 LearningRate 0.0345 Epoch: 8 Global Step: 137720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:56,246-Speed 9167.93 samples/sec Loss 6.0621 LearningRate 0.0345 Epoch: 8 Global Step: 137730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:57,315-Speed 9589.88 samples/sec Loss 6.0887 LearningRate 0.0345 Epoch: 8 Global Step: 137740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:04:58,376-Speed 9649.95 samples/sec Loss 6.1582 LearningRate 0.0345 Epoch: 8 Global Step: 137750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:04:59,423-Speed 9784.43 samples/sec Loss 6.1629 LearningRate 0.0345 Epoch: 8 Global Step: 137760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:00,499-Speed 9528.67 samples/sec Loss 6.2214 LearningRate 0.0345 Epoch: 8 Global Step: 137770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:01,564-Speed 9616.78 samples/sec Loss 6.0574 LearningRate 0.0345 Epoch: 8 Global Step: 137780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:02,656-Speed 9389.76 samples/sec Loss 6.0621 LearningRate 0.0345 Epoch: 8 Global Step: 137790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:03,735-Speed 9499.62 samples/sec Loss 6.0644 LearningRate 0.0345 Epoch: 8 Global Step: 137800 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-11 17:05:04,799-Speed 9630.36 samples/sec Loss 6.1773 LearningRate 0.0345 Epoch: 8 Global Step: 137810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:05,900-Speed 9307.26 samples/sec Loss 6.1374 LearningRate 0.0345 Epoch: 8 Global Step: 137820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:06,982-Speed 9463.34 samples/sec Loss 6.1524 LearningRate 0.0345 Epoch: 8 Global Step: 137830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:08,075-Speed 9373.29 samples/sec Loss 6.2338 LearningRate 0.0345 Epoch: 8 Global Step: 137840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:09,149-Speed 9541.53 samples/sec Loss 6.1893 LearningRate 0.0345 Epoch: 8 Global Step: 137850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:05:10,218-Speed 9581.57 samples/sec Loss 6.2406 LearningRate 0.0345 Epoch: 8 Global Step: 137860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:05:11,313-Speed 9363.59 samples/sec Loss 6.1753 LearningRate 0.0345 Epoch: 8 Global Step: 137870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:12,400-Speed 9418.99 samples/sec Loss 6.2255 LearningRate 0.0345 Epoch: 8 Global Step: 137880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:13,511-Speed 9228.35 samples/sec Loss 6.1982 LearningRate 0.0344 Epoch: 8 Global Step: 137890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:14,618-Speed 9260.58 samples/sec Loss 6.0685 LearningRate 0.0344 Epoch: 8 Global Step: 137900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:15,717-Speed 9317.74 samples/sec Loss 6.2737 LearningRate 0.0344 Epoch: 8 Global Step: 137910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:16,788-Speed 9570.46 samples/sec Loss 6.1220 LearningRate 0.0344 Epoch: 8 Global Step: 137920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
17:05:17,900-Speed 9215.28 samples/sec Loss 6.1493 LearningRate 0.0344 Epoch: 8 Global Step: 137930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:18,972-Speed 9551.91 samples/sec Loss 6.1104 LearningRate 0.0344 Epoch: 8 Global Step: 137940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:20,108-Speed 9023.22 samples/sec Loss 6.2067 LearningRate 0.0344 Epoch: 8 Global Step: 137950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:21,186-Speed 9501.42 samples/sec Loss 6.0943 LearningRate 0.0344 Epoch: 8 Global Step: 137960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:05:22,245-Speed 9681.38 samples/sec Loss 6.1824 LearningRate 0.0344 Epoch: 8 Global Step: 137970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:23,364-Speed 9158.20 samples/sec Loss 6.2593 LearningRate 0.0344 Epoch: 8 Global Step: 137980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:24,411-Speed 9790.98 samples/sec Loss 6.1724 LearningRate 0.0344 Epoch: 8 Global Step: 137990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:25,474-Speed 9637.76 samples/sec Loss 6.1578 LearningRate 0.0344 Epoch: 8 Global Step: 138000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:05:47,336-[lfw][138000]XNorm: 10.263365 Training: 2022-04-11 17:05:47,337-[lfw][138000]Accuracy-Flip: 0.99667+-0.00289 Training: 2022-04-11 17:05:47,337-[lfw][138000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:06:12,608-[cfp_fp][138000]XNorm: 8.801229 Training: 2022-04-11 17:06:12,609-[cfp_fp][138000]Accuracy-Flip: 0.96500+-0.00918 Training: 2022-04-11 17:06:12,609-[cfp_fp][138000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:06:34,369-[agedb_30][138000]XNorm: 9.990886 Training: 2022-04-11 17:06:34,370-[agedb_30][138000]Accuracy-Flip: 0.96300+-0.00991 Training: 2022-04-11 17:06:34,370-[agedb_30][138000]Accuracy-Highest: 0.96650 Training: 2022-04-11 17:06:35,460-Speed 146.32 
samples/sec Loss 6.2353 LearningRate 0.0344 Epoch: 8 Global Step: 138010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:36,519-Speed 9677.12 samples/sec Loss 6.1906 LearningRate 0.0344 Epoch: 8 Global Step: 138020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:37,621-Speed 9294.15 samples/sec Loss 6.2120 LearningRate 0.0344 Epoch: 8 Global Step: 138030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:38,717-Speed 9351.88 samples/sec Loss 6.2495 LearningRate 0.0344 Epoch: 8 Global Step: 138040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:39,797-Speed 9478.98 samples/sec Loss 6.1530 LearningRate 0.0344 Epoch: 8 Global Step: 138050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:40,866-Speed 9584.57 samples/sec Loss 6.2038 LearningRate 0.0344 Epoch: 8 Global Step: 138060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:41,934-Speed 9600.64 samples/sec Loss 6.1287 LearningRate 0.0344 Epoch: 8 Global Step: 138070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:06:43,011-Speed 9515.73 samples/sec Loss 6.1864 LearningRate 0.0344 Epoch: 8 Global Step: 138080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:44,104-Speed 9370.54 samples/sec Loss 6.1638 LearningRate 0.0344 Epoch: 8 Global Step: 138090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:45,219-Speed 9194.42 samples/sec Loss 6.0576 LearningRate 0.0344 Epoch: 8 Global Step: 138100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:46,283-Speed 9629.33 samples/sec Loss 6.1313 LearningRate 0.0344 Epoch: 8 Global Step: 138110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:47,389-Speed 9258.49 samples/sec Loss 6.1637 LearningRate 0.0344 Epoch: 8 Global Step: 138120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:48,471-Speed 9470.14 samples/sec Loss 6.1746 LearningRate 0.0344 
Epoch: 8 Global Step: 138130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:49,547-Speed 9523.63 samples/sec Loss 6.1852 LearningRate 0.0344 Epoch: 8 Global Step: 138140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:50,595-Speed 9779.24 samples/sec Loss 6.1675 LearningRate 0.0344 Epoch: 8 Global Step: 138150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:51,666-Speed 9571.85 samples/sec Loss 6.1623 LearningRate 0.0344 Epoch: 8 Global Step: 138160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:52,721-Speed 9712.84 samples/sec Loss 6.1640 LearningRate 0.0344 Epoch: 8 Global Step: 138170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:53,810-Speed 9403.74 samples/sec Loss 6.1121 LearningRate 0.0343 Epoch: 8 Global Step: 138180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:06:54,927-Speed 9173.28 samples/sec Loss 6.3069 LearningRate 0.0343 Epoch: 8 Global Step: 138190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:06:56,013-Speed 9435.74 samples/sec Loss 6.1738 LearningRate 0.0343 Epoch: 8 Global Step: 138200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:06:57,071-Speed 9681.30 samples/sec Loss 6.1594 LearningRate 0.0343 Epoch: 8 Global Step: 138210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:06:58,148-Speed 9511.97 samples/sec Loss 6.2248 LearningRate 0.0343 Epoch: 8 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:06:59,180-Speed 9928.24 samples/sec Loss 6.2151 LearningRate 0.0343 Epoch: 8 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:00,222-Speed 9834.83 samples/sec Loss 6.2022 LearningRate 0.0343 Epoch: 8 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:01,305-Speed 9460.22 samples/sec Loss 6.2117 LearningRate 0.0343 Epoch: 8 Global Step: 138250 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 17:07:02,428-Speed 9122.31 samples/sec Loss 6.0512 LearningRate 0.0343 Epoch: 8 Global Step: 138260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:03,513-Speed 9444.58 samples/sec Loss 6.2511 LearningRate 0.0343 Epoch: 8 Global Step: 138270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:04,555-Speed 9838.12 samples/sec Loss 6.2176 LearningRate 0.0343 Epoch: 8 Global Step: 138280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:05,588-Speed 9915.50 samples/sec Loss 6.1855 LearningRate 0.0343 Epoch: 8 Global Step: 138290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:06,681-Speed 9378.63 samples/sec Loss 6.1293 LearningRate 0.0343 Epoch: 8 Global Step: 138300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:07,760-Speed 9490.72 samples/sec Loss 6.2899 LearningRate 0.0343 Epoch: 8 Global Step: 138310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:08,835-Speed 9537.85 samples/sec Loss 6.2051 LearningRate 0.0343 Epoch: 8 Global Step: 138320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:09,918-Speed 9456.92 samples/sec Loss 6.2834 LearningRate 0.0343 Epoch: 8 Global Step: 138330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:10,955-Speed 9874.34 samples/sec Loss 6.1991 LearningRate 0.0343 Epoch: 8 Global Step: 138340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:11,995-Speed 9853.15 samples/sec Loss 6.3085 LearningRate 0.0343 Epoch: 8 Global Step: 138350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:13,101-Speed 9263.15 samples/sec Loss 6.2154 LearningRate 0.0343 Epoch: 8 Global Step: 138360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:14,172-Speed 9572.63 samples/sec Loss 6.1063 LearningRate 0.0343 Epoch: 8 Global Step: 138370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:07:15,214-Speed 9835.46 samples/sec Loss 6.2188 LearningRate 0.0343 Epoch: 8 Global Step: 138380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:16,325-Speed 9223.73 samples/sec Loss 6.1807 LearningRate 0.0343 Epoch: 8 Global Step: 138390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:17,378-Speed 9735.23 samples/sec Loss 6.2232 LearningRate 0.0343 Epoch: 8 Global Step: 138400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:18,469-Speed 9385.70 samples/sec Loss 6.2086 LearningRate 0.0343 Epoch: 8 Global Step: 138410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:19,546-Speed 9512.56 samples/sec Loss 6.0744 LearningRate 0.0343 Epoch: 8 Global Step: 138420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:20,620-Speed 9546.44 samples/sec Loss 6.1668 LearningRate 0.0343 Epoch: 8 Global Step: 138430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:21,684-Speed 9624.35 samples/sec Loss 6.0953 LearningRate 0.0343 Epoch: 8 Global Step: 138440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:22,778-Speed 9365.86 samples/sec Loss 6.1973 LearningRate 0.0343 Epoch: 8 Global Step: 138450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:23,846-Speed 9599.44 samples/sec Loss 6.1367 LearningRate 0.0342 Epoch: 8 Global Step: 138460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:24,919-Speed 9543.08 samples/sec Loss 6.1668 LearningRate 0.0342 Epoch: 8 Global Step: 138470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:25,968-Speed 9772.69 samples/sec Loss 6.0890 LearningRate 0.0342 Epoch: 8 Global Step: 138480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:27,033-Speed 9618.72 samples/sec Loss 6.0990 LearningRate 0.0342 Epoch: 8 Global Step: 138490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:28,084-Speed 9747.98 samples/sec Loss 
6.2237 LearningRate 0.0342 Epoch: 8 Global Step: 138500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:29,129-Speed 9798.46 samples/sec Loss 6.2839 LearningRate 0.0342 Epoch: 8 Global Step: 138510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:30,168-Speed 9868.10 samples/sec Loss 6.1219 LearningRate 0.0342 Epoch: 8 Global Step: 138520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:31,212-Speed 9808.59 samples/sec Loss 6.2424 LearningRate 0.0342 Epoch: 8 Global Step: 138530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:07:32,246-Speed 9909.96 samples/sec Loss 6.2547 LearningRate 0.0342 Epoch: 8 Global Step: 138540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:33,354-Speed 9250.68 samples/sec Loss 6.1690 LearningRate 0.0342 Epoch: 8 Global Step: 138550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:34,421-Speed 9606.46 samples/sec Loss 6.1582 LearningRate 0.0342 Epoch: 8 Global Step: 138560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:35,518-Speed 9339.40 samples/sec Loss 6.1535 LearningRate 0.0342 Epoch: 8 Global Step: 138570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:36,600-Speed 9473.73 samples/sec Loss 6.2033 LearningRate 0.0342 Epoch: 8 Global Step: 138580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:37,678-Speed 9499.06 samples/sec Loss 6.1442 LearningRate 0.0342 Epoch: 8 Global Step: 138590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:38,761-Speed 9460.24 samples/sec Loss 6.2392 LearningRate 0.0342 Epoch: 8 Global Step: 138600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:39,845-Speed 9456.39 samples/sec Loss 6.1282 LearningRate 0.0342 Epoch: 8 Global Step: 138610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:40,921-Speed 9517.22 samples/sec Loss 6.0966 LearningRate 0.0342 Epoch: 8 Global 
Step: 138620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:41,976-Speed 9710.00 samples/sec Loss 6.1249 LearningRate 0.0342 Epoch: 8 Global Step: 138630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:43,029-Speed 9734.50 samples/sec Loss 6.1913 LearningRate 0.0342 Epoch: 8 Global Step: 138640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:44,127-Speed 9332.10 samples/sec Loss 6.1347 LearningRate 0.0342 Epoch: 8 Global Step: 138650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:45,181-Speed 9718.51 samples/sec Loss 6.1767 LearningRate 0.0342 Epoch: 8 Global Step: 138660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:46,294-Speed 9211.17 samples/sec Loss 6.1651 LearningRate 0.0342 Epoch: 8 Global Step: 138670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:47,392-Speed 9329.47 samples/sec Loss 6.1930 LearningRate 0.0342 Epoch: 8 Global Step: 138680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:48,473-Speed 9474.06 samples/sec Loss 6.2555 LearningRate 0.0342 Epoch: 8 Global Step: 138690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:07:49,535-Speed 9646.07 samples/sec Loss 6.2747 LearningRate 0.0342 Epoch: 8 Global Step: 138700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:50,662-Speed 9090.33 samples/sec Loss 6.2422 LearningRate 0.0342 Epoch: 8 Global Step: 138710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:51,812-Speed 8921.17 samples/sec Loss 6.2510 LearningRate 0.0342 Epoch: 8 Global Step: 138720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:52,892-Speed 9485.48 samples/sec Loss 6.2415 LearningRate 0.0342 Epoch: 8 Global Step: 138730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:53,965-Speed 9546.92 samples/sec Loss 6.1404 LearningRate 0.0342 Epoch: 8 Global Step: 138740 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 17:07:55,048-Speed 9463.54 samples/sec Loss 6.1829 LearningRate 0.0341 Epoch: 8 Global Step: 138750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:56,142-Speed 9369.75 samples/sec Loss 6.1579 LearningRate 0.0341 Epoch: 8 Global Step: 138760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:57,219-Speed 9512.60 samples/sec Loss 6.1928 LearningRate 0.0341 Epoch: 8 Global Step: 138770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:58,249-Speed 9939.21 samples/sec Loss 6.1676 LearningRate 0.0341 Epoch: 8 Global Step: 138780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:07:59,323-Speed 9543.49 samples/sec Loss 6.2378 LearningRate 0.0341 Epoch: 8 Global Step: 138790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:00,413-Speed 9402.26 samples/sec Loss 6.2540 LearningRate 0.0341 Epoch: 8 Global Step: 138800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:01,475-Speed 9648.20 samples/sec Loss 6.0985 LearningRate 0.0341 Epoch: 8 Global Step: 138810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:02,569-Speed 9362.41 samples/sec Loss 6.2355 LearningRate 0.0341 Epoch: 8 Global Step: 138820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:03,634-Speed 9618.40 samples/sec Loss 6.1589 LearningRate 0.0341 Epoch: 8 Global Step: 138830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:04,705-Speed 9574.27 samples/sec Loss 6.2079 LearningRate 0.0341 Epoch: 8 Global Step: 138840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:05,786-Speed 9474.76 samples/sec Loss 6.2619 LearningRate 0.0341 Epoch: 8 Global Step: 138850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:06,828-Speed 9831.69 samples/sec Loss 6.1966 LearningRate 0.0341 Epoch: 8 Global Step: 138860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:08:07,884-Speed 9701.93 samples/sec Loss 6.2543 LearningRate 0.0341 Epoch: 8 Global Step: 138870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:08,955-Speed 9565.82 samples/sec Loss 6.1783 LearningRate 0.0341 Epoch: 8 Global Step: 138880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:10,014-Speed 9674.24 samples/sec Loss 6.2047 LearningRate 0.0341 Epoch: 8 Global Step: 138890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:11,093-Speed 9500.78 samples/sec Loss 6.0948 LearningRate 0.0341 Epoch: 8 Global Step: 138900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:08:12,172-Speed 9497.32 samples/sec Loss 6.1670 LearningRate 0.0341 Epoch: 8 Global Step: 138910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:13,239-Speed 9599.66 samples/sec Loss 6.0335 LearningRate 0.0341 Epoch: 8 Global Step: 138920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:14,324-Speed 9448.58 samples/sec Loss 6.0985 LearningRate 0.0341 Epoch: 8 Global Step: 138930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:15,448-Speed 9115.63 samples/sec Loss 6.2339 LearningRate 0.0341 Epoch: 8 Global Step: 138940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:16,530-Speed 9463.87 samples/sec Loss 6.0687 LearningRate 0.0341 Epoch: 8 Global Step: 138950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:17,594-Speed 9636.36 samples/sec Loss 6.1497 LearningRate 0.0341 Epoch: 8 Global Step: 138960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:18,673-Speed 9494.13 samples/sec Loss 6.1637 LearningRate 0.0341 Epoch: 8 Global Step: 138970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:19,741-Speed 9595.39 samples/sec Loss 6.2015 LearningRate 0.0341 Epoch: 8 Global Step: 138980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:20,794-Speed 9735.71 samples/sec Loss 
6.1234 LearningRate 0.0341 Epoch: 8 Global Step: 138990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:21,884-Speed 9395.04 samples/sec Loss 6.1963 LearningRate 0.0341 Epoch: 8 Global Step: 139000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:22,982-Speed 9327.63 samples/sec Loss 6.1591 LearningRate 0.0341 Epoch: 8 Global Step: 139010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:24,056-Speed 9540.21 samples/sec Loss 6.1944 LearningRate 0.0341 Epoch: 8 Global Step: 139020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:25,171-Speed 9190.55 samples/sec Loss 6.1203 LearningRate 0.0340 Epoch: 8 Global Step: 139030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:26,230-Speed 9674.03 samples/sec Loss 6.1824 LearningRate 0.0340 Epoch: 8 Global Step: 139040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:27,324-Speed 9372.17 samples/sec Loss 6.2679 LearningRate 0.0340 Epoch: 8 Global Step: 139050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:28,378-Speed 9716.48 samples/sec Loss 6.1434 LearningRate 0.0340 Epoch: 8 Global Step: 139060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:29,416-Speed 9868.64 samples/sec Loss 6.0650 LearningRate 0.0340 Epoch: 8 Global Step: 139070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:30,507-Speed 9398.22 samples/sec Loss 6.1593 LearningRate 0.0340 Epoch: 8 Global Step: 139080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:31,618-Speed 9227.08 samples/sec Loss 6.1727 LearningRate 0.0340 Epoch: 8 Global Step: 139090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:32,738-Speed 9141.48 samples/sec Loss 6.1711 LearningRate 0.0340 Epoch: 8 Global Step: 139100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:33,838-Speed 9323.84 samples/sec Loss 6.1402 LearningRate 0.0340 Epoch: 8 Global 
Step: 139110 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:08:34,973-Speed 9023.50 samples/sec Loss 6.2175 LearningRate 0.0340 Epoch: 8 Global Step: 139120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:36,014-Speed 9838.76 samples/sec Loss 6.1082 LearningRate 0.0340 Epoch: 8 Global Step: 139130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:37,066-Speed 9746.97 samples/sec Loss 6.2341 LearningRate 0.0340 Epoch: 8 Global Step: 139140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:38,093-Speed 9974.38 samples/sec Loss 6.1988 LearningRate 0.0340 Epoch: 8 Global Step: 139150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:39,137-Speed 9811.30 samples/sec Loss 6.0432 LearningRate 0.0340 Epoch: 8 Global Step: 139160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:40,222-Speed 9442.14 samples/sec Loss 6.1684 LearningRate 0.0340 Epoch: 8 Global Step: 139170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:41,335-Speed 9206.73 samples/sec Loss 6.1350 LearningRate 0.0340 Epoch: 8 Global Step: 139180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:42,392-Speed 9691.12 samples/sec Loss 6.2078 LearningRate 0.0340 Epoch: 8 Global Step: 139190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:43,477-Speed 9443.98 samples/sec Loss 6.1724 LearningRate 0.0340 Epoch: 8 Global Step: 139200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:44,560-Speed 9468.08 samples/sec Loss 6.1135 LearningRate 0.0340 Epoch: 8 Global Step: 139210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:45,622-Speed 9643.55 samples/sec Loss 6.1988 LearningRate 0.0340 Epoch: 8 Global Step: 139220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:08:46,704-Speed 9468.37 samples/sec Loss 6.1732 LearningRate 0.0340 Epoch: 8 Global Step: 139230 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 17:08:47,811-Speed 9256.21 samples/sec Loss 6.2660 LearningRate 0.0340 Epoch: 8 Global Step: 139240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:48,906-Speed 9361.10 samples/sec Loss 6.1587 LearningRate 0.0340 Epoch: 8 Global Step: 139250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:49,939-Speed 9914.57 samples/sec Loss 6.1726 LearningRate 0.0340 Epoch: 8 Global Step: 139260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:50,984-Speed 9813.80 samples/sec Loss 6.3172 LearningRate 0.0340 Epoch: 8 Global Step: 139270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:52,063-Speed 9499.46 samples/sec Loss 6.2095 LearningRate 0.0340 Epoch: 8 Global Step: 139280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:53,155-Speed 9375.80 samples/sec Loss 6.2048 LearningRate 0.0340 Epoch: 8 Global Step: 139290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:54,212-Speed 9693.54 samples/sec Loss 6.2638 LearningRate 0.0340 Epoch: 8 Global Step: 139300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:55,261-Speed 9769.28 samples/sec Loss 6.1829 LearningRate 0.0340 Epoch: 8 Global Step: 139310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:56,341-Speed 9489.79 samples/sec Loss 6.1227 LearningRate 0.0339 Epoch: 8 Global Step: 139320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:57,429-Speed 9414.13 samples/sec Loss 6.1510 LearningRate 0.0339 Epoch: 8 Global Step: 139330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:08:58,474-Speed 9801.62 samples/sec Loss 6.1410 LearningRate 0.0339 Epoch: 8 Global Step: 139340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:08:59,568-Speed 9365.48 samples/sec Loss 6.2211 LearningRate 0.0339 Epoch: 8 Global Step: 139350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:09:00,636-Speed 9593.25 samples/sec Loss 6.1620 LearningRate 0.0339 Epoch: 8 Global Step: 139360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:01,773-Speed 9013.82 samples/sec Loss 6.1493 LearningRate 0.0339 Epoch: 8 Global Step: 139370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:02,859-Speed 9430.68 samples/sec Loss 6.1397 LearningRate 0.0339 Epoch: 8 Global Step: 139380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:03,934-Speed 9541.55 samples/sec Loss 6.0860 LearningRate 0.0339 Epoch: 8 Global Step: 139390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:04,989-Speed 9708.00 samples/sec Loss 6.2170 LearningRate 0.0339 Epoch: 8 Global Step: 139400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:06,056-Speed 9600.55 samples/sec Loss 6.1786 LearningRate 0.0339 Epoch: 8 Global Step: 139410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:07,144-Speed 9417.66 samples/sec Loss 6.0847 LearningRate 0.0339 Epoch: 8 Global Step: 139420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:08,217-Speed 9553.30 samples/sec Loss 6.1826 LearningRate 0.0339 Epoch: 8 Global Step: 139430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:09,305-Speed 9417.40 samples/sec Loss 6.1513 LearningRate 0.0339 Epoch: 8 Global Step: 139440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:09:10,344-Speed 9866.88 samples/sec Loss 6.1951 LearningRate 0.0339 Epoch: 8 Global Step: 139450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:09:11,401-Speed 9692.12 samples/sec Loss 6.2595 LearningRate 0.0339 Epoch: 8 Global Step: 139460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:12,448-Speed 9778.58 samples/sec Loss 6.3053 LearningRate 0.0339 Epoch: 8 Global Step: 139470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:13,514-Speed 9615.94 samples/sec Loss 
6.2992 LearningRate 0.0339 Epoch: 8 Global Step: 139480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:14,563-Speed 9773.29 samples/sec Loss 6.2133 LearningRate 0.0339 Epoch: 8 Global Step: 139490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:15,632-Speed 9585.84 samples/sec Loss 6.2015 LearningRate 0.0339 Epoch: 8 Global Step: 139500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:16,727-Speed 9354.53 samples/sec Loss 6.1913 LearningRate 0.0339 Epoch: 8 Global Step: 139510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:17,813-Speed 9433.38 samples/sec Loss 6.2282 LearningRate 0.0339 Epoch: 8 Global Step: 139520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:18,851-Speed 9867.29 samples/sec Loss 6.1450 LearningRate 0.0339 Epoch: 8 Global Step: 139530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:19,907-Speed 9704.52 samples/sec Loss 6.2096 LearningRate 0.0339 Epoch: 8 Global Step: 139540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:20,959-Speed 9738.68 samples/sec Loss 6.3402 LearningRate 0.0339 Epoch: 8 Global Step: 139550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:22,011-Speed 9739.56 samples/sec Loss 6.1670 LearningRate 0.0339 Epoch: 8 Global Step: 139560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:09:23,075-Speed 9636.37 samples/sec Loss 6.3201 LearningRate 0.0339 Epoch: 8 Global Step: 139570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:09:24,122-Speed 9777.85 samples/sec Loss 6.3077 LearningRate 0.0339 Epoch: 8 Global Step: 139580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:09:25,180-Speed 9687.20 samples/sec Loss 6.2008 LearningRate 0.0339 Epoch: 8 Global Step: 139590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:26,238-Speed 9684.12 samples/sec Loss 6.1829 LearningRate 0.0339 Epoch: 8 Global 
Step: 139600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:27,275-Speed 9880.95 samples/sec Loss 6.0729 LearningRate 0.0338 Epoch: 8 Global Step: 139610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:28,357-Speed 9473.20 samples/sec Loss 6.1640 LearningRate 0.0338 Epoch: 8 Global Step: 139620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:29,454-Speed 9342.48 samples/sec Loss 6.2228 LearningRate 0.0338 Epoch: 8 Global Step: 139630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:30,522-Speed 9587.14 samples/sec Loss 6.1758 LearningRate 0.0338 Epoch: 8 Global Step: 139640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:31,579-Speed 9696.22 samples/sec Loss 6.2286 LearningRate 0.0338 Epoch: 8 Global Step: 139650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:32,664-Speed 9446.48 samples/sec Loss 6.1019 LearningRate 0.0338 Epoch: 8 Global Step: 139660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:33,748-Speed 9455.68 samples/sec Loss 6.1938 LearningRate 0.0338 Epoch: 8 Global Step: 139670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:34,857-Speed 9238.01 samples/sec Loss 6.1694 LearningRate 0.0338 Epoch: 8 Global Step: 139680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:35,949-Speed 9383.78 samples/sec Loss 6.2248 LearningRate 0.0338 Epoch: 8 Global Step: 139690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:37,025-Speed 9523.68 samples/sec Loss 6.2243 LearningRate 0.0338 Epoch: 8 Global Step: 139700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:38,128-Speed 9285.55 samples/sec Loss 6.1946 LearningRate 0.0338 Epoch: 8 Global Step: 139710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:39,174-Speed 9796.97 samples/sec Loss 6.2297 LearningRate 0.0338 Epoch: 8 Global Step: 139720 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-11 17:09:40,312-Speed 9001.48 samples/sec Loss 6.1893 LearningRate 0.0338 Epoch: 8 Global Step: 139730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:41,387-Speed 9535.33 samples/sec Loss 6.1897 LearningRate 0.0338 Epoch: 8 Global Step: 139740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:42,440-Speed 9728.80 samples/sec Loss 6.2345 LearningRate 0.0338 Epoch: 8 Global Step: 139750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:43,486-Speed 9792.62 samples/sec Loss 6.1892 LearningRate 0.0338 Epoch: 8 Global Step: 139760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:09:44,624-Speed 9009.31 samples/sec Loss 6.2017 LearningRate 0.0338 Epoch: 8 Global Step: 139770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:45,701-Speed 9510.90 samples/sec Loss 6.2608 LearningRate 0.0338 Epoch: 8 Global Step: 139780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:46,780-Speed 9500.57 samples/sec Loss 6.2129 LearningRate 0.0338 Epoch: 8 Global Step: 139790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:47,851-Speed 9561.73 samples/sec Loss 6.2710 LearningRate 0.0338 Epoch: 8 Global Step: 139800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:48,956-Speed 9279.19 samples/sec Loss 6.2329 LearningRate 0.0338 Epoch: 8 Global Step: 139810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:50,014-Speed 9685.67 samples/sec Loss 6.1790 LearningRate 0.0338 Epoch: 8 Global Step: 139820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:51,082-Speed 9594.47 samples/sec Loss 6.1835 LearningRate 0.0338 Epoch: 8 Global Step: 139830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:52,135-Speed 9727.00 samples/sec Loss 6.1882 LearningRate 0.0338 Epoch: 8 Global Step: 139840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:53,191-Speed 
9701.62 samples/sec Loss 6.1760 LearningRate 0.0338 Epoch: 8 Global Step: 139850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:54,249-Speed 9684.00 samples/sec Loss 6.2529 LearningRate 0.0338 Epoch: 8 Global Step: 139860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:55,312-Speed 9643.04 samples/sec Loss 6.1690 LearningRate 0.0338 Epoch: 8 Global Step: 139870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:09:56,358-Speed 9796.66 samples/sec Loss 6.2088 LearningRate 0.0338 Epoch: 8 Global Step: 139880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:57,433-Speed 9529.86 samples/sec Loss 6.2452 LearningRate 0.0337 Epoch: 8 Global Step: 139890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:58,536-Speed 9284.08 samples/sec Loss 6.1787 LearningRate 0.0337 Epoch: 8 Global Step: 139900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:09:59,594-Speed 9683.26 samples/sec Loss 6.2066 LearningRate 0.0337 Epoch: 8 Global Step: 139910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:00,681-Speed 9431.00 samples/sec Loss 6.1712 LearningRate 0.0337 Epoch: 8 Global Step: 139920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:01,789-Speed 9244.28 samples/sec Loss 6.1290 LearningRate 0.0337 Epoch: 8 Global Step: 139930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:02,850-Speed 9660.01 samples/sec Loss 6.2574 LearningRate 0.0337 Epoch: 8 Global Step: 139940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:03,902-Speed 9742.31 samples/sec Loss 6.1915 LearningRate 0.0337 Epoch: 8 Global Step: 139950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:04,951-Speed 9772.13 samples/sec Loss 6.1653 LearningRate 0.0337 Epoch: 8 Global Step: 139960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:06,055-Speed 9277.49 samples/sec Loss 6.3222 
LearningRate 0.0337 Epoch: 8 Global Step: 139970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:07,159-Speed 9286.07 samples/sec Loss 6.2665 LearningRate 0.0337 Epoch: 8 Global Step: 139980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:10:08,267-Speed 9248.49 samples/sec Loss 6.1683 LearningRate 0.0337 Epoch: 8 Global Step: 139990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:10:09,380-Speed 9200.82 samples/sec Loss 6.2481 LearningRate 0.0337 Epoch: 8 Global Step: 140000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:10:31,424-[lfw][140000]XNorm: 10.335960 Training: 2022-04-11 17:10:31,425-[lfw][140000]Accuracy-Flip: 0.99600+-0.00281 Training: 2022-04-11 17:10:31,426-[lfw][140000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:10:56,887-[cfp_fp][140000]XNorm: 8.793890 Training: 2022-04-11 17:10:56,888-[cfp_fp][140000]Accuracy-Flip: 0.96229+-0.01035 Training: 2022-04-11 17:10:56,888-[cfp_fp][140000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:11:18,756-[agedb_30][140000]XNorm: 10.007913 Training: 2022-04-11 17:11:18,756-[agedb_30][140000]Accuracy-Flip: 0.96300+-0.00859 Training: 2022-04-11 17:11:18,756-[agedb_30][140000]Accuracy-Highest: 0.96650 Training: 2022-04-11 17:11:19,830-Speed 145.35 samples/sec Loss 6.2682 LearningRate 0.0337 Epoch: 8 Global Step: 140010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:20,915-Speed 9450.39 samples/sec Loss 6.2826 LearningRate 0.0337 Epoch: 8 Global Step: 140020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:21,963-Speed 9774.82 samples/sec Loss 6.1669 LearningRate 0.0337 Epoch: 8 Global Step: 140030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:23,042-Speed 9491.19 samples/sec Loss 6.2485 LearningRate 0.0337 Epoch: 8 Global Step: 140040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:24,102-Speed 9665.78 samples/sec Loss 6.2438 LearningRate 0.0337 
Epoch: 8 Global Step: 140050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:25,166-Speed 9634.45 samples/sec Loss 6.2239 LearningRate 0.0337 Epoch: 8 Global Step: 140060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:26,242-Speed 9519.34 samples/sec Loss 6.1736 LearningRate 0.0337 Epoch: 8 Global Step: 140070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:27,367-Speed 9109.85 samples/sec Loss 6.2732 LearningRate 0.0337 Epoch: 8 Global Step: 140080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:28,436-Speed 9586.26 samples/sec Loss 6.2312 LearningRate 0.0337 Epoch: 8 Global Step: 140090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:29,493-Speed 9687.06 samples/sec Loss 6.2207 LearningRate 0.0337 Epoch: 8 Global Step: 140100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:30,519-Speed 9992.67 samples/sec Loss 6.2847 LearningRate 0.0337 Epoch: 8 Global Step: 140110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:31,622-Speed 9293.00 samples/sec Loss 6.1792 LearningRate 0.0337 Epoch: 8 Global Step: 140120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:32,690-Speed 9591.56 samples/sec Loss 6.1653 LearningRate 0.0337 Epoch: 8 Global Step: 140130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:33,740-Speed 9763.75 samples/sec Loss 6.2642 LearningRate 0.0337 Epoch: 8 Global Step: 140140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:34,848-Speed 9247.99 samples/sec Loss 6.2459 LearningRate 0.0337 Epoch: 8 Global Step: 140150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:35,944-Speed 9342.94 samples/sec Loss 6.2307 LearningRate 0.0337 Epoch: 8 Global Step: 140160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:37,024-Speed 9487.54 samples/sec Loss 6.3432 LearningRate 0.0337 Epoch: 8 Global Step: 140170 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:38,095-Speed 9566.96 samples/sec Loss 6.2460 LearningRate 0.0336 Epoch: 8 Global Step: 140180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:39,186-Speed 9393.40 samples/sec Loss 6.1958 LearningRate 0.0336 Epoch: 8 Global Step: 140190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:40,242-Speed 9705.24 samples/sec Loss 6.2711 LearningRate 0.0336 Epoch: 8 Global Step: 140200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:11:41,296-Speed 9722.94 samples/sec Loss 6.2054 LearningRate 0.0336 Epoch: 8 Global Step: 140210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:11:42,360-Speed 9626.88 samples/sec Loss 6.2848 LearningRate 0.0336 Epoch: 8 Global Step: 140220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:11:43,440-Speed 9489.59 samples/sec Loss 6.3395 LearningRate 0.0336 Epoch: 8 Global Step: 140230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:44,553-Speed 9205.31 samples/sec Loss 6.2770 LearningRate 0.0336 Epoch: 8 Global Step: 140240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:45,655-Speed 9289.64 samples/sec Loss 6.3425 LearningRate 0.0336 Epoch: 8 Global Step: 140250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:46,751-Speed 9348.10 samples/sec Loss 6.2680 LearningRate 0.0336 Epoch: 8 Global Step: 140260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:47,834-Speed 9463.88 samples/sec Loss 6.1844 LearningRate 0.0336 Epoch: 8 Global Step: 140270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:48,927-Speed 9373.22 samples/sec Loss 6.2395 LearningRate 0.0336 Epoch: 8 Global Step: 140280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:50,011-Speed 9449.27 samples/sec Loss 6.2736 LearningRate 0.0336 Epoch: 8 Global Step: 140290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 17:11:51,080-Speed 9589.25 samples/sec Loss 6.2732 LearningRate 0.0336 Epoch: 8 Global Step: 140300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:52,183-Speed 9293.12 samples/sec Loss 6.2472 LearningRate 0.0336 Epoch: 8 Global Step: 140310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:53,241-Speed 9685.17 samples/sec Loss 6.3303 LearningRate 0.0336 Epoch: 8 Global Step: 140320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:54,317-Speed 9519.55 samples/sec Loss 6.3293 LearningRate 0.0336 Epoch: 8 Global Step: 140330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:11:55,422-Speed 9278.16 samples/sec Loss 6.1394 LearningRate 0.0336 Epoch: 8 Global Step: 140340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:11:56,449-Speed 9974.50 samples/sec Loss 6.2563 LearningRate 0.0336 Epoch: 8 Global Step: 140350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:57,519-Speed 9570.40 samples/sec Loss 6.2889 LearningRate 0.0336 Epoch: 8 Global Step: 140360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:58,583-Speed 9634.06 samples/sec Loss 6.1808 LearningRate 0.0336 Epoch: 8 Global Step: 140370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:11:59,631-Speed 9773.33 samples/sec Loss 6.3816 LearningRate 0.0336 Epoch: 8 Global Step: 140380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:00,712-Speed 9484.03 samples/sec Loss 6.2400 LearningRate 0.0336 Epoch: 8 Global Step: 140390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:01,804-Speed 9384.41 samples/sec Loss 6.1150 LearningRate 0.0336 Epoch: 8 Global Step: 140400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:02,912-Speed 9247.24 samples/sec Loss 6.2695 LearningRate 0.0336 Epoch: 8 Global Step: 140410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:04,006-Speed 9367.19 
samples/sec Loss 6.2910 LearningRate 0.0336 Epoch: 8 Global Step: 140420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:05,098-Speed 9384.81 samples/sec Loss 6.2508 LearningRate 0.0336 Epoch: 8 Global Step: 140430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:06,197-Speed 9317.88 samples/sec Loss 6.3551 LearningRate 0.0336 Epoch: 8 Global Step: 140440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:07,301-Speed 9281.59 samples/sec Loss 6.3454 LearningRate 0.0336 Epoch: 8 Global Step: 140450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:08,365-Speed 9635.85 samples/sec Loss 6.2113 LearningRate 0.0336 Epoch: 8 Global Step: 140460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:09,469-Speed 9282.09 samples/sec Loss 6.2544 LearningRate 0.0335 Epoch: 8 Global Step: 140470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:10,545-Speed 9523.42 samples/sec Loss 6.3370 LearningRate 0.0335 Epoch: 8 Global Step: 140480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:11,635-Speed 9400.01 samples/sec Loss 6.1713 LearningRate 0.0335 Epoch: 8 Global Step: 140490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:12,737-Speed 9296.15 samples/sec Loss 6.2896 LearningRate 0.0335 Epoch: 8 Global Step: 140500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:13,832-Speed 9353.29 samples/sec Loss 6.2797 LearningRate 0.0335 Epoch: 8 Global Step: 140510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:14,902-Speed 9584.65 samples/sec Loss 6.1125 LearningRate 0.0335 Epoch: 8 Global Step: 140520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:15,984-Speed 9469.81 samples/sec Loss 6.4132 LearningRate 0.0335 Epoch: 8 Global Step: 140530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:17,066-Speed 9465.03 samples/sec Loss 6.3524 LearningRate 0.0335 
Epoch: 8 Global Step: 140540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:18,127-Speed 9661.26 samples/sec Loss 6.2707 LearningRate 0.0335 Epoch: 8 Global Step: 140550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:19,230-Speed 9286.18 samples/sec Loss 6.1678 LearningRate 0.0335 Epoch: 8 Global Step: 140560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:20,289-Speed 9676.08 samples/sec Loss 6.2362 LearningRate 0.0335 Epoch: 8 Global Step: 140570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:21,389-Speed 9318.75 samples/sec Loss 6.2683 LearningRate 0.0335 Epoch: 8 Global Step: 140580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:22,513-Speed 9111.72 samples/sec Loss 6.3042 LearningRate 0.0335 Epoch: 8 Global Step: 140590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:23,572-Speed 9676.52 samples/sec Loss 6.2088 LearningRate 0.0335 Epoch: 8 Global Step: 140600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:24,686-Speed 9197.40 samples/sec Loss 6.2547 LearningRate 0.0335 Epoch: 8 Global Step: 140610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:25,734-Speed 9782.34 samples/sec Loss 6.3267 LearningRate 0.0335 Epoch: 8 Global Step: 140620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:26,835-Speed 9306.01 samples/sec Loss 6.2326 LearningRate 0.0335 Epoch: 8 Global Step: 140630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:27,942-Speed 9256.07 samples/sec Loss 6.3520 LearningRate 0.0335 Epoch: 8 Global Step: 140640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:29,055-Speed 9198.37 samples/sec Loss 6.2126 LearningRate 0.0335 Epoch: 8 Global Step: 140650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:30,152-Speed 9347.78 samples/sec Loss 6.1800 LearningRate 0.0335 Epoch: 8 Global Step: 140660 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 17:12:31,220-Speed 9600.54 samples/sec Loss 6.1831 LearningRate 0.0335 Epoch: 8 Global Step: 140670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:32,290-Speed 9567.76 samples/sec Loss 6.3022 LearningRate 0.0335 Epoch: 8 Global Step: 140680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:33,360-Speed 9578.92 samples/sec Loss 6.1232 LearningRate 0.0335 Epoch: 8 Global Step: 140690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:34,424-Speed 9635.16 samples/sec Loss 6.1819 LearningRate 0.0335 Epoch: 8 Global Step: 140700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:35,545-Speed 9138.98 samples/sec Loss 6.2315 LearningRate 0.0335 Epoch: 8 Global Step: 140710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:36,635-Speed 9398.96 samples/sec Loss 6.1675 LearningRate 0.0335 Epoch: 8 Global Step: 140720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:37,736-Speed 9308.75 samples/sec Loss 6.2201 LearningRate 0.0335 Epoch: 8 Global Step: 140730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:38,850-Speed 9195.41 samples/sec Loss 6.3293 LearningRate 0.0335 Epoch: 8 Global Step: 140740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:39,950-Speed 9313.42 samples/sec Loss 6.2777 LearningRate 0.0335 Epoch: 8 Global Step: 140750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:41,028-Speed 9503.47 samples/sec Loss 6.3572 LearningRate 0.0334 Epoch: 8 Global Step: 140760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:42,093-Speed 9623.64 samples/sec Loss 6.1437 LearningRate 0.0334 Epoch: 8 Global Step: 140770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:43,182-Speed 9406.80 samples/sec Loss 6.2443 LearningRate 0.0334 Epoch: 8 Global Step: 140780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
17:12:44,307-Speed 9111.90 samples/sec Loss 6.3096 LearningRate 0.0334 Epoch: 8 Global Step: 140790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:12:45,434-Speed 9089.47 samples/sec Loss 6.2478 LearningRate 0.0334 Epoch: 8 Global Step: 140800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:46,558-Speed 9109.45 samples/sec Loss 6.1385 LearningRate 0.0334 Epoch: 8 Global Step: 140810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:47,666-Speed 9250.20 samples/sec Loss 6.2155 LearningRate 0.0334 Epoch: 8 Global Step: 140820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:48,733-Speed 9603.39 samples/sec Loss 6.2028 LearningRate 0.0334 Epoch: 8 Global Step: 140830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:49,830-Speed 9347.64 samples/sec Loss 6.3354 LearningRate 0.0334 Epoch: 8 Global Step: 140840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:50,875-Speed 9808.23 samples/sec Loss 6.2486 LearningRate 0.0334 Epoch: 8 Global Step: 140850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:51,961-Speed 9432.54 samples/sec Loss 6.1988 LearningRate 0.0334 Epoch: 8 Global Step: 140860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:53,043-Speed 9471.79 samples/sec Loss 6.2743 LearningRate 0.0334 Epoch: 8 Global Step: 140870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:54,124-Speed 9477.06 samples/sec Loss 6.3081 LearningRate 0.0334 Epoch: 8 Global Step: 140880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:55,220-Speed 9340.64 samples/sec Loss 6.2391 LearningRate 0.0334 Epoch: 8 Global Step: 140890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:56,277-Speed 9700.18 samples/sec Loss 6.1768 LearningRate 0.0334 Epoch: 8 Global Step: 140900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:57,337-Speed 9658.70 samples/sec Loss 
6.1306 LearningRate 0.0334 Epoch: 8 Global Step: 140910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:12:58,385-Speed 9776.22 samples/sec Loss 6.1185 LearningRate 0.0334 Epoch: 8 Global Step: 140920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:12:59,490-Speed 9281.11 samples/sec Loss 6.1924 LearningRate 0.0334 Epoch: 8 Global Step: 140930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:00,570-Speed 9482.42 samples/sec Loss 6.2852 LearningRate 0.0334 Epoch: 8 Global Step: 140940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:01,652-Speed 9472.19 samples/sec Loss 6.1857 LearningRate 0.0334 Epoch: 8 Global Step: 140950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:02,779-Speed 9097.20 samples/sec Loss 6.1701 LearningRate 0.0334 Epoch: 8 Global Step: 140960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:03,884-Speed 9268.25 samples/sec Loss 6.1814 LearningRate 0.0334 Epoch: 8 Global Step: 140970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:04,975-Speed 9389.97 samples/sec Loss 6.2486 LearningRate 0.0334 Epoch: 8 Global Step: 140980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:06,017-Speed 9838.24 samples/sec Loss 6.2500 LearningRate 0.0334 Epoch: 8 Global Step: 140990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:07,112-Speed 9355.02 samples/sec Loss 6.2707 LearningRate 0.0334 Epoch: 8 Global Step: 141000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:08,207-Speed 9356.78 samples/sec Loss 6.1966 LearningRate 0.0334 Epoch: 8 Global Step: 141010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:09,278-Speed 9569.98 samples/sec Loss 6.2595 LearningRate 0.0334 Epoch: 8 Global Step: 141020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:10,373-Speed 9359.55 samples/sec Loss 6.2366 LearningRate 0.0334 Epoch: 8 Global Step: 
141030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:11,464-Speed 9386.59 samples/sec Loss 6.3040 LearningRate 0.0334 Epoch: 8 Global Step: 141040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:12,532-Speed 9600.69 samples/sec Loss 6.3104 LearningRate 0.0333 Epoch: 8 Global Step: 141050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:13,623-Speed 9389.89 samples/sec Loss 6.2124 LearningRate 0.0333 Epoch: 8 Global Step: 141060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:14,698-Speed 9524.82 samples/sec Loss 6.1347 LearningRate 0.0333 Epoch: 8 Global Step: 141070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:15,791-Speed 9375.49 samples/sec Loss 6.1638 LearningRate 0.0333 Epoch: 8 Global Step: 141080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:16,875-Speed 9450.62 samples/sec Loss 6.1742 LearningRate 0.0333 Epoch: 8 Global Step: 141090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:17,962-Speed 9433.98 samples/sec Loss 6.1620 LearningRate 0.0333 Epoch: 8 Global Step: 141100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:19,077-Speed 9184.92 samples/sec Loss 6.1892 LearningRate 0.0333 Epoch: 8 Global Step: 141110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:20,157-Speed 9486.79 samples/sec Loss 6.2280 LearningRate 0.0333 Epoch: 8 Global Step: 141120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:21,237-Speed 9489.21 samples/sec Loss 6.2494 LearningRate 0.0333 Epoch: 8 Global Step: 141130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:22,288-Speed 9752.29 samples/sec Loss 6.2648 LearningRate 0.0333 Epoch: 8 Global Step: 141140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:23,373-Speed 9444.29 samples/sec Loss 6.2869 LearningRate 0.0333 Epoch: 8 Global Step: 141150 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 17:13:24,452-Speed 9492.81 samples/sec Loss 6.3179 LearningRate 0.0333 Epoch: 8 Global Step: 141160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:25,540-Speed 9416.78 samples/sec Loss 6.3205 LearningRate 0.0333 Epoch: 8 Global Step: 141170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:26,561-Speed 10035.14 samples/sec Loss 6.1963 LearningRate 0.0333 Epoch: 8 Global Step: 141180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:13:27,643-Speed 9476.36 samples/sec Loss 6.2865 LearningRate 0.0333 Epoch: 8 Global Step: 141190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:28,736-Speed 9370.64 samples/sec Loss 6.2185 LearningRate 0.0333 Epoch: 8 Global Step: 141200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:29,820-Speed 9448.25 samples/sec Loss 6.2300 LearningRate 0.0333 Epoch: 8 Global Step: 141210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:30,901-Speed 9476.20 samples/sec Loss 6.1885 LearningRate 0.0333 Epoch: 8 Global Step: 141220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:32,013-Speed 9218.80 samples/sec Loss 6.1430 LearningRate 0.0333 Epoch: 8 Global Step: 141230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:33,084-Speed 9573.01 samples/sec Loss 6.1936 LearningRate 0.0333 Epoch: 8 Global Step: 141240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:34,228-Speed 8952.98 samples/sec Loss 6.2997 LearningRate 0.0333 Epoch: 8 Global Step: 141250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:35,358-Speed 9068.75 samples/sec Loss 6.2577 LearningRate 0.0333 Epoch: 8 Global Step: 141260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:36,450-Speed 9385.30 samples/sec Loss 6.2571 LearningRate 0.0333 Epoch: 8 Global Step: 141270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:37,522-Speed 9550.62 
samples/sec Loss 6.2020 LearningRate 0.0333 Epoch: 8 Global Step: 141280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:38,621-Speed 9340.93 samples/sec Loss 6.2281 LearningRate 0.0333 Epoch: 8 Global Step: 141290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:39,681-Speed 9667.89 samples/sec Loss 6.2448 LearningRate 0.0333 Epoch: 8 Global Step: 141300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:40,715-Speed 9909.68 samples/sec Loss 6.2503 LearningRate 0.0333 Epoch: 8 Global Step: 141310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:41,792-Speed 9514.97 samples/sec Loss 6.2097 LearningRate 0.0333 Epoch: 8 Global Step: 141320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:42,933-Speed 8973.03 samples/sec Loss 6.1372 LearningRate 0.0332 Epoch: 8 Global Step: 141330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:43,982-Speed 9772.29 samples/sec Loss 6.3722 LearningRate 0.0332 Epoch: 8 Global Step: 141340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:45,076-Speed 9366.31 samples/sec Loss 6.2051 LearningRate 0.0332 Epoch: 8 Global Step: 141350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:46,144-Speed 9589.98 samples/sec Loss 6.2156 LearningRate 0.0332 Epoch: 8 Global Step: 141360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:47,218-Speed 9540.74 samples/sec Loss 6.2208 LearningRate 0.0332 Epoch: 8 Global Step: 141370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:48,319-Speed 9302.85 samples/sec Loss 6.3031 LearningRate 0.0332 Epoch: 8 Global Step: 141380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:49,344-Speed 9999.93 samples/sec Loss 6.1786 LearningRate 0.0332 Epoch: 8 Global Step: 141390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:50,407-Speed 9636.79 samples/sec Loss 6.1795 LearningRate 0.0332 Epoch: 8 
Global Step: 141400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:13:51,475-Speed 9597.72 samples/sec Loss 6.2127 LearningRate 0.0332 Epoch: 8 Global Step: 141410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:52,566-Speed 9395.20 samples/sec Loss 6.1241 LearningRate 0.0332 Epoch: 8 Global Step: 141420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:53,692-Speed 9098.96 samples/sec Loss 6.2406 LearningRate 0.0332 Epoch: 8 Global Step: 141430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:54,800-Speed 9243.56 samples/sec Loss 6.2053 LearningRate 0.0332 Epoch: 8 Global Step: 141440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:55,854-Speed 9728.39 samples/sec Loss 6.2237 LearningRate 0.0332 Epoch: 8 Global Step: 141450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:56,943-Speed 9410.18 samples/sec Loss 6.0713 LearningRate 0.0332 Epoch: 8 Global Step: 141460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:58,029-Speed 9428.21 samples/sec Loss 6.2314 LearningRate 0.0332 Epoch: 8 Global Step: 141470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:13:59,105-Speed 9526.52 samples/sec Loss 6.1995 LearningRate 0.0332 Epoch: 8 Global Step: 141480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:00,181-Speed 9521.11 samples/sec Loss 6.2338 LearningRate 0.0332 Epoch: 8 Global Step: 141490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:01,248-Speed 9605.41 samples/sec Loss 6.1946 LearningRate 0.0332 Epoch: 8 Global Step: 141500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:02,271-Speed 10017.41 samples/sec Loss 6.2076 LearningRate 0.0332 Epoch: 8 Global Step: 141510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:03,353-Speed 9468.87 samples/sec Loss 6.2005 LearningRate 0.0332 Epoch: 8 Global Step: 141520 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 17:14:04,466-Speed 9204.97 samples/sec Loss 6.3037 LearningRate 0.0332 Epoch: 8 Global Step: 141530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:05,614-Speed 8923.51 samples/sec Loss 6.2294 LearningRate 0.0332 Epoch: 8 Global Step: 141540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:06,724-Speed 9232.44 samples/sec Loss 6.3281 LearningRate 0.0332 Epoch: 8 Global Step: 141550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:07,802-Speed 9512.81 samples/sec Loss 6.2769 LearningRate 0.0332 Epoch: 8 Global Step: 141560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:08,888-Speed 9427.58 samples/sec Loss 6.3589 LearningRate 0.0332 Epoch: 8 Global Step: 141570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:09,975-Speed 9425.25 samples/sec Loss 6.2042 LearningRate 0.0332 Epoch: 8 Global Step: 141580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:11,056-Speed 9481.37 samples/sec Loss 6.4421 LearningRate 0.0332 Epoch: 8 Global Step: 141590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:12,102-Speed 9794.39 samples/sec Loss 6.2664 LearningRate 0.0332 Epoch: 8 Global Step: 141600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:14:13,194-Speed 9383.75 samples/sec Loss 6.1666 LearningRate 0.0332 Epoch: 8 Global Step: 141610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:14,296-Speed 9294.04 samples/sec Loss 6.3232 LearningRate 0.0331 Epoch: 8 Global Step: 141620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:15,376-Speed 9493.26 samples/sec Loss 6.0951 LearningRate 0.0331 Epoch: 8 Global Step: 141630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:16,508-Speed 9049.28 samples/sec Loss 6.3430 LearningRate 0.0331 Epoch: 8 Global Step: 141640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:14:17,637-Speed 9074.13 samples/sec Loss 6.1047 LearningRate 0.0331 Epoch: 8 Global Step: 141650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:18,685-Speed 9776.47 samples/sec Loss 6.1565 LearningRate 0.0331 Epoch: 8 Global Step: 141660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:19,784-Speed 9327.25 samples/sec Loss 6.2603 LearningRate 0.0331 Epoch: 8 Global Step: 141670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:20,865-Speed 9479.05 samples/sec Loss 6.1842 LearningRate 0.0331 Epoch: 8 Global Step: 141680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:21,906-Speed 9839.53 samples/sec Loss 6.2701 LearningRate 0.0331 Epoch: 8 Global Step: 141690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:22,959-Speed 9730.89 samples/sec Loss 6.1429 LearningRate 0.0331 Epoch: 8 Global Step: 141700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:23,998-Speed 9858.77 samples/sec Loss 6.2359 LearningRate 0.0331 Epoch: 8 Global Step: 141710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:25,070-Speed 9561.83 samples/sec Loss 6.1513 LearningRate 0.0331 Epoch: 8 Global Step: 141720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:26,198-Speed 9079.12 samples/sec Loss 6.0568 LearningRate 0.0331 Epoch: 8 Global Step: 141730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:27,280-Speed 9473.02 samples/sec Loss 6.2731 LearningRate 0.0331 Epoch: 8 Global Step: 141740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:28,373-Speed 9371.44 samples/sec Loss 6.3766 LearningRate 0.0331 Epoch: 8 Global Step: 141750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:29,453-Speed 9488.90 samples/sec Loss 6.1716 LearningRate 0.0331 Epoch: 8 Global Step: 141760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:30,534-Speed 9473.90 samples/sec Loss 
6.1965 LearningRate 0.0331 Epoch: 8 Global Step: 141770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:31,632-Speed 9341.98 samples/sec Loss 6.2803 LearningRate 0.0331 Epoch: 8 Global Step: 141780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:32,687-Speed 9709.63 samples/sec Loss 6.2294 LearningRate 0.0331 Epoch: 8 Global Step: 141790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:33,781-Speed 9361.90 samples/sec Loss 6.2681 LearningRate 0.0331 Epoch: 8 Global Step: 141800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:34,848-Speed 9604.61 samples/sec Loss 6.3122 LearningRate 0.0331 Epoch: 8 Global Step: 141810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:14:35,904-Speed 9704.78 samples/sec Loss 6.2339 LearningRate 0.0331 Epoch: 8 Global Step: 141820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:36,991-Speed 9421.52 samples/sec Loss 6.2431 LearningRate 0.0331 Epoch: 8 Global Step: 141830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:38,128-Speed 9014.65 samples/sec Loss 6.2840 LearningRate 0.0331 Epoch: 8 Global Step: 141840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:39,204-Speed 9526.96 samples/sec Loss 6.1848 LearningRate 0.0331 Epoch: 8 Global Step: 141850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:40,289-Speed 9443.76 samples/sec Loss 6.2853 LearningRate 0.0331 Epoch: 8 Global Step: 141860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:41,380-Speed 9388.07 samples/sec Loss 6.1804 LearningRate 0.0331 Epoch: 8 Global Step: 141870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:42,504-Speed 9114.46 samples/sec Loss 6.1434 LearningRate 0.0331 Epoch: 8 Global Step: 141880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:43,616-Speed 9217.22 samples/sec Loss 6.3039 LearningRate 0.0331 Epoch: 8 Global 
Step: 141890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:44,680-Speed 9625.68 samples/sec Loss 6.2770 LearningRate 0.0331 Epoch: 8 Global Step: 141900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:45,737-Speed 9694.66 samples/sec Loss 6.3574 LearningRate 0.0330 Epoch: 8 Global Step: 141910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:46,787-Speed 9755.17 samples/sec Loss 6.1992 LearningRate 0.0330 Epoch: 8 Global Step: 141920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:14:47,847-Speed 9669.92 samples/sec Loss 6.2963 LearningRate 0.0330 Epoch: 8 Global Step: 141930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:14:48,925-Speed 9499.32 samples/sec Loss 6.1981 LearningRate 0.0330 Epoch: 8 Global Step: 141940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:14:49,979-Speed 9722.55 samples/sec Loss 6.2547 LearningRate 0.0330 Epoch: 8 Global Step: 141950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:51,092-Speed 9221.47 samples/sec Loss 6.2564 LearningRate 0.0330 Epoch: 8 Global Step: 141960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:52,144-Speed 9731.96 samples/sec Loss 6.0840 LearningRate 0.0330 Epoch: 8 Global Step: 141970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:53,251-Speed 9259.90 samples/sec Loss 6.2489 LearningRate 0.0330 Epoch: 8 Global Step: 141980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:54,349-Speed 9337.15 samples/sec Loss 6.2699 LearningRate 0.0330 Epoch: 8 Global Step: 141990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:14:55,429-Speed 9487.03 samples/sec Loss 6.3012 LearningRate 0.0330 Epoch: 8 Global Step: 142000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:15:17,792-[lfw][142000]XNorm: 10.203931 Training: 2022-04-11 17:15:17,793-[lfw][142000]Accuracy-Flip: 0.99550+-0.00248 
Training: 2022-04-11 17:15:17,793-[lfw][142000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:15:43,444-[cfp_fp][142000]XNorm: 8.713773 Training: 2022-04-11 17:15:43,445-[cfp_fp][142000]Accuracy-Flip: 0.95700+-0.00839 Training: 2022-04-11 17:15:43,445-[cfp_fp][142000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:16:05,505-[agedb_30][142000]XNorm: 9.926271 Training: 2022-04-11 17:16:05,562-[agedb_30][142000]Accuracy-Flip: 0.96017+-0.01076 Training: 2022-04-11 17:16:05,562-[agedb_30][142000]Accuracy-Highest: 0.96650 Training: 2022-04-11 17:16:06,685-Speed 143.71 samples/sec Loss 6.2521 LearningRate 0.0330 Epoch: 8 Global Step: 142010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:07,814-Speed 9078.24 samples/sec Loss 6.2125 LearningRate 0.0330 Epoch: 8 Global Step: 142020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:08,927-Speed 9202.04 samples/sec Loss 6.2445 LearningRate 0.0330 Epoch: 8 Global Step: 142030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:09,993-Speed 9612.28 samples/sec Loss 6.2976 LearningRate 0.0330 Epoch: 8 Global Step: 142040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:11,076-Speed 9456.44 samples/sec Loss 6.2459 LearningRate 0.0330 Epoch: 8 Global Step: 142050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:12,125-Speed 9767.28 samples/sec Loss 6.2432 LearningRate 0.0330 Epoch: 8 Global Step: 142060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:13,227-Speed 9302.60 samples/sec Loss 6.3893 LearningRate 0.0330 Epoch: 8 Global Step: 142070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:14,319-Speed 9385.17 samples/sec Loss 6.1955 LearningRate 0.0330 Epoch: 8 Global Step: 142080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:15,433-Speed 9197.20 samples/sec Loss 6.2162 LearningRate 0.0330 Epoch: 8 Global Step: 142090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 17:16:16,526-Speed 9390.84 samples/sec Loss 6.3551 LearningRate 0.0330 Epoch: 8 Global Step: 142100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:17,592-Speed 9606.17 samples/sec Loss 6.3009 LearningRate 0.0330 Epoch: 8 Global Step: 142110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:18,695-Speed 9293.52 samples/sec Loss 6.3242 LearningRate 0.0330 Epoch: 8 Global Step: 142120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:16:19,766-Speed 9564.91 samples/sec Loss 6.2606 LearningRate 0.0330 Epoch: 8 Global Step: 142130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:20,849-Speed 9460.39 samples/sec Loss 6.2031 LearningRate 0.0330 Epoch: 8 Global Step: 142140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:21,914-Speed 9626.84 samples/sec Loss 6.1945 LearningRate 0.0330 Epoch: 8 Global Step: 142150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:22,972-Speed 9686.51 samples/sec Loss 6.2228 LearningRate 0.0330 Epoch: 8 Global Step: 142160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:24,046-Speed 9537.13 samples/sec Loss 6.2691 LearningRate 0.0330 Epoch: 8 Global Step: 142170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:25,142-Speed 9348.78 samples/sec Loss 6.2074 LearningRate 0.0330 Epoch: 8 Global Step: 142180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:26,232-Speed 9398.46 samples/sec Loss 6.2541 LearningRate 0.0330 Epoch: 8 Global Step: 142190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:27,318-Speed 9437.48 samples/sec Loss 6.2324 LearningRate 0.0330 Epoch: 8 Global Step: 142200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:28,435-Speed 9167.65 samples/sec Loss 6.2226 LearningRate 0.0329 Epoch: 8 Global Step: 142210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:29,529-Speed 9369.68 
samples/sec Loss 6.2300 LearningRate 0.0329 Epoch: 8 Global Step: 142220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:30,626-Speed 9345.16 samples/sec Loss 6.1721 LearningRate 0.0329 Epoch: 8 Global Step: 142230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:16:31,716-Speed 9394.92 samples/sec Loss 6.2712 LearningRate 0.0329 Epoch: 8 Global Step: 142240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:16:32,807-Speed 9389.26 samples/sec Loss 6.2214 LearningRate 0.0329 Epoch: 8 Global Step: 142250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:16:33,854-Speed 9792.35 samples/sec Loss 6.2435 LearningRate 0.0329 Epoch: 8 Global Step: 142260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:34,925-Speed 9563.09 samples/sec Loss 6.1375 LearningRate 0.0329 Epoch: 8 Global Step: 142270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:36,016-Speed 9392.05 samples/sec Loss 6.2855 LearningRate 0.0329 Epoch: 8 Global Step: 142280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:37,114-Speed 9332.13 samples/sec Loss 6.2449 LearningRate 0.0329 Epoch: 8 Global Step: 142290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:38,231-Speed 9173.43 samples/sec Loss 6.2579 LearningRate 0.0329 Epoch: 8 Global Step: 142300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:39,365-Speed 9038.40 samples/sec Loss 6.1604 LearningRate 0.0329 Epoch: 8 Global Step: 142310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:40,466-Speed 9302.41 samples/sec Loss 6.2176 LearningRate 0.0329 Epoch: 8 Global Step: 142320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:41,539-Speed 9553.91 samples/sec Loss 6.2096 LearningRate 0.0329 Epoch: 8 Global Step: 142330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:42,652-Speed 9211.83 samples/sec Loss 6.2204 LearningRate 0.0329 
Epoch: 8 Global Step: 142340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:43,718-Speed 9608.32 samples/sec Loss 6.2757 LearningRate 0.0329 Epoch: 8 Global Step: 142350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:44,760-Speed 9834.48 samples/sec Loss 6.2559 LearningRate 0.0329 Epoch: 8 Global Step: 142360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:16:45,845-Speed 9440.59 samples/sec Loss 6.0951 LearningRate 0.0329 Epoch: 8 Global Step: 142370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:16:46,945-Speed 9315.35 samples/sec Loss 6.1813 LearningRate 0.0329 Epoch: 8 Global Step: 142380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:16:48,016-Speed 9570.65 samples/sec Loss 6.1757 LearningRate 0.0329 Epoch: 8 Global Step: 142390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:49,117-Speed 9307.72 samples/sec Loss 6.2160 LearningRate 0.0329 Epoch: 8 Global Step: 142400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:50,186-Speed 9581.52 samples/sec Loss 6.1525 LearningRate 0.0329 Epoch: 8 Global Step: 142410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:51,258-Speed 9560.20 samples/sec Loss 6.3239 LearningRate 0.0329 Epoch: 8 Global Step: 142420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:52,312-Speed 9717.72 samples/sec Loss 6.1660 LearningRate 0.0329 Epoch: 8 Global Step: 142430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:53,406-Speed 9371.76 samples/sec Loss 6.2856 LearningRate 0.0329 Epoch: 8 Global Step: 142440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:54,500-Speed 9358.83 samples/sec Loss 6.2294 LearningRate 0.0329 Epoch: 8 Global Step: 142450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:55,600-Speed 9321.64 samples/sec Loss 6.2947 LearningRate 0.0329 Epoch: 8 Global Step: 142460 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:56,685-Speed 9442.30 samples/sec Loss 6.2206 LearningRate 0.0329 Epoch: 8 Global Step: 142470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:57,741-Speed 9699.58 samples/sec Loss 6.2531 LearningRate 0.0329 Epoch: 8 Global Step: 142480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:58,810-Speed 9593.48 samples/sec Loss 6.2851 LearningRate 0.0329 Epoch: 8 Global Step: 142490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:16:59,888-Speed 9507.83 samples/sec Loss 6.2810 LearningRate 0.0328 Epoch: 8 Global Step: 142500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:00,970-Speed 9470.77 samples/sec Loss 6.2293 LearningRate 0.0328 Epoch: 8 Global Step: 142510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:02,049-Speed 9501.04 samples/sec Loss 6.1958 LearningRate 0.0328 Epoch: 8 Global Step: 142520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:03,092-Speed 9823.26 samples/sec Loss 6.1991 LearningRate 0.0328 Epoch: 8 Global Step: 142530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:04,144-Speed 9733.46 samples/sec Loss 6.2493 LearningRate 0.0328 Epoch: 8 Global Step: 142540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:05,231-Speed 9432.73 samples/sec Loss 6.2761 LearningRate 0.0328 Epoch: 8 Global Step: 142550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:06,271-Speed 9846.68 samples/sec Loss 6.1682 LearningRate 0.0328 Epoch: 8 Global Step: 142560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:07,360-Speed 9412.12 samples/sec Loss 6.2326 LearningRate 0.0328 Epoch: 8 Global Step: 142570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:08,435-Speed 9530.69 samples/sec Loss 6.2244 LearningRate 0.0328 Epoch: 8 Global Step: 142580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 17:17:09,550-Speed 9190.31 samples/sec Loss 6.2211 LearningRate 0.0328 Epoch: 8 Global Step: 142590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:17:10,636-Speed 9431.64 samples/sec Loss 6.2414 LearningRate 0.0328 Epoch: 8 Global Step: 142600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:17:11,704-Speed 9592.10 samples/sec Loss 6.2200 LearningRate 0.0328 Epoch: 8 Global Step: 142610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:17:12,782-Speed 9505.69 samples/sec Loss 6.1282 LearningRate 0.0328 Epoch: 8 Global Step: 142620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:13,843-Speed 9656.99 samples/sec Loss 6.2288 LearningRate 0.0328 Epoch: 8 Global Step: 142630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:14,900-Speed 9690.67 samples/sec Loss 6.3227 LearningRate 0.0328 Epoch: 8 Global Step: 142640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:15,968-Speed 9596.80 samples/sec Loss 6.2098 LearningRate 0.0328 Epoch: 8 Global Step: 142650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:17,065-Speed 9338.03 samples/sec Loss 6.2765 LearningRate 0.0328 Epoch: 8 Global Step: 142660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:18,181-Speed 9183.57 samples/sec Loss 6.2747 LearningRate 0.0328 Epoch: 8 Global Step: 142670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:19,319-Speed 9375.05 samples/sec Loss 6.2616 LearningRate 0.0328 Epoch: 8 Global Step: 142680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:20,357-Speed 9874.11 samples/sec Loss 6.2327 LearningRate 0.0328 Epoch: 8 Global Step: 142690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:21,524-Speed 8783.58 samples/sec Loss 6.2562 LearningRate 0.0328 Epoch: 8 Global Step: 142700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:22,716-Speed 9851.92 
samples/sec Loss 6.1769 LearningRate 0.0328 Epoch: 8 Global Step: 142710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:23,797-Speed 9478.26 samples/sec Loss 6.3092 LearningRate 0.0328 Epoch: 8 Global Step: 142720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:17:24,874-Speed 9509.97 samples/sec Loss 6.1495 LearningRate 0.0328 Epoch: 8 Global Step: 142730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:17:25,943-Speed 9590.62 samples/sec Loss 6.1711 LearningRate 0.0328 Epoch: 8 Global Step: 142740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:27,031-Speed 9416.34 samples/sec Loss 6.2648 LearningRate 0.0328 Epoch: 8 Global Step: 142750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:28,107-Speed 9516.40 samples/sec Loss 6.1904 LearningRate 0.0328 Epoch: 8 Global Step: 142760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:29,172-Speed 9645.67 samples/sec Loss 6.1821 LearningRate 0.0328 Epoch: 8 Global Step: 142770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:30,249-Speed 9512.32 samples/sec Loss 6.2938 LearningRate 0.0328 Epoch: 8 Global Step: 142780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:31,392-Speed 8959.24 samples/sec Loss 6.1568 LearningRate 0.0327 Epoch: 8 Global Step: 142790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:32,514-Speed 9134.04 samples/sec Loss 6.1640 LearningRate 0.0327 Epoch: 8 Global Step: 142800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:33,580-Speed 9606.81 samples/sec Loss 6.1909 LearningRate 0.0327 Epoch: 8 Global Step: 142810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:34,667-Speed 9431.16 samples/sec Loss 6.3279 LearningRate 0.0327 Epoch: 8 Global Step: 142820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:35,707-Speed 9845.98 samples/sec Loss 6.2023 LearningRate 0.0327 
Epoch: 8 Global Step: 142830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:36,784-Speed 9514.21 samples/sec Loss 6.2818 LearningRate 0.0327 Epoch: 8 Global Step: 142840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:37,881-Speed 9335.63 samples/sec Loss 6.3060 LearningRate 0.0327 Epoch: 8 Global Step: 142850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:39,007-Speed 9106.10 samples/sec Loss 6.2321 LearningRate 0.0327 Epoch: 8 Global Step: 142860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:40,102-Speed 9365.74 samples/sec Loss 6.2477 LearningRate 0.0327 Epoch: 8 Global Step: 142870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:41,167-Speed 9613.33 samples/sec Loss 6.2040 LearningRate 0.0327 Epoch: 8 Global Step: 142880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:42,275-Speed 9256.75 samples/sec Loss 6.2951 LearningRate 0.0327 Epoch: 8 Global Step: 142890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:43,402-Speed 9090.63 samples/sec Loss 6.2451 LearningRate 0.0327 Epoch: 8 Global Step: 142900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:44,502-Speed 9312.36 samples/sec Loss 6.2632 LearningRate 0.0327 Epoch: 8 Global Step: 142910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:45,595-Speed 9374.85 samples/sec Loss 6.2109 LearningRate 0.0327 Epoch: 8 Global Step: 142920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:46,694-Speed 9323.89 samples/sec Loss 6.2735 LearningRate 0.0327 Epoch: 8 Global Step: 142930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:47,792-Speed 9333.96 samples/sec Loss 6.2998 LearningRate 0.0327 Epoch: 8 Global Step: 142940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:17:48,871-Speed 9488.82 samples/sec Loss 6.2789 LearningRate 0.0327 Epoch: 8 Global Step: 142950 Fp16 Grad 
Scale: 262144 Required: 8 hours Training: 2022-04-11 17:17:49,977-Speed 9265.04 samples/sec Loss 6.2818 LearningRate 0.0327 Epoch: 8 Global Step: 142960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:51,065-Speed 9423.16 samples/sec Loss 6.1552 LearningRate 0.0327 Epoch: 8 Global Step: 142970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:52,113-Speed 9782.40 samples/sec Loss 6.2180 LearningRate 0.0327 Epoch: 8 Global Step: 142980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:53,188-Speed 9522.23 samples/sec Loss 6.1537 LearningRate 0.0327 Epoch: 8 Global Step: 142990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:54,288-Speed 9316.05 samples/sec Loss 6.1645 LearningRate 0.0327 Epoch: 8 Global Step: 143000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:55,384-Speed 9345.49 samples/sec Loss 6.1936 LearningRate 0.0327 Epoch: 8 Global Step: 143010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:56,458-Speed 9549.98 samples/sec Loss 6.2570 LearningRate 0.0327 Epoch: 8 Global Step: 143020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:57,520-Speed 9646.01 samples/sec Loss 6.2889 LearningRate 0.0327 Epoch: 8 Global Step: 143030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:58,581-Speed 9657.76 samples/sec Loss 6.1239 LearningRate 0.0327 Epoch: 8 Global Step: 143040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:17:59,646-Speed 9621.56 samples/sec Loss 6.2491 LearningRate 0.0327 Epoch: 8 Global Step: 143050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:00,703-Speed 9695.21 samples/sec Loss 6.2031 LearningRate 0.0327 Epoch: 8 Global Step: 143060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:01,758-Speed 9711.58 samples/sec Loss 6.1369 LearningRate 0.0327 Epoch: 8 Global Step: 143070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 17:18:02,802-Speed 9814.42 samples/sec Loss 6.2129 LearningRate 0.0326 Epoch: 8 Global Step: 143080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:03,844-Speed 9834.62 samples/sec Loss 6.3342 LearningRate 0.0326 Epoch: 8 Global Step: 143090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:04,924-Speed 9493.38 samples/sec Loss 6.3851 LearningRate 0.0326 Epoch: 8 Global Step: 143100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:05,981-Speed 9684.78 samples/sec Loss 6.1414 LearningRate 0.0326 Epoch: 8 Global Step: 143110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:07,087-Speed 9270.37 samples/sec Loss 6.2778 LearningRate 0.0326 Epoch: 8 Global Step: 143120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:08,216-Speed 9070.24 samples/sec Loss 6.3126 LearningRate 0.0326 Epoch: 8 Global Step: 143130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:09,311-Speed 9360.10 samples/sec Loss 6.2359 LearningRate 0.0326 Epoch: 8 Global Step: 143140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:10,391-Speed 9484.41 samples/sec Loss 6.1346 LearningRate 0.0326 Epoch: 8 Global Step: 143150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:11,441-Speed 9762.90 samples/sec Loss 6.1756 LearningRate 0.0326 Epoch: 8 Global Step: 143160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:18:12,497-Speed 9700.75 samples/sec Loss 6.2401 LearningRate 0.0326 Epoch: 8 Global Step: 143170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:13,573-Speed 9518.84 samples/sec Loss 6.2044 LearningRate 0.0326 Epoch: 8 Global Step: 143180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:14,644-Speed 9569.90 samples/sec Loss 6.2151 LearningRate 0.0326 Epoch: 8 Global Step: 143190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:15,749-Speed 9274.10 
samples/sec Loss 6.3294 LearningRate 0.0326 Epoch: 8 Global Step: 143200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:16,864-Speed 9195.10 samples/sec Loss 6.3524 LearningRate 0.0326 Epoch: 8 Global Step: 143210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:17,921-Speed 9698.30 samples/sec Loss 6.3304 LearningRate 0.0326 Epoch: 8 Global Step: 143220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:18,961-Speed 9855.96 samples/sec Loss 6.2984 LearningRate 0.0326 Epoch: 8 Global Step: 143230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:20,069-Speed 9246.47 samples/sec Loss 6.2623 LearningRate 0.0326 Epoch: 8 Global Step: 143240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:21,130-Speed 9656.34 samples/sec Loss 6.3131 LearningRate 0.0326 Epoch: 8 Global Step: 143250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:22,228-Speed 9329.14 samples/sec Loss 6.2306 LearningRate 0.0326 Epoch: 8 Global Step: 143260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:23,339-Speed 9223.89 samples/sec Loss 6.1824 LearningRate 0.0326 Epoch: 8 Global Step: 143270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:24,470-Speed 9053.17 samples/sec Loss 6.2631 LearningRate 0.0326 Epoch: 8 Global Step: 143280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:25,541-Speed 9568.45 samples/sec Loss 6.1838 LearningRate 0.0326 Epoch: 8 Global Step: 143290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:26,620-Speed 9498.19 samples/sec Loss 6.3309 LearningRate 0.0326 Epoch: 8 Global Step: 143300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:27,728-Speed 9250.13 samples/sec Loss 6.2807 LearningRate 0.0326 Epoch: 8 Global Step: 143310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:28,817-Speed 9401.94 samples/sec Loss 6.1661 LearningRate 0.0326 Epoch: 
8 Global Step: 143320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:29,845-Speed 9974.41 samples/sec Loss 6.2537 LearningRate 0.0326 Epoch: 8 Global Step: 143330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:30,901-Speed 9703.16 samples/sec Loss 6.3380 LearningRate 0.0326 Epoch: 8 Global Step: 143340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:31,957-Speed 9694.04 samples/sec Loss 6.1979 LearningRate 0.0326 Epoch: 8 Global Step: 143350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:33,039-Speed 9469.99 samples/sec Loss 6.2214 LearningRate 0.0326 Epoch: 8 Global Step: 143360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:34,101-Speed 9657.61 samples/sec Loss 6.2294 LearningRate 0.0325 Epoch: 8 Global Step: 143370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:35,159-Speed 9687.46 samples/sec Loss 6.2500 LearningRate 0.0325 Epoch: 8 Global Step: 143380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:36,228-Speed 9582.02 samples/sec Loss 6.1965 LearningRate 0.0325 Epoch: 8 Global Step: 143390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:37,301-Speed 9554.92 samples/sec Loss 6.2363 LearningRate 0.0325 Epoch: 8 Global Step: 143400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:38,334-Speed 9919.97 samples/sec Loss 6.1952 LearningRate 0.0325 Epoch: 8 Global Step: 143410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:39,361-Speed 9971.64 samples/sec Loss 6.1859 LearningRate 0.0325 Epoch: 8 Global Step: 143420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:40,453-Speed 9381.29 samples/sec Loss 6.1917 LearningRate 0.0325 Epoch: 8 Global Step: 143430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:18:41,546-Speed 9371.53 samples/sec Loss 6.3974 LearningRate 0.0325 Epoch: 8 Global Step: 143440 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 17:18:42,585-Speed 9868.27 samples/sec Loss 6.2646 LearningRate 0.0325 Epoch: 8 Global Step: 143450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:43,660-Speed 9525.81 samples/sec Loss 6.3012 LearningRate 0.0325 Epoch: 8 Global Step: 143460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:44,703-Speed 9826.51 samples/sec Loss 6.1367 LearningRate 0.0325 Epoch: 8 Global Step: 143470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:45,812-Speed 9241.26 samples/sec Loss 6.2288 LearningRate 0.0325 Epoch: 8 Global Step: 143480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:46,912-Speed 9317.61 samples/sec Loss 6.2516 LearningRate 0.0325 Epoch: 8 Global Step: 143490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:47,951-Speed 9855.10 samples/sec Loss 6.1820 LearningRate 0.0325 Epoch: 8 Global Step: 143500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:49,021-Speed 9578.63 samples/sec Loss 6.2788 LearningRate 0.0325 Epoch: 8 Global Step: 143510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:50,068-Speed 9787.50 samples/sec Loss 6.2282 LearningRate 0.0325 Epoch: 8 Global Step: 143520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:51,119-Speed 9749.94 samples/sec Loss 6.2265 LearningRate 0.0325 Epoch: 8 Global Step: 143530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:52,187-Speed 9590.23 samples/sec Loss 6.3487 LearningRate 0.0325 Epoch: 8 Global Step: 143540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:18:53,261-Speed 9540.13 samples/sec Loss 6.2757 LearningRate 0.0325 Epoch: 8 Global Step: 143550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:54,368-Speed 9261.69 samples/sec Loss 6.1997 LearningRate 0.0325 Epoch: 8 Global Step: 143560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:18:55,453-Speed 9440.49 samples/sec Loss 6.2870 LearningRate 0.0325 Epoch: 8 Global Step: 143570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:56,557-Speed 9277.82 samples/sec Loss 6.2815 LearningRate 0.0325 Epoch: 8 Global Step: 143580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:57,646-Speed 9410.57 samples/sec Loss 6.2080 LearningRate 0.0325 Epoch: 8 Global Step: 143590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:58,714-Speed 9600.41 samples/sec Loss 6.2662 LearningRate 0.0325 Epoch: 8 Global Step: 143600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:18:59,798-Speed 9452.46 samples/sec Loss 6.2067 LearningRate 0.0325 Epoch: 8 Global Step: 143610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:00,888-Speed 9393.18 samples/sec Loss 6.3189 LearningRate 0.0325 Epoch: 8 Global Step: 143620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:01,968-Speed 9492.29 samples/sec Loss 6.2190 LearningRate 0.0325 Epoch: 8 Global Step: 143630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:03,079-Speed 9220.64 samples/sec Loss 6.2286 LearningRate 0.0325 Epoch: 8 Global Step: 143640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:04,138-Speed 9681.60 samples/sec Loss 6.2676 LearningRate 0.0325 Epoch: 8 Global Step: 143650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:05,158-Speed 10042.88 samples/sec Loss 6.2618 LearningRate 0.0324 Epoch: 8 Global Step: 143660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:06,226-Speed 9594.31 samples/sec Loss 6.2516 LearningRate 0.0324 Epoch: 8 Global Step: 143670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:07,286-Speed 9659.25 samples/sec Loss 6.2950 LearningRate 0.0324 Epoch: 8 Global Step: 143680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:08,381-Speed 9357.86 samples/sec 
Loss 6.2559 LearningRate 0.0324 Epoch: 8 Global Step: 143690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:09,423-Speed 9833.21 samples/sec Loss 6.3123 LearningRate 0.0324 Epoch: 8 Global Step: 143700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:10,472-Speed 9772.17 samples/sec Loss 6.2791 LearningRate 0.0324 Epoch: 8 Global Step: 143710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:11,554-Speed 9464.08 samples/sec Loss 6.2831 LearningRate 0.0324 Epoch: 8 Global Step: 143720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:12,601-Speed 9790.64 samples/sec Loss 6.2373 LearningRate 0.0324 Epoch: 8 Global Step: 143730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:13,662-Speed 9656.83 samples/sec Loss 6.2607 LearningRate 0.0324 Epoch: 8 Global Step: 143740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:14,709-Speed 9784.02 samples/sec Loss 6.2734 LearningRate 0.0324 Epoch: 8 Global Step: 143750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:15,793-Speed 9451.46 samples/sec Loss 6.3649 LearningRate 0.0324 Epoch: 8 Global Step: 143760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:16,894-Speed 9309.93 samples/sec Loss 6.2950 LearningRate 0.0324 Epoch: 8 Global Step: 143770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:17,948-Speed 9726.12 samples/sec Loss 6.1738 LearningRate 0.0324 Epoch: 8 Global Step: 143780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:19,009-Speed 9650.28 samples/sec Loss 6.2775 LearningRate 0.0324 Epoch: 8 Global Step: 143790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:20,105-Speed 9347.30 samples/sec Loss 6.1722 LearningRate 0.0324 Epoch: 8 Global Step: 143800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:21,142-Speed 9883.96 samples/sec Loss 6.2020 LearningRate 0.0324 Epoch: 8 
Global Step: 143810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:22,217-Speed 9529.24 samples/sec Loss 6.1329 LearningRate 0.0324 Epoch: 8 Global Step: 143820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:23,285-Speed 9596.31 samples/sec Loss 6.2327 LearningRate 0.0324 Epoch: 8 Global Step: 143830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:24,378-Speed 9378.29 samples/sec Loss 6.1937 LearningRate 0.0324 Epoch: 8 Global Step: 143840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:25,451-Speed 9546.41 samples/sec Loss 6.1189 LearningRate 0.0324 Epoch: 8 Global Step: 143850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:19:26,518-Speed 9599.86 samples/sec Loss 6.2433 LearningRate 0.0324 Epoch: 8 Global Step: 143860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:27,624-Speed 9267.27 samples/sec Loss 6.2061 LearningRate 0.0324 Epoch: 8 Global Step: 143870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:28,713-Speed 9411.32 samples/sec Loss 6.2265 LearningRate 0.0324 Epoch: 8 Global Step: 143880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:29,785-Speed 9553.27 samples/sec Loss 6.1611 LearningRate 0.0324 Epoch: 8 Global Step: 143890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:30,864-Speed 9501.95 samples/sec Loss 6.2178 LearningRate 0.0324 Epoch: 8 Global Step: 143900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:31,913-Speed 9764.52 samples/sec Loss 6.1456 LearningRate 0.0324 Epoch: 8 Global Step: 143910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:32,974-Speed 9656.05 samples/sec Loss 6.1864 LearningRate 0.0324 Epoch: 8 Global Step: 143920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:34,050-Speed 9528.91 samples/sec Loss 6.2505 LearningRate 0.0324 Epoch: 8 Global Step: 143930 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 17:19:35,098-Speed 9773.89 samples/sec Loss 6.1750 LearningRate 0.0324 Epoch: 8 Global Step: 143940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:36,141-Speed 9828.10 samples/sec Loss 6.2448 LearningRate 0.0324 Epoch: 8 Global Step: 143950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:37,216-Speed 9526.15 samples/sec Loss 6.2507 LearningRate 0.0323 Epoch: 8 Global Step: 143960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:38,289-Speed 9545.88 samples/sec Loss 6.1799 LearningRate 0.0323 Epoch: 8 Global Step: 143970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:39,395-Speed 9266.24 samples/sec Loss 6.2208 LearningRate 0.0323 Epoch: 8 Global Step: 143980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:40,477-Speed 9473.00 samples/sec Loss 6.2500 LearningRate 0.0323 Epoch: 8 Global Step: 143990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:19:41,552-Speed 9527.91 samples/sec Loss 6.2702 LearningRate 0.0323 Epoch: 8 Global Step: 144000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:20:03,667-[lfw][144000]XNorm: 10.260583 Training: 2022-04-11 17:20:03,668-[lfw][144000]Accuracy-Flip: 0.99600+-0.00281 Training: 2022-04-11 17:20:03,668-[lfw][144000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:20:29,212-[cfp_fp][144000]XNorm: 8.743006 Training: 2022-04-11 17:20:29,213-[cfp_fp][144000]Accuracy-Flip: 0.95929+-0.00957 Training: 2022-04-11 17:20:29,213-[cfp_fp][144000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:20:51,185-[agedb_30][144000]XNorm: 9.975129 Training: 2022-04-11 17:20:51,185-[agedb_30][144000]Accuracy-Flip: 0.96517+-0.01061 Training: 2022-04-11 17:20:51,186-[agedb_30][144000]Accuracy-Highest: 0.96650 Training: 2022-04-11 17:20:52,244-Speed 144.85 samples/sec Loss 6.2565 LearningRate 0.0323 Epoch: 8 Global Step: 144010 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-11 17:20:53,305-Speed 9664.77 samples/sec Loss 6.2612 LearningRate 0.0323 Epoch: 8 Global Step: 144020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:20:54,349-Speed 9814.89 samples/sec Loss 6.2012 LearningRate 0.0323 Epoch: 8 Global Step: 144030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:20:55,476-Speed 9092.34 samples/sec Loss 6.2585 LearningRate 0.0323 Epoch: 8 Global Step: 144040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:20:56,558-Speed 9462.44 samples/sec Loss 6.1358 LearningRate 0.0323 Epoch: 8 Global Step: 144050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:20:57,633-Speed 9534.99 samples/sec Loss 6.1911 LearningRate 0.0323 Epoch: 8 Global Step: 144060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:20:58,734-Speed 9300.71 samples/sec Loss 6.2470 LearningRate 0.0323 Epoch: 8 Global Step: 144070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:20:59,780-Speed 9795.96 samples/sec Loss 6.3082 LearningRate 0.0323 Epoch: 8 Global Step: 144080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:00,841-Speed 9662.61 samples/sec Loss 6.1587 LearningRate 0.0323 Epoch: 8 Global Step: 144090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:01,896-Speed 9709.61 samples/sec Loss 6.1122 LearningRate 0.0323 Epoch: 8 Global Step: 144100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:02,946-Speed 9759.20 samples/sec Loss 6.3351 LearningRate 0.0323 Epoch: 8 Global Step: 144110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:04,016-Speed 9578.00 samples/sec Loss 6.1769 LearningRate 0.0323 Epoch: 8 Global Step: 144120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:05,083-Speed 9599.62 samples/sec Loss 6.1823 LearningRate 0.0323 Epoch: 8 Global Step: 144130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:06,165-Speed 
9477.08 samples/sec Loss 6.3643 LearningRate 0.0323 Epoch: 8 Global Step: 144140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:07,243-Speed 9504.49 samples/sec Loss 6.3187 LearningRate 0.0323 Epoch: 8 Global Step: 144150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:08,317-Speed 9531.31 samples/sec Loss 6.2555 LearningRate 0.0323 Epoch: 8 Global Step: 144160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:09,381-Speed 9632.90 samples/sec Loss 6.2109 LearningRate 0.0323 Epoch: 8 Global Step: 144170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:10,514-Speed 9051.16 samples/sec Loss 6.2943 LearningRate 0.0323 Epoch: 8 Global Step: 144180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:21:11,608-Speed 9360.50 samples/sec Loss 6.3558 LearningRate 0.0323 Epoch: 8 Global Step: 144190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:21:12,680-Speed 9564.86 samples/sec Loss 6.2556 LearningRate 0.0323 Epoch: 8 Global Step: 144200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:21:13,790-Speed 9225.74 samples/sec Loss 6.2534 LearningRate 0.0323 Epoch: 8 Global Step: 144210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:21:14,859-Speed 9589.05 samples/sec Loss 6.1214 LearningRate 0.0323 Epoch: 8 Global Step: 144220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:21:15,978-Speed 9155.28 samples/sec Loss 6.1480 LearningRate 0.0323 Epoch: 8 Global Step: 144230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:21:17,056-Speed 9500.77 samples/sec Loss 6.2125 LearningRate 0.0323 Epoch: 8 Global Step: 144240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:18,162-Speed 9260.11 samples/sec Loss 6.2335 LearningRate 0.0322 Epoch: 8 Global Step: 144250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:19,254-Speed 9389.99 samples/sec Loss 6.2786 
LearningRate 0.0322 Epoch: 8 Global Step: 144260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:20,374-Speed 9149.84 samples/sec Loss 6.2976 LearningRate 0.0322 Epoch: 8 Global Step: 144270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:21,440-Speed 9607.42 samples/sec Loss 6.1702 LearningRate 0.0322 Epoch: 8 Global Step: 144280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:22,507-Speed 9607.50 samples/sec Loss 6.1547 LearningRate 0.0322 Epoch: 8 Global Step: 144290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:23,591-Speed 9451.33 samples/sec Loss 6.1791 LearningRate 0.0322 Epoch: 8 Global Step: 144300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:24,685-Speed 9369.66 samples/sec Loss 6.2539 LearningRate 0.0322 Epoch: 8 Global Step: 144310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:25,781-Speed 9345.61 samples/sec Loss 6.3106 LearningRate 0.0322 Epoch: 8 Global Step: 144320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:26,878-Speed 9338.23 samples/sec Loss 6.2544 LearningRate 0.0322 Epoch: 8 Global Step: 144330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:27,993-Speed 9188.67 samples/sec Loss 6.3441 LearningRate 0.0322 Epoch: 8 Global Step: 144340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:29,061-Speed 9588.86 samples/sec Loss 6.2328 LearningRate 0.0322 Epoch: 8 Global Step: 144350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:30,148-Speed 9424.42 samples/sec Loss 6.2642 LearningRate 0.0322 Epoch: 8 Global Step: 144360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:31,206-Speed 9686.99 samples/sec Loss 6.2077 LearningRate 0.0322 Epoch: 8 Global Step: 144370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:32,289-Speed 9462.61 samples/sec Loss 6.2781 LearningRate 0.0322 Epoch: 8 Global Step: 
144380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:33,408-Speed 9154.93 samples/sec Loss 6.2507 LearningRate 0.0322 Epoch: 8 Global Step: 144390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:34,463-Speed 9718.59 samples/sec Loss 6.2963 LearningRate 0.0322 Epoch: 8 Global Step: 144400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:35,553-Speed 9397.04 samples/sec Loss 6.2991 LearningRate 0.0322 Epoch: 8 Global Step: 144410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:36,620-Speed 9599.45 samples/sec Loss 6.2095 LearningRate 0.0322 Epoch: 8 Global Step: 144420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:37,697-Speed 9513.48 samples/sec Loss 6.1979 LearningRate 0.0322 Epoch: 8 Global Step: 144430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:38,797-Speed 9319.28 samples/sec Loss 6.2582 LearningRate 0.0322 Epoch: 8 Global Step: 144440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:21:39,858-Speed 9658.61 samples/sec Loss 6.1645 LearningRate 0.0322 Epoch: 8 Global Step: 144450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:40,948-Speed 9399.38 samples/sec Loss 6.2045 LearningRate 0.0322 Epoch: 8 Global Step: 144460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:42,038-Speed 9402.20 samples/sec Loss 6.2854 LearningRate 0.0322 Epoch: 8 Global Step: 144470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:43,110-Speed 9563.32 samples/sec Loss 6.2839 LearningRate 0.0322 Epoch: 8 Global Step: 144480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:44,222-Speed 9208.62 samples/sec Loss 6.2386 LearningRate 0.0322 Epoch: 8 Global Step: 144490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:45,294-Speed 9556.09 samples/sec Loss 6.2900 LearningRate 0.0322 Epoch: 8 Global Step: 144500 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 17:21:46,388-Speed 9365.79 samples/sec Loss 6.3215 LearningRate 0.0322 Epoch: 8 Global Step: 144510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:47,485-Speed 9342.23 samples/sec Loss 6.1237 LearningRate 0.0322 Epoch: 8 Global Step: 144520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:48,578-Speed 9374.40 samples/sec Loss 6.2090 LearningRate 0.0322 Epoch: 8 Global Step: 144530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:49,632-Speed 9718.76 samples/sec Loss 6.2492 LearningRate 0.0322 Epoch: 8 Global Step: 144540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:50,685-Speed 9735.84 samples/sec Loss 6.3475 LearningRate 0.0321 Epoch: 8 Global Step: 144550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:51,779-Speed 9358.27 samples/sec Loss 6.1973 LearningRate 0.0321 Epoch: 8 Global Step: 144560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:52,853-Speed 9541.51 samples/sec Loss 6.2573 LearningRate 0.0321 Epoch: 8 Global Step: 144570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:53,947-Speed 9373.93 samples/sec Loss 6.2470 LearningRate 0.0321 Epoch: 8 Global Step: 144580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:55,016-Speed 9576.96 samples/sec Loss 6.1895 LearningRate 0.0321 Epoch: 8 Global Step: 144590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:56,120-Speed 9281.23 samples/sec Loss 6.1667 LearningRate 0.0321 Epoch: 8 Global Step: 144600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:57,206-Speed 9432.38 samples/sec Loss 6.2180 LearningRate 0.0321 Epoch: 8 Global Step: 144610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:21:58,307-Speed 9310.94 samples/sec Loss 6.1785 LearningRate 0.0321 Epoch: 8 Global Step: 144620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:21:59,393-Speed 9435.55 samples/sec Loss 6.2345 LearningRate 0.0321 Epoch: 8 Global Step: 144630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:00,466-Speed 9556.67 samples/sec Loss 6.2989 LearningRate 0.0321 Epoch: 8 Global Step: 144640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:01,575-Speed 9232.28 samples/sec Loss 6.3180 LearningRate 0.0321 Epoch: 8 Global Step: 144650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:22:02,669-Speed 9368.54 samples/sec Loss 6.1660 LearningRate 0.0321 Epoch: 8 Global Step: 144660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:03,761-Speed 9385.51 samples/sec Loss 6.2749 LearningRate 0.0321 Epoch: 8 Global Step: 144670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:04,865-Speed 9278.38 samples/sec Loss 6.1892 LearningRate 0.0321 Epoch: 8 Global Step: 144680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:05,930-Speed 9629.23 samples/sec Loss 6.2649 LearningRate 0.0321 Epoch: 8 Global Step: 144690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:07,006-Speed 9520.26 samples/sec Loss 6.2644 LearningRate 0.0321 Epoch: 8 Global Step: 144700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:08,066-Speed 9663.64 samples/sec Loss 6.1916 LearningRate 0.0321 Epoch: 8 Global Step: 144710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:09,190-Speed 9115.54 samples/sec Loss 6.1447 LearningRate 0.0321 Epoch: 8 Global Step: 144720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:10,384-Speed 9820.37 samples/sec Loss 6.2223 LearningRate 0.0321 Epoch: 8 Global Step: 144730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:11,460-Speed 9521.36 samples/sec Loss 6.1941 LearningRate 0.0321 Epoch: 8 Global Step: 144740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:12,534-Speed 9537.89 samples/sec Loss 
6.2897 LearningRate 0.0321 Epoch: 8 Global Step: 144750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:13,598-Speed 9629.40 samples/sec Loss 6.2923 LearningRate 0.0321 Epoch: 8 Global Step: 144760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:22:14,694-Speed 9348.50 samples/sec Loss 6.2082 LearningRate 0.0321 Epoch: 8 Global Step: 144770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:15,738-Speed 9818.62 samples/sec Loss 6.2806 LearningRate 0.0321 Epoch: 8 Global Step: 144780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:16,803-Speed 9620.95 samples/sec Loss 6.2497 LearningRate 0.0321 Epoch: 8 Global Step: 144790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:17,859-Speed 9700.43 samples/sec Loss 6.1638 LearningRate 0.0321 Epoch: 8 Global Step: 144800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:18,948-Speed 9413.82 samples/sec Loss 6.2501 LearningRate 0.0321 Epoch: 8 Global Step: 144810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:20,050-Speed 9294.91 samples/sec Loss 6.1645 LearningRate 0.0321 Epoch: 8 Global Step: 144820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:21,154-Speed 9278.37 samples/sec Loss 6.1694 LearningRate 0.0321 Epoch: 8 Global Step: 144830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:22,231-Speed 9521.40 samples/sec Loss 6.2272 LearningRate 0.0320 Epoch: 8 Global Step: 144840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:23,324-Speed 9372.30 samples/sec Loss 6.2993 LearningRate 0.0320 Epoch: 8 Global Step: 144850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:24,415-Speed 9390.38 samples/sec Loss 6.2309 LearningRate 0.0320 Epoch: 8 Global Step: 144860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:25,478-Speed 9643.40 samples/sec Loss 6.2774 LearningRate 0.0320 Epoch: 8 Global 
Step: 144870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:26,563-Speed 9440.06 samples/sec Loss 6.2266 LearningRate 0.0320 Epoch: 8 Global Step: 144880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:27,670-Speed 9255.29 samples/sec Loss 6.3276 LearningRate 0.0320 Epoch: 8 Global Step: 144890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:28,731-Speed 9659.80 samples/sec Loss 6.2670 LearningRate 0.0320 Epoch: 8 Global Step: 144900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:29,820-Speed 9409.61 samples/sec Loss 6.3036 LearningRate 0.0320 Epoch: 8 Global Step: 144910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:30,934-Speed 9199.07 samples/sec Loss 6.2628 LearningRate 0.0320 Epoch: 8 Global Step: 144920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:32,019-Speed 9441.61 samples/sec Loss 6.1363 LearningRate 0.0320 Epoch: 8 Global Step: 144930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:33,097-Speed 9508.60 samples/sec Loss 6.2867 LearningRate 0.0320 Epoch: 8 Global Step: 144940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:34,233-Speed 9019.58 samples/sec Loss 6.2660 LearningRate 0.0320 Epoch: 8 Global Step: 144950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:35,318-Speed 9449.95 samples/sec Loss 6.2517 LearningRate 0.0320 Epoch: 8 Global Step: 144960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:36,401-Speed 9459.84 samples/sec Loss 6.2518 LearningRate 0.0320 Epoch: 8 Global Step: 144970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:37,509-Speed 9245.73 samples/sec Loss 6.2015 LearningRate 0.0320 Epoch: 8 Global Step: 144980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:38,602-Speed 9377.93 samples/sec Loss 6.1984 LearningRate 0.0320 Epoch: 8 Global Step: 144990 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 17:22:39,677-Speed 9531.43 samples/sec Loss 6.1883 LearningRate 0.0320 Epoch: 8 Global Step: 145000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:22:40,750-Speed 9549.40 samples/sec Loss 6.1666 LearningRate 0.0320 Epoch: 8 Global Step: 145010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:41,793-Speed 9829.55 samples/sec Loss 6.1972 LearningRate 0.0320 Epoch: 8 Global Step: 145020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:42,881-Speed 9410.71 samples/sec Loss 6.2188 LearningRate 0.0320 Epoch: 8 Global Step: 145030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:43,938-Speed 9699.09 samples/sec Loss 6.2294 LearningRate 0.0320 Epoch: 8 Global Step: 145040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:45,023-Speed 9440.36 samples/sec Loss 6.2728 LearningRate 0.0320 Epoch: 8 Global Step: 145050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:46,114-Speed 9388.28 samples/sec Loss 6.2114 LearningRate 0.0320 Epoch: 8 Global Step: 145060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:47,271-Speed 8861.41 samples/sec Loss 6.1703 LearningRate 0.0320 Epoch: 8 Global Step: 145070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:48,370-Speed 9321.82 samples/sec Loss 6.1500 LearningRate 0.0320 Epoch: 8 Global Step: 145080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:49,466-Speed 9357.28 samples/sec Loss 6.2433 LearningRate 0.0320 Epoch: 8 Global Step: 145090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:50,574-Speed 9248.39 samples/sec Loss 6.2644 LearningRate 0.0320 Epoch: 8 Global Step: 145100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:51,647-Speed 9544.70 samples/sec Loss 6.3326 LearningRate 0.0320 Epoch: 8 Global Step: 145110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:22:52,722-Speed 9535.22 samples/sec Loss 6.3969 LearningRate 0.0320 Epoch: 8 Global Step: 145120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:53,810-Speed 9418.45 samples/sec Loss 6.3117 LearningRate 0.0320 Epoch: 8 Global Step: 145130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:54,903-Speed 9375.21 samples/sec Loss 6.1992 LearningRate 0.0319 Epoch: 8 Global Step: 145140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:55,980-Speed 9521.30 samples/sec Loss 6.1721 LearningRate 0.0319 Epoch: 8 Global Step: 145150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:57,056-Speed 9521.56 samples/sec Loss 6.2500 LearningRate 0.0319 Epoch: 8 Global Step: 145160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:58,152-Speed 9346.00 samples/sec Loss 6.1932 LearningRate 0.0319 Epoch: 8 Global Step: 145170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:22:59,225-Speed 9548.54 samples/sec Loss 6.1775 LearningRate 0.0319 Epoch: 8 Global Step: 145180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:00,270-Speed 9801.37 samples/sec Loss 6.1585 LearningRate 0.0319 Epoch: 8 Global Step: 145190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:01,390-Speed 9145.90 samples/sec Loss 6.2451 LearningRate 0.0319 Epoch: 8 Global Step: 145200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:02,471-Speed 9483.78 samples/sec Loss 6.2057 LearningRate 0.0319 Epoch: 8 Global Step: 145210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:03,581-Speed 9234.72 samples/sec Loss 6.2751 LearningRate 0.0319 Epoch: 8 Global Step: 145220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:04,662-Speed 9480.37 samples/sec Loss 6.2714 LearningRate 0.0319 Epoch: 8 Global Step: 145230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:05,763-Speed 9304.27 samples/sec Loss 6.2974 
LearningRate 0.0319 Epoch: 8 Global Step: 145240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:06,851-Speed 9413.61 samples/sec Loss 6.3141 LearningRate 0.0319 Epoch: 8 Global Step: 145250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:07,957-Speed 9270.79 samples/sec Loss 6.1645 LearningRate 0.0319 Epoch: 8 Global Step: 145260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:09,057-Speed 9312.30 samples/sec Loss 6.2021 LearningRate 0.0319 Epoch: 8 Global Step: 145270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:10,140-Speed 9462.51 samples/sec Loss 6.2153 LearningRate 0.0319 Epoch: 8 Global Step: 145280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:11,239-Speed 9321.91 samples/sec Loss 6.2191 LearningRate 0.0319 Epoch: 8 Global Step: 145290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:12,344-Speed 9272.77 samples/sec Loss 6.1880 LearningRate 0.0319 Epoch: 8 Global Step: 145300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:13,501-Speed 8858.73 samples/sec Loss 6.2308 LearningRate 0.0319 Epoch: 8 Global Step: 145310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:14,549-Speed 9773.04 samples/sec Loss 6.1481 LearningRate 0.0319 Epoch: 8 Global Step: 145320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:23:15,608-Speed 9682.44 samples/sec Loss 6.2717 LearningRate 0.0319 Epoch: 8 Global Step: 145330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:16,668-Speed 9665.97 samples/sec Loss 6.2366 LearningRate 0.0319 Epoch: 8 Global Step: 145340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:17,769-Speed 9300.18 samples/sec Loss 6.1625 LearningRate 0.0319 Epoch: 8 Global Step: 145350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:18,844-Speed 9530.26 samples/sec Loss 6.1877 LearningRate 0.0319 Epoch: 8 Global Step: 
145360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:19,915-Speed 9566.62 samples/sec Loss 6.2514 LearningRate 0.0319 Epoch: 8 Global Step: 145370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:20,977-Speed 9653.12 samples/sec Loss 6.2602 LearningRate 0.0319 Epoch: 8 Global Step: 145380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:22,041-Speed 9632.14 samples/sec Loss 6.1522 LearningRate 0.0319 Epoch: 8 Global Step: 145390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:23,125-Speed 9469.93 samples/sec Loss 6.1623 LearningRate 0.0319 Epoch: 8 Global Step: 145400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:24,186-Speed 9658.53 samples/sec Loss 6.2341 LearningRate 0.0319 Epoch: 8 Global Step: 145410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:25,262-Speed 9527.04 samples/sec Loss 6.1304 LearningRate 0.0319 Epoch: 8 Global Step: 145420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:26,337-Speed 9524.64 samples/sec Loss 6.2352 LearningRate 0.0318 Epoch: 8 Global Step: 145430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:23:27,430-Speed 9376.46 samples/sec Loss 6.2620 LearningRate 0.0318 Epoch: 8 Global Step: 145440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:28,494-Speed 9625.20 samples/sec Loss 6.3057 LearningRate 0.0318 Epoch: 8 Global Step: 145450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:29,566-Speed 9558.54 samples/sec Loss 6.1153 LearningRate 0.0318 Epoch: 8 Global Step: 145460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:30,647-Speed 9479.53 samples/sec Loss 6.1845 LearningRate 0.0318 Epoch: 8 Global Step: 145470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:31,707-Speed 9671.44 samples/sec Loss 6.1607 LearningRate 0.0318 Epoch: 8 Global Step: 145480 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-11 17:23:32,769-Speed 9646.19 samples/sec Loss 6.2154 LearningRate 0.0318 Epoch: 8 Global Step: 145490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:33,849-Speed 9495.96 samples/sec Loss 6.2186 LearningRate 0.0318 Epoch: 8 Global Step: 145500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:34,906-Speed 9696.58 samples/sec Loss 6.2580 LearningRate 0.0318 Epoch: 8 Global Step: 145510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:36,007-Speed 9307.09 samples/sec Loss 6.1972 LearningRate 0.0318 Epoch: 8 Global Step: 145520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:37,130-Speed 9116.93 samples/sec Loss 6.2518 LearningRate 0.0318 Epoch: 8 Global Step: 145530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:38,271-Speed 8982.58 samples/sec Loss 6.2559 LearningRate 0.0318 Epoch: 8 Global Step: 145540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:39,368-Speed 9337.83 samples/sec Loss 6.2414 LearningRate 0.0318 Epoch: 8 Global Step: 145550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:40,422-Speed 9721.15 samples/sec Loss 6.2656 LearningRate 0.0318 Epoch: 8 Global Step: 145560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:41,506-Speed 9460.31 samples/sec Loss 6.1907 LearningRate 0.0318 Epoch: 8 Global Step: 145570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:42,618-Speed 9212.52 samples/sec Loss 6.2470 LearningRate 0.0318 Epoch: 8 Global Step: 145580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:43,717-Speed 9325.52 samples/sec Loss 6.2964 LearningRate 0.0318 Epoch: 8 Global Step: 145590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:44,806-Speed 9407.88 samples/sec Loss 6.1682 LearningRate 0.0318 Epoch: 8 Global Step: 145600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:23:45,845-Speed 9853.93 
samples/sec Loss 6.1501 LearningRate 0.0318 Epoch: 8 Global Step: 145610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:46,930-Speed 9448.94 samples/sec Loss 6.3069 LearningRate 0.0318 Epoch: 8 Global Step: 145620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:48,003-Speed 9549.46 samples/sec Loss 6.1700 LearningRate 0.0318 Epoch: 8 Global Step: 145630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:49,133-Speed 9065.73 samples/sec Loss 6.3285 LearningRate 0.0318 Epoch: 8 Global Step: 145640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:50,243-Speed 9228.59 samples/sec Loss 6.2477 LearningRate 0.0318 Epoch: 8 Global Step: 145650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:51,352-Speed 9236.94 samples/sec Loss 6.3014 LearningRate 0.0318 Epoch: 8 Global Step: 145660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:52,449-Speed 9345.64 samples/sec Loss 6.3323 LearningRate 0.0318 Epoch: 8 Global Step: 145670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:53,532-Speed 9461.79 samples/sec Loss 6.1538 LearningRate 0.0318 Epoch: 8 Global Step: 145680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:54,591-Speed 9674.05 samples/sec Loss 6.2186 LearningRate 0.0318 Epoch: 8 Global Step: 145690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:55,690-Speed 9322.47 samples/sec Loss 6.2435 LearningRate 0.0318 Epoch: 8 Global Step: 145700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:56,831-Speed 8980.08 samples/sec Loss 6.1795 LearningRate 0.0318 Epoch: 8 Global Step: 145710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:23:57,882-Speed 9746.50 samples/sec Loss 6.1658 LearningRate 0.0318 Epoch: 8 Global Step: 145720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:23:58,964-Speed 9471.12 samples/sec Loss 6.1630 LearningRate 0.0317 
Epoch: 8 Global Step: 145730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:00,066-Speed 9295.30 samples/sec Loss 6.1286 LearningRate 0.0317 Epoch: 8 Global Step: 145740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:01,127-Speed 9660.73 samples/sec Loss 6.1650 LearningRate 0.0317 Epoch: 8 Global Step: 145750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:02,201-Speed 9537.99 samples/sec Loss 6.2190 LearningRate 0.0317 Epoch: 8 Global Step: 145760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:03,293-Speed 9377.55 samples/sec Loss 6.2177 LearningRate 0.0317 Epoch: 8 Global Step: 145770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:04,360-Speed 9609.58 samples/sec Loss 6.2067 LearningRate 0.0317 Epoch: 8 Global Step: 145780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:05,438-Speed 9505.72 samples/sec Loss 6.2470 LearningRate 0.0317 Epoch: 8 Global Step: 145790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:06,546-Speed 9249.66 samples/sec Loss 6.1837 LearningRate 0.0317 Epoch: 8 Global Step: 145800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:07,634-Speed 9409.53 samples/sec Loss 6.1293 LearningRate 0.0317 Epoch: 8 Global Step: 145810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:08,715-Speed 9480.66 samples/sec Loss 6.2213 LearningRate 0.0317 Epoch: 8 Global Step: 145820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:09,809-Speed 9368.31 samples/sec Loss 6.3019 LearningRate 0.0317 Epoch: 8 Global Step: 145830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:10,870-Speed 9669.72 samples/sec Loss 6.2080 LearningRate 0.0317 Epoch: 8 Global Step: 145840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:11,953-Speed 9457.92 samples/sec Loss 6.1650 LearningRate 0.0317 Epoch: 8 Global Step: 145850 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:13,028-Speed 9533.23 samples/sec Loss 6.1301 LearningRate 0.0317 Epoch: 8 Global Step: 145860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:14,086-Speed 9678.57 samples/sec Loss 6.2801 LearningRate 0.0317 Epoch: 8 Global Step: 145870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:15,141-Speed 9710.54 samples/sec Loss 6.1592 LearningRate 0.0317 Epoch: 8 Global Step: 145880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:16,211-Speed 9574.10 samples/sec Loss 6.2382 LearningRate 0.0317 Epoch: 8 Global Step: 145890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:17,272-Speed 9661.10 samples/sec Loss 6.3815 LearningRate 0.0317 Epoch: 8 Global Step: 145900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:18,317-Speed 9802.05 samples/sec Loss 6.2140 LearningRate 0.0317 Epoch: 8 Global Step: 145910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:19,383-Speed 9612.50 samples/sec Loss 6.3280 LearningRate 0.0317 Epoch: 8 Global Step: 145920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:24:20,469-Speed 9430.36 samples/sec Loss 6.1708 LearningRate 0.0317 Epoch: 8 Global Step: 145930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:24:21,542-Speed 9549.12 samples/sec Loss 6.2563 LearningRate 0.0317 Epoch: 8 Global Step: 145940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:24:22,590-Speed 9778.45 samples/sec Loss 6.3161 LearningRate 0.0317 Epoch: 8 Global Step: 145950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:24:23,690-Speed 9319.50 samples/sec Loss 6.2587 LearningRate 0.0317 Epoch: 8 Global Step: 145960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:24,793-Speed 9292.64 samples/sec Loss 6.2833 LearningRate 0.0317 Epoch: 8 Global Step: 145970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 17:24:25,838-Speed 9802.91 samples/sec Loss 6.2638 LearningRate 0.0317 Epoch: 8 Global Step: 145980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:26,918-Speed 9488.96 samples/sec Loss 6.1933 LearningRate 0.0317 Epoch: 8 Global Step: 145990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:27,977-Speed 9670.61 samples/sec Loss 6.2157 LearningRate 0.0317 Epoch: 8 Global Step: 146000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:24:49,740-[lfw][146000]XNorm: 10.177552 Training: 2022-04-11 17:24:49,741-[lfw][146000]Accuracy-Flip: 0.99617+-0.00269 Training: 2022-04-11 17:24:49,742-[lfw][146000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:25:14,940-[cfp_fp][146000]XNorm: 8.666866 Training: 2022-04-11 17:25:14,940-[cfp_fp][146000]Accuracy-Flip: 0.95743+-0.00777 Training: 2022-04-11 17:25:14,941-[cfp_fp][146000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:25:36,671-[agedb_30][146000]XNorm: 9.805984 Training: 2022-04-11 17:25:36,672-[agedb_30][146000]Accuracy-Flip: 0.96783+-0.00913 Training: 2022-04-11 17:25:36,672-[agedb_30][146000]Accuracy-Highest: 0.96783 Training: 2022-04-11 17:25:37,747-Speed 146.77 samples/sec Loss 6.3326 LearningRate 0.0317 Epoch: 8 Global Step: 146010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:38,788-Speed 9849.75 samples/sec Loss 6.2823 LearningRate 0.0316 Epoch: 8 Global Step: 146020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:39,895-Speed 9249.01 samples/sec Loss 6.2275 LearningRate 0.0316 Epoch: 8 Global Step: 146030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:40,995-Speed 9320.44 samples/sec Loss 6.4050 LearningRate 0.0316 Epoch: 8 Global Step: 146040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:42,030-Speed 9892.81 samples/sec Loss 6.2818 LearningRate 0.0316 Epoch: 8 Global Step: 146050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:25:43,156-Speed 9103.56 samples/sec Loss 6.2794 LearningRate 0.0316 Epoch: 8 Global Step: 146060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:25:44,230-Speed 9534.58 samples/sec Loss 6.1632 LearningRate 0.0316 Epoch: 8 Global Step: 146070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:45,289-Speed 9677.53 samples/sec Loss 6.2980 LearningRate 0.0316 Epoch: 8 Global Step: 146080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:46,372-Speed 9465.36 samples/sec Loss 6.2030 LearningRate 0.0316 Epoch: 8 Global Step: 146090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:47,445-Speed 9541.31 samples/sec Loss 6.1731 LearningRate 0.0316 Epoch: 8 Global Step: 146100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:48,577-Speed 9052.44 samples/sec Loss 6.2612 LearningRate 0.0316 Epoch: 8 Global Step: 146110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:49,674-Speed 9345.95 samples/sec Loss 6.2326 LearningRate 0.0316 Epoch: 8 Global Step: 146120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:50,781-Speed 9255.60 samples/sec Loss 6.1902 LearningRate 0.0316 Epoch: 8 Global Step: 146130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:51,851-Speed 9567.93 samples/sec Loss 6.2735 LearningRate 0.0316 Epoch: 8 Global Step: 146140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:52,952-Speed 9307.96 samples/sec Loss 6.1767 LearningRate 0.0316 Epoch: 8 Global Step: 146150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:54,095-Speed 8967.41 samples/sec Loss 6.2588 LearningRate 0.0316 Epoch: 8 Global Step: 146160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:55,207-Speed 9209.59 samples/sec Loss 6.1859 LearningRate 0.0316 Epoch: 8 Global Step: 146170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:56,308-Speed 9309.27 samples/sec Loss 6.3946 
LearningRate 0.0316 Epoch: 8 Global Step: 146180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:25:57,384-Speed 9522.05 samples/sec Loss 6.2578 LearningRate 0.0316 Epoch: 8 Global Step: 146190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:58,481-Speed 9338.81 samples/sec Loss 6.2224 LearningRate 0.0316 Epoch: 8 Global Step: 146200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:25:59,563-Speed 9473.92 samples/sec Loss 6.2577 LearningRate 0.0316 Epoch: 8 Global Step: 146210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:00,662-Speed 9319.26 samples/sec Loss 6.2985 LearningRate 0.0316 Epoch: 8 Global Step: 146220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:01,743-Speed 9482.29 samples/sec Loss 6.2377 LearningRate 0.0316 Epoch: 8 Global Step: 146230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:02,799-Speed 9699.28 samples/sec Loss 6.1839 LearningRate 0.0316 Epoch: 8 Global Step: 146240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:03,851-Speed 9740.97 samples/sec Loss 6.2476 LearningRate 0.0316 Epoch: 8 Global Step: 146250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:04,953-Speed 9297.43 samples/sec Loss 6.3055 LearningRate 0.0316 Epoch: 8 Global Step: 146260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:06,045-Speed 9381.66 samples/sec Loss 6.2798 LearningRate 0.0316 Epoch: 8 Global Step: 146270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:07,108-Speed 9636.65 samples/sec Loss 6.3085 LearningRate 0.0316 Epoch: 8 Global Step: 146280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:08,179-Speed 9571.90 samples/sec Loss 6.2009 LearningRate 0.0316 Epoch: 8 Global Step: 146290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:09,249-Speed 9609.92 samples/sec Loss 6.3047 LearningRate 0.0316 Epoch: 8 Global Step: 
146300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:10,319-Speed 9572.15 samples/sec Loss 6.2955 LearningRate 0.0316 Epoch: 8 Global Step: 146310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:11,363-Speed 9818.92 samples/sec Loss 6.2895 LearningRate 0.0315 Epoch: 8 Global Step: 146320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:12,443-Speed 9486.60 samples/sec Loss 6.1751 LearningRate 0.0315 Epoch: 8 Global Step: 146330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:13,567-Speed 9110.39 samples/sec Loss 6.0903 LearningRate 0.0315 Epoch: 8 Global Step: 146340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:14,685-Speed 9167.47 samples/sec Loss 6.0840 LearningRate 0.0315 Epoch: 8 Global Step: 146350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:15,767-Speed 9472.78 samples/sec Loss 6.2279 LearningRate 0.0315 Epoch: 8 Global Step: 146360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:16,814-Speed 9785.11 samples/sec Loss 6.2337 LearningRate 0.0315 Epoch: 8 Global Step: 146370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:17,911-Speed 9339.99 samples/sec Loss 6.0151 LearningRate 0.0315 Epoch: 8 Global Step: 146380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:18,956-Speed 9808.89 samples/sec Loss 6.2659 LearningRate 0.0315 Epoch: 8 Global Step: 146390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:20,034-Speed 9507.24 samples/sec Loss 6.2715 LearningRate 0.0315 Epoch: 8 Global Step: 146400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:21,079-Speed 9802.37 samples/sec Loss 6.3527 LearningRate 0.0315 Epoch: 8 Global Step: 146410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:22,135-Speed 9698.99 samples/sec Loss 6.2935 LearningRate 0.0315 Epoch: 8 Global Step: 146420 Fp16 Grad Scale: 262144 Required: 8 
hours Training: 2022-04-11 17:26:23,271-Speed 9017.91 samples/sec Loss 6.2814 LearningRate 0.0315 Epoch: 8 Global Step: 146430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:24,333-Speed 9652.06 samples/sec Loss 6.2045 LearningRate 0.0315 Epoch: 8 Global Step: 146440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:25,411-Speed 9504.46 samples/sec Loss 6.2597 LearningRate 0.0315 Epoch: 8 Global Step: 146450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:26,516-Speed 9278.74 samples/sec Loss 6.2188 LearningRate 0.0315 Epoch: 8 Global Step: 146460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:27,656-Speed 8986.49 samples/sec Loss 6.2549 LearningRate 0.0315 Epoch: 8 Global Step: 146470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:28,722-Speed 9611.61 samples/sec Loss 6.1611 LearningRate 0.0315 Epoch: 8 Global Step: 146480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:29,775-Speed 9727.00 samples/sec Loss 6.2497 LearningRate 0.0315 Epoch: 8 Global Step: 146490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:30,881-Speed 9265.09 samples/sec Loss 6.2453 LearningRate 0.0315 Epoch: 8 Global Step: 146500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:31,964-Speed 9460.76 samples/sec Loss 6.2589 LearningRate 0.0315 Epoch: 8 Global Step: 146510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:33,082-Speed 9165.41 samples/sec Loss 6.2153 LearningRate 0.0315 Epoch: 8 Global Step: 146520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:34,198-Speed 9178.62 samples/sec Loss 6.1563 LearningRate 0.0315 Epoch: 8 Global Step: 146530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:35,244-Speed 9798.86 samples/sec Loss 6.2609 LearningRate 0.0315 Epoch: 8 Global Step: 146540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
17:26:36,285-Speed 9840.06 samples/sec Loss 6.2656 LearningRate 0.0315 Epoch: 8 Global Step: 146550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:37,340-Speed 9712.29 samples/sec Loss 6.2555 LearningRate 0.0315 Epoch: 8 Global Step: 146560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:38,433-Speed 9373.12 samples/sec Loss 6.2905 LearningRate 0.0315 Epoch: 8 Global Step: 146570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:39,523-Speed 9405.11 samples/sec Loss 6.2037 LearningRate 0.0315 Epoch: 8 Global Step: 146580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:40,602-Speed 9491.26 samples/sec Loss 6.3187 LearningRate 0.0315 Epoch: 8 Global Step: 146590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:41,669-Speed 9605.48 samples/sec Loss 6.1369 LearningRate 0.0315 Epoch: 8 Global Step: 146600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:42,803-Speed 9030.11 samples/sec Loss 6.1719 LearningRate 0.0315 Epoch: 8 Global Step: 146610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:43,842-Speed 9865.65 samples/sec Loss 6.1395 LearningRate 0.0314 Epoch: 8 Global Step: 146620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:44,946-Speed 9277.54 samples/sec Loss 6.1652 LearningRate 0.0314 Epoch: 8 Global Step: 146630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:45,983-Speed 9878.59 samples/sec Loss 6.2678 LearningRate 0.0314 Epoch: 8 Global Step: 146640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:47,035-Speed 9746.31 samples/sec Loss 6.3035 LearningRate 0.0314 Epoch: 8 Global Step: 146650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:48,119-Speed 9446.39 samples/sec Loss 6.2851 LearningRate 0.0314 Epoch: 8 Global Step: 146660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:49,201-Speed 9471.15 samples/sec Loss 
6.2417 LearningRate 0.0314 Epoch: 8 Global Step: 146670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:50,267-Speed 9616.28 samples/sec Loss 6.3555 LearningRate 0.0314 Epoch: 8 Global Step: 146680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:51,333-Speed 9613.07 samples/sec Loss 6.1504 LearningRate 0.0314 Epoch: 8 Global Step: 146690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:26:52,381-Speed 9776.24 samples/sec Loss 6.1952 LearningRate 0.0314 Epoch: 8 Global Step: 146700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:53,481-Speed 9314.07 samples/sec Loss 6.2931 LearningRate 0.0314 Epoch: 8 Global Step: 146710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:54,559-Speed 9505.13 samples/sec Loss 6.2208 LearningRate 0.0314 Epoch: 8 Global Step: 146720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:55,629-Speed 9580.45 samples/sec Loss 6.3204 LearningRate 0.0314 Epoch: 8 Global Step: 146730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:56,729-Speed 9314.51 samples/sec Loss 6.2094 LearningRate 0.0314 Epoch: 8 Global Step: 146740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:57,802-Speed 9544.38 samples/sec Loss 6.1554 LearningRate 0.0314 Epoch: 8 Global Step: 146750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:58,863-Speed 9658.22 samples/sec Loss 6.1547 LearningRate 0.0314 Epoch: 8 Global Step: 146760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:26:59,945-Speed 9469.53 samples/sec Loss 6.2165 LearningRate 0.0314 Epoch: 8 Global Step: 146770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:01,005-Speed 9665.39 samples/sec Loss 6.1764 LearningRate 0.0314 Epoch: 8 Global Step: 146780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:02,047-Speed 9831.55 samples/sec Loss 6.1967 LearningRate 0.0314 Epoch: 8 Global 
Step: 146790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:03,127-Speed 9485.76 samples/sec Loss 6.1941 LearningRate 0.0314 Epoch: 8 Global Step: 146800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:27:04,220-Speed 9374.32 samples/sec Loss 6.2290 LearningRate 0.0314 Epoch: 8 Global Step: 146810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:27:05,285-Speed 9619.42 samples/sec Loss 6.1451 LearningRate 0.0314 Epoch: 8 Global Step: 146820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-11 17:27:06,363-Speed 9507.59 samples/sec Loss 6.2875 LearningRate 0.0314 Epoch: 8 Global Step: 146830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:07,489-Speed 9101.11 samples/sec Loss 6.1441 LearningRate 0.0314 Epoch: 8 Global Step: 146840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:08,614-Speed 9120.29 samples/sec Loss 6.0730 LearningRate 0.0314 Epoch: 8 Global Step: 146850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:09,707-Speed 9374.12 samples/sec Loss 6.1787 LearningRate 0.0314 Epoch: 8 Global Step: 146860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:10,817-Speed 9230.52 samples/sec Loss 6.1640 LearningRate 0.0314 Epoch: 8 Global Step: 146870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:11,880-Speed 9647.49 samples/sec Loss 6.3178 LearningRate 0.0314 Epoch: 8 Global Step: 146880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:12,979-Speed 9321.95 samples/sec Loss 6.1887 LearningRate 0.0314 Epoch: 8 Global Step: 146890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:14,076-Speed 9336.62 samples/sec Loss 6.2109 LearningRate 0.0314 Epoch: 8 Global Step: 146900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:15,151-Speed 9527.64 samples/sec Loss 6.2541 LearningRate 0.0314 Epoch: 8 Global Step: 146910 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-11 17:27:16,271-Speed 9716.64 samples/sec Loss 6.2130 LearningRate 0.0313 Epoch: 8 Global Step: 146920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:17,356-Speed 9435.97 samples/sec Loss 6.2557 LearningRate 0.0313 Epoch: 8 Global Step: 146930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:18,422-Speed 9612.86 samples/sec Loss 6.3504 LearningRate 0.0313 Epoch: 8 Global Step: 146940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:19,540-Speed 9178.04 samples/sec Loss 6.1107 LearningRate 0.0313 Epoch: 8 Global Step: 146950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:20,606-Speed 9614.81 samples/sec Loss 6.1484 LearningRate 0.0313 Epoch: 8 Global Step: 146960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:21,669-Speed 9637.97 samples/sec Loss 6.1087 LearningRate 0.0313 Epoch: 8 Global Step: 146970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:22,729-Speed 9661.56 samples/sec Loss 6.2602 LearningRate 0.0313 Epoch: 8 Global Step: 146980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:23,763-Speed 9910.14 samples/sec Loss 6.3461 LearningRate 0.0313 Epoch: 8 Global Step: 146990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:24,841-Speed 9508.40 samples/sec Loss 6.1490 LearningRate 0.0313 Epoch: 8 Global Step: 147000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:25,952-Speed 9228.54 samples/sec Loss 6.2462 LearningRate 0.0313 Epoch: 8 Global Step: 147010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:27,017-Speed 9613.70 samples/sec Loss 6.1557 LearningRate 0.0313 Epoch: 8 Global Step: 147020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:28,113-Speed 9349.35 samples/sec Loss 6.1529 LearningRate 0.0313 Epoch: 8 Global Step: 147030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
17:27:29,183-Speed 9582.46 samples/sec Loss 6.2260 LearningRate 0.0313 Epoch: 8 Global Step: 147040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:30,267-Speed 9454.53 samples/sec Loss 6.2123 LearningRate 0.0313 Epoch: 8 Global Step: 147050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:31,369-Speed 9295.35 samples/sec Loss 6.2231 LearningRate 0.0313 Epoch: 8 Global Step: 147060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:32,462-Speed 9371.29 samples/sec Loss 6.2399 LearningRate 0.0313 Epoch: 8 Global Step: 147070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:33,562-Speed 9312.54 samples/sec Loss 6.1367 LearningRate 0.0313 Epoch: 8 Global Step: 147080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:34,661-Speed 9324.89 samples/sec Loss 6.1916 LearningRate 0.0313 Epoch: 8 Global Step: 147090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:35,771-Speed 9234.03 samples/sec Loss 6.2225 LearningRate 0.0313 Epoch: 8 Global Step: 147100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:36,896-Speed 9104.20 samples/sec Loss 6.2182 LearningRate 0.0313 Epoch: 8 Global Step: 147110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:37,986-Speed 9403.19 samples/sec Loss 6.1888 LearningRate 0.0313 Epoch: 8 Global Step: 147120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 17:27:39,088-Speed 9302.76 samples/sec Loss 6.2818 LearningRate 0.0313 Epoch: 8 Global Step: 147130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:40,140-Speed 9738.71 samples/sec Loss 6.1445 LearningRate 0.0313 Epoch: 8 Global Step: 147140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:41,205-Speed 9619.96 samples/sec Loss 6.2218 LearningRate 0.0313 Epoch: 8 Global Step: 147150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:42,289-Speed 9446.84 samples/sec Loss 6.2048 
LearningRate 0.0313 Epoch: 8 Global Step: 147160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:43,365-Speed 9522.36 samples/sec Loss 6.1811 LearningRate 0.0313 Epoch: 8 Global Step: 147170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:44,429-Speed 9630.27 samples/sec Loss 6.2715 LearningRate 0.0313 Epoch: 8 Global Step: 147180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:45,466-Speed 9886.13 samples/sec Loss 6.3139 LearningRate 0.0313 Epoch: 8 Global Step: 147190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 17:27:46,578-Speed 9215.04 samples/sec Loss 6.1366 LearningRate 0.0313 Epoch: 8 Global Step: 147200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:47,680-Speed 9296.08 samples/sec Loss 6.2086 LearningRate 0.0312 Epoch: 8 Global Step: 147210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:48,756-Speed 9524.20 samples/sec Loss 6.1454 LearningRate 0.0312 Epoch: 8 Global Step: 147220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:49,835-Speed 9499.93 samples/sec Loss 6.2221 LearningRate 0.0312 Epoch: 8 Global Step: 147230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:27:50,914-Speed 9495.23 samples/sec Loss 6.2224 LearningRate 0.0312 Epoch: 8 Global Step: 147240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:27:52,005-Speed 9388.70 samples/sec Loss 6.3498 LearningRate 0.0312 Epoch: 8 Global Step: 147250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:53,050-Speed 9810.08 samples/sec Loss 6.2556 LearningRate 0.0312 Epoch: 8 Global Step: 147260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:54,149-Speed 9318.34 samples/sec Loss 6.2512 LearningRate 0.0312 Epoch: 8 Global Step: 147270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:55,219-Speed 9575.04 samples/sec Loss 6.3095 LearningRate 0.0312 Epoch: 8 Global Step: 
147280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:56,286-Speed 9606.97 samples/sec Loss 6.2231 LearningRate 0.0312 Epoch: 8 Global Step: 147290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:57,335-Speed 9764.96 samples/sec Loss 6.2220 LearningRate 0.0312 Epoch: 8 Global Step: 147300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:58,394-Speed 9680.07 samples/sec Loss 6.0466 LearningRate 0.0312 Epoch: 8 Global Step: 147310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:27:59,449-Speed 9707.32 samples/sec Loss 6.3122 LearningRate 0.0312 Epoch: 8 Global Step: 147320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:00,557-Speed 9246.71 samples/sec Loss 6.2345 LearningRate 0.0312 Epoch: 8 Global Step: 147330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:01,650-Speed 9373.65 samples/sec Loss 6.2874 LearningRate 0.0312 Epoch: 8 Global Step: 147340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:02,701-Speed 9752.44 samples/sec Loss 6.1969 LearningRate 0.0312 Epoch: 8 Global Step: 147350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:03,754-Speed 9728.09 samples/sec Loss 6.1747 LearningRate 0.0312 Epoch: 8 Global Step: 147360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:04,811-Speed 9691.45 samples/sec Loss 6.1822 LearningRate 0.0312 Epoch: 8 Global Step: 147370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:05,876-Speed 9616.99 samples/sec Loss 6.1925 LearningRate 0.0312 Epoch: 8 Global Step: 147380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:06,983-Speed 9260.04 samples/sec Loss 6.2659 LearningRate 0.0312 Epoch: 8 Global Step: 147390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:08,044-Speed 9651.83 samples/sec Loss 6.1023 LearningRate 0.0312 Epoch: 8 Global Step: 147400 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-11 17:28:09,137-Speed 9397.81 samples/sec Loss 6.2599 LearningRate 0.0312 Epoch: 8 Global Step: 147410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:10,208-Speed 9568.27 samples/sec Loss 6.2479 LearningRate 0.0312 Epoch: 8 Global Step: 147420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:11,281-Speed 9550.87 samples/sec Loss 6.2469 LearningRate 0.0312 Epoch: 8 Global Step: 147430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:12,354-Speed 9549.03 samples/sec Loss 6.1809 LearningRate 0.0312 Epoch: 8 Global Step: 147440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:13,397-Speed 9822.26 samples/sec Loss 6.2778 LearningRate 0.0312 Epoch: 8 Global Step: 147450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:14,445-Speed 9777.59 samples/sec Loss 6.1836 LearningRate 0.0312 Epoch: 8 Global Step: 147460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:15,491-Speed 9790.19 samples/sec Loss 6.1731 LearningRate 0.0312 Epoch: 8 Global Step: 147470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:16,564-Speed 9547.15 samples/sec Loss 6.2293 LearningRate 0.0312 Epoch: 8 Global Step: 147480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:17,692-Speed 9082.01 samples/sec Loss 6.2090 LearningRate 0.0312 Epoch: 8 Global Step: 147490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:18,744-Speed 9744.30 samples/sec Loss 6.1984 LearningRate 0.0312 Epoch: 8 Global Step: 147500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:19,812-Speed 9600.30 samples/sec Loss 6.0853 LearningRate 0.0311 Epoch: 8 Global Step: 147510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:20,890-Speed 9505.46 samples/sec Loss 6.2977 LearningRate 0.0311 Epoch: 8 Global Step: 147520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:28:21,977-Speed 9425.06 samples/sec Loss 6.1580 LearningRate 0.0311 Epoch: 8 Global Step: 147530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:23,057-Speed 9482.14 samples/sec Loss 6.3664 LearningRate 0.0311 Epoch: 8 Global Step: 147540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:24,116-Speed 9682.24 samples/sec Loss 6.1815 LearningRate 0.0311 Epoch: 8 Global Step: 147550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:25,154-Speed 9870.30 samples/sec Loss 6.2916 LearningRate 0.0311 Epoch: 8 Global Step: 147560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:26,238-Speed 9450.02 samples/sec Loss 6.1872 LearningRate 0.0311 Epoch: 8 Global Step: 147570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:27,325-Speed 9427.39 samples/sec Loss 6.2609 LearningRate 0.0311 Epoch: 8 Global Step: 147580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:28,440-Speed 9195.66 samples/sec Loss 6.1191 LearningRate 0.0311 Epoch: 8 Global Step: 147590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:29,551-Speed 9215.17 samples/sec Loss 6.3122 LearningRate 0.0311 Epoch: 8 Global Step: 147600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:30,642-Speed 9397.82 samples/sec Loss 6.2807 LearningRate 0.0311 Epoch: 8 Global Step: 147610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:31,723-Speed 9478.85 samples/sec Loss 6.2403 LearningRate 0.0311 Epoch: 8 Global Step: 147620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:32,810-Speed 9424.27 samples/sec Loss 6.2041 LearningRate 0.0311 Epoch: 8 Global Step: 147630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:33,902-Speed 9379.05 samples/sec Loss 6.2060 LearningRate 0.0311 Epoch: 8 Global Step: 147640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:34,930-Speed 9965.05 samples/sec Loss 
6.2790 LearningRate 0.0311 Epoch: 8 Global Step: 147650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:36,052-Speed 9142.83 samples/sec Loss 6.2022 LearningRate 0.0311 Epoch: 8 Global Step: 147660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:37,127-Speed 9532.48 samples/sec Loss 6.0980 LearningRate 0.0311 Epoch: 8 Global Step: 147670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:38,198-Speed 9568.03 samples/sec Loss 6.2224 LearningRate 0.0311 Epoch: 8 Global Step: 147680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:39,246-Speed 9774.63 samples/sec Loss 6.2505 LearningRate 0.0311 Epoch: 8 Global Step: 147690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:40,297-Speed 9750.92 samples/sec Loss 6.1896 LearningRate 0.0311 Epoch: 8 Global Step: 147700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:41,384-Speed 9427.32 samples/sec Loss 6.1058 LearningRate 0.0311 Epoch: 8 Global Step: 147710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:42,478-Speed 9366.30 samples/sec Loss 6.2192 LearningRate 0.0311 Epoch: 8 Global Step: 147720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:28:43,533-Speed 9707.17 samples/sec Loss 6.1859 LearningRate 0.0311 Epoch: 8 Global Step: 147730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:44,615-Speed 9466.23 samples/sec Loss 6.1918 LearningRate 0.0311 Epoch: 8 Global Step: 147740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:45,668-Speed 9733.63 samples/sec Loss 6.1762 LearningRate 0.0311 Epoch: 8 Global Step: 147750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:46,748-Speed 9488.68 samples/sec Loss 6.1666 LearningRate 0.0311 Epoch: 8 Global Step: 147760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:47,810-Speed 9651.80 samples/sec Loss 6.2437 LearningRate 0.0311 Epoch: 8 Global 
Step: 147770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:48,851-Speed 9839.69 samples/sec Loss 6.2290 LearningRate 0.0311 Epoch: 8 Global Step: 147780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:49,896-Speed 9809.36 samples/sec Loss 6.1985 LearningRate 0.0311 Epoch: 8 Global Step: 147790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:50,993-Speed 9341.29 samples/sec Loss 6.1873 LearningRate 0.0311 Epoch: 8 Global Step: 147800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:52,093-Speed 9315.47 samples/sec Loss 6.2165 LearningRate 0.0310 Epoch: 8 Global Step: 147810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:53,129-Speed 9891.44 samples/sec Loss 6.1027 LearningRate 0.0310 Epoch: 8 Global Step: 147820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:54,254-Speed 9103.84 samples/sec Loss 6.1426 LearningRate 0.0310 Epoch: 8 Global Step: 147830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:55,333-Speed 9499.00 samples/sec Loss 6.2243 LearningRate 0.0310 Epoch: 8 Global Step: 147840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:56,386-Speed 9734.67 samples/sec Loss 6.1715 LearningRate 0.0310 Epoch: 8 Global Step: 147850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:57,484-Speed 9331.88 samples/sec Loss 6.3195 LearningRate 0.0310 Epoch: 8 Global Step: 147860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:58,570-Speed 9434.04 samples/sec Loss 6.2205 LearningRate 0.0310 Epoch: 8 Global Step: 147870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:28:59,627-Speed 9689.65 samples/sec Loss 6.2265 LearningRate 0.0310 Epoch: 8 Global Step: 147880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:00,720-Speed 9372.42 samples/sec Loss 6.1076 LearningRate 0.0310 Epoch: 8 Global Step: 147890 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 17:29:01,807-Speed 9427.82 samples/sec Loss 6.2018 LearningRate 0.0310 Epoch: 8 Global Step: 147900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:02,902-Speed 9351.05 samples/sec Loss 6.1916 LearningRate 0.0310 Epoch: 8 Global Step: 147910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:03,952-Speed 9764.13 samples/sec Loss 6.2331 LearningRate 0.0310 Epoch: 8 Global Step: 147920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:05,026-Speed 9540.93 samples/sec Loss 6.1635 LearningRate 0.0310 Epoch: 8 Global Step: 147930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:29:06,143-Speed 9171.67 samples/sec Loss 6.1590 LearningRate 0.0310 Epoch: 8 Global Step: 147940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:07,239-Speed 9349.16 samples/sec Loss 6.1397 LearningRate 0.0310 Epoch: 8 Global Step: 147950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:08,348-Speed 9240.66 samples/sec Loss 6.1536 LearningRate 0.0310 Epoch: 8 Global Step: 147960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:09,455-Speed 9357.48 samples/sec Loss 6.1375 LearningRate 0.0310 Epoch: 8 Global Step: 147970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:10,518-Speed 9636.88 samples/sec Loss 6.1738 LearningRate 0.0310 Epoch: 8 Global Step: 147980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:11,585-Speed 9605.06 samples/sec Loss 6.1776 LearningRate 0.0310 Epoch: 8 Global Step: 147990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:29:12,661-Speed 9520.85 samples/sec Loss 6.2050 LearningRate 0.0310 Epoch: 8 Global Step: 148000 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:29:34,752-[lfw][148000]XNorm: 10.113518
Training: 2022-04-11 17:29:34,753-[lfw][148000]Accuracy-Flip: 0.99583+-0.00227
Training: 2022-04-11 17:29:34,753-[lfw][148000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:30:00,126-[cfp_fp][148000]XNorm: 8.712494
Training: 2022-04-11 17:30:00,127-[cfp_fp][148000]Accuracy-Flip: 0.95929+-0.00897
Training: 2022-04-11 17:30:00,128-[cfp_fp][148000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:30:22,529-[agedb_30][148000]XNorm: 9.862267
Training: 2022-04-11 17:30:22,530-[agedb_30][148000]Accuracy-Flip: 0.96183+-0.01047
Training: 2022-04-11 17:30:22,530-[agedb_30][148000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:30:23,624-Speed 144.30 samples/sec Loss 6.1677 LearningRate 0.0310 Epoch: 8 Global Step: 148010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:24,713-Speed 9408.31 samples/sec Loss 6.1801 LearningRate 0.0310 Epoch: 8 Global Step: 148020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:25,791-Speed 9500.37 samples/sec Loss 6.1171 LearningRate 0.0310 Epoch: 8 Global Step: 148030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:26,852-Speed 9658.57 samples/sec Loss 6.2886 LearningRate 0.0310 Epoch: 8 Global Step: 148040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:27,910-Speed 9682.63 samples/sec Loss 6.1401 LearningRate 0.0310 Epoch: 8 Global Step: 148050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:29,009-Speed 9326.36 samples/sec Loss 6.2257 LearningRate 0.0310 Epoch: 8 Global Step: 148060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:30,081-Speed 9565.19 samples/sec Loss 6.0874 LearningRate 0.0310 Epoch: 8 Global Step: 148070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:31,184-Speed 9286.76 samples/sec Loss 6.1955 LearningRate 0.0310 Epoch: 8 Global Step: 148080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:32,268-Speed 9455.39 samples/sec Loss 6.1507 LearningRate 0.0310 Epoch: 8 Global Step: 148090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:30:33,322-Speed 9719.79 samples/sec Loss 6.3237 LearningRate 0.0310 Epoch: 8 Global Step: 148100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:34,422-Speed 9311.71 samples/sec Loss 6.1996 LearningRate 0.0309 Epoch: 8 Global Step: 148110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:35,512-Speed 9400.41 samples/sec Loss 6.0735 LearningRate 0.0309 Epoch: 8 Global Step: 148120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:36,641-Speed 9075.77 samples/sec Loss 6.2147 LearningRate 0.0309 Epoch: 8 Global Step: 148130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:37,762-Speed 9143.34 samples/sec Loss 6.2280 LearningRate 0.0309 Epoch: 8 Global Step: 148140 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:30:38,840-Speed 9498.74 samples/sec Loss 6.1845 LearningRate 0.0309 Epoch: 8 Global Step: 148150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:39,940-Speed 9318.90 samples/sec Loss 6.1245 LearningRate 0.0309 Epoch: 8 Global Step: 148160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:41,062-Speed 9130.29 samples/sec Loss 6.2884 LearningRate 0.0309 Epoch: 8 Global Step: 148170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:42,153-Speed 9390.89 samples/sec Loss 6.2208 LearningRate 0.0309 Epoch: 8 Global Step: 148180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:43,219-Speed 9611.69 samples/sec Loss 6.2357 LearningRate 0.0309 Epoch: 8 Global Step: 148190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:44,319-Speed 9314.22 samples/sec Loss 6.2558 LearningRate 0.0309 Epoch: 8 Global Step: 148200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:45,359-Speed 9851.62 samples/sec Loss 6.1167 LearningRate 0.0309 Epoch: 8 Global Step: 148210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:46,425-Speed 9611.44 samples/sec Loss 
6.2365 LearningRate 0.0309 Epoch: 8 Global Step: 148220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:47,510-Speed 9448.20 samples/sec Loss 6.2021 LearningRate 0.0309 Epoch: 8 Global Step: 148230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:48,623-Speed 9205.44 samples/sec Loss 6.2999 LearningRate 0.0309 Epoch: 8 Global Step: 148240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:49,723-Speed 9311.89 samples/sec Loss 6.2573 LearningRate 0.0309 Epoch: 8 Global Step: 148250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:50,789-Speed 9617.97 samples/sec Loss 6.2229 LearningRate 0.0309 Epoch: 8 Global Step: 148260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:51,884-Speed 9352.29 samples/sec Loss 6.2897 LearningRate 0.0309 Epoch: 8 Global Step: 148270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:52,994-Speed 9231.90 samples/sec Loss 6.1450 LearningRate 0.0309 Epoch: 8 Global Step: 148280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:54,070-Speed 9521.80 samples/sec Loss 6.2711 LearningRate 0.0309 Epoch: 8 Global Step: 148290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:55,172-Speed 9303.71 samples/sec Loss 6.1143 LearningRate 0.0309 Epoch: 8 Global Step: 148300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:56,273-Speed 9314.52 samples/sec Loss 6.2343 LearningRate 0.0309 Epoch: 8 Global Step: 148310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:57,331-Speed 9682.99 samples/sec Loss 6.2189 LearningRate 0.0309 Epoch: 8 Global Step: 148320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:58,408-Speed 9511.38 samples/sec Loss 6.2051 LearningRate 0.0309 Epoch: 8 Global Step: 148330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:30:59,461-Speed 9729.05 samples/sec Loss 6.1658 LearningRate 0.0309 Epoch: 8 Global 
Step: 148340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:00,526-Speed 9620.42 samples/sec Loss 6.1485 LearningRate 0.0309 Epoch: 8 Global Step: 148350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:01,669-Speed 8961.76 samples/sec Loss 6.2367 LearningRate 0.0309 Epoch: 8 Global Step: 148360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:02,776-Speed 9257.52 samples/sec Loss 6.1060 LearningRate 0.0309 Epoch: 8 Global Step: 148370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:03,871-Speed 9360.17 samples/sec Loss 6.2011 LearningRate 0.0309 Epoch: 8 Global Step: 148380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:04,951-Speed 9484.24 samples/sec Loss 6.2670 LearningRate 0.0309 Epoch: 8 Global Step: 148390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:06,030-Speed 9494.80 samples/sec Loss 6.1460 LearningRate 0.0309 Epoch: 8 Global Step: 148400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:07,073-Speed 9825.89 samples/sec Loss 6.2103 LearningRate 0.0308 Epoch: 8 Global Step: 148410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:08,153-Speed 9486.13 samples/sec Loss 6.1965 LearningRate 0.0308 Epoch: 8 Global Step: 148420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:09,228-Speed 9531.84 samples/sec Loss 6.1464 LearningRate 0.0308 Epoch: 8 Global Step: 148430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:10,348-Speed 9147.38 samples/sec Loss 6.2057 LearningRate 0.0308 Epoch: 8 Global Step: 148440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:11,460-Speed 9221.62 samples/sec Loss 6.1952 LearningRate 0.0308 Epoch: 8 Global Step: 148450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:12,529-Speed 9581.66 samples/sec Loss 6.1432 LearningRate 0.0308 Epoch: 8 Global Step: 148460 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 17:31:13,599-Speed 9575.95 samples/sec Loss 6.3050 LearningRate 0.0308 Epoch: 8 Global Step: 148470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:14,680-Speed 9479.60 samples/sec Loss 6.1410 LearningRate 0.0308 Epoch: 8 Global Step: 148480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:15,780-Speed 9312.64 samples/sec Loss 6.2378 LearningRate 0.0308 Epoch: 8 Global Step: 148490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:16,865-Speed 9438.94 samples/sec Loss 6.1755 LearningRate 0.0308 Epoch: 8 Global Step: 148500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:17,949-Speed 9452.58 samples/sec Loss 6.1412 LearningRate 0.0308 Epoch: 8 Global Step: 148510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:19,052-Speed 9290.02 samples/sec Loss 6.2874 LearningRate 0.0308 Epoch: 8 Global Step: 148520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:20,133-Speed 9486.77 samples/sec Loss 6.1946 LearningRate 0.0308 Epoch: 8 Global Step: 148530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:21,234-Speed 9304.36 samples/sec Loss 6.2829 LearningRate 0.0308 Epoch: 8 Global Step: 148540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:22,289-Speed 9712.92 samples/sec Loss 6.2225 LearningRate 0.0308 Epoch: 8 Global Step: 148550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:23,341-Speed 9733.76 samples/sec Loss 6.2174 LearningRate 0.0308 Epoch: 8 Global Step: 148560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:24,403-Speed 9649.60 samples/sec Loss 6.0440 LearningRate 0.0308 Epoch: 8 Global Step: 148570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:25,487-Speed 9456.28 samples/sec Loss 6.2040 LearningRate 0.0308 Epoch: 8 Global Step: 148580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:31:26,556-Speed 9585.96 samples/sec Loss 6.3123 LearningRate 0.0308 Epoch: 8 Global Step: 148590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:27,632-Speed 9521.78 samples/sec Loss 6.1584 LearningRate 0.0308 Epoch: 8 Global Step: 148600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:28,702-Speed 9575.71 samples/sec Loss 6.1460 LearningRate 0.0308 Epoch: 8 Global Step: 148610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:29,825-Speed 9117.56 samples/sec Loss 6.2272 LearningRate 0.0308 Epoch: 8 Global Step: 148620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:30,919-Speed 9371.91 samples/sec Loss 6.1940 LearningRate 0.0308 Epoch: 8 Global Step: 148630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:31,944-Speed 9994.95 samples/sec Loss 6.2810 LearningRate 0.0308 Epoch: 8 Global Step: 148640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:33,016-Speed 9554.80 samples/sec Loss 6.1025 LearningRate 0.0308 Epoch: 8 Global Step: 148650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:34,109-Speed 9376.45 samples/sec Loss 6.2837 LearningRate 0.0308 Epoch: 8 Global Step: 148660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:35,164-Speed 9709.72 samples/sec Loss 6.2748 LearningRate 0.0308 Epoch: 8 Global Step: 148670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:36,241-Speed 9515.86 samples/sec Loss 6.2175 LearningRate 0.0308 Epoch: 8 Global Step: 148680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:37,327-Speed 9431.61 samples/sec Loss 6.0783 LearningRate 0.0308 Epoch: 8 Global Step: 148690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:38,405-Speed 9510.54 samples/sec Loss 6.2192 LearningRate 0.0308 Epoch: 8 Global Step: 148700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:39,445-Speed 9850.24 samples/sec Loss 
6.1946 LearningRate 0.0307 Epoch: 8 Global Step: 148710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:40,497-Speed 9742.31 samples/sec Loss 6.0784 LearningRate 0.0307 Epoch: 8 Global Step: 148720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:41,561-Speed 9630.09 samples/sec Loss 6.1378 LearningRate 0.0307 Epoch: 8 Global Step: 148730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:42,650-Speed 9409.25 samples/sec Loss 6.1450 LearningRate 0.0307 Epoch: 8 Global Step: 148740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:43,719-Speed 9580.88 samples/sec Loss 6.1587 LearningRate 0.0307 Epoch: 8 Global Step: 148750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:44,801-Speed 9469.79 samples/sec Loss 6.2232 LearningRate 0.0307 Epoch: 8 Global Step: 148760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:45,894-Speed 9374.33 samples/sec Loss 6.1242 LearningRate 0.0307 Epoch: 8 Global Step: 148770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:31:46,969-Speed 9529.46 samples/sec Loss 6.1795 LearningRate 0.0307 Epoch: 8 Global Step: 148780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:48,022-Speed 9730.89 samples/sec Loss 6.1678 LearningRate 0.0307 Epoch: 8 Global Step: 148790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:49,100-Speed 9506.34 samples/sec Loss 6.1028 LearningRate 0.0307 Epoch: 8 Global Step: 148800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:50,145-Speed 9807.62 samples/sec Loss 6.1546 LearningRate 0.0307 Epoch: 8 Global Step: 148810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:51,231-Speed 9438.31 samples/sec Loss 6.2294 LearningRate 0.0307 Epoch: 8 Global Step: 148820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:52,316-Speed 9443.12 samples/sec Loss 6.2648 LearningRate 0.0307 Epoch: 8 Global 
Step: 148830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:53,376-Speed 9664.88 samples/sec Loss 6.2801 LearningRate 0.0307 Epoch: 8 Global Step: 148840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:31:54,432-Speed 9704.12 samples/sec Loss 6.3053 LearningRate 0.0307 Epoch: 8 Global Step: 148850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:31:55,515-Speed 9458.77 samples/sec Loss 6.1582 LearningRate 0.0307 Epoch: 8 Global Step: 148860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:31:56,590-Speed 9532.44 samples/sec Loss 6.2090 LearningRate 0.0307 Epoch: 8 Global Step: 148870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:31:57,687-Speed 9345.98 samples/sec Loss 6.1770 LearningRate 0.0307 Epoch: 8 Global Step: 148880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:31:58,757-Speed 9572.37 samples/sec Loss 6.1811 LearningRate 0.0307 Epoch: 8 Global Step: 148890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:31:59,822-Speed 9618.43 samples/sec Loss 6.2152 LearningRate 0.0307 Epoch: 8 Global Step: 148900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:32:00,906-Speed 9453.31 samples/sec Loss 6.1629 LearningRate 0.0307 Epoch: 8 Global Step: 148910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:32:01,995-Speed 9409.54 samples/sec Loss 6.1894 LearningRate 0.0307 Epoch: 8 Global Step: 148920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:32:03,082-Speed 9424.76 samples/sec Loss 6.2207 LearningRate 0.0307 Epoch: 8 Global Step: 148930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:32:04,136-Speed 9722.09 samples/sec Loss 6.1907 LearningRate 0.0307 Epoch: 8 Global Step: 148940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:32:05,222-Speed 9437.53 samples/sec Loss 6.1487 LearningRate 0.0307 Epoch: 8 Global Step: 148950 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-11 17:32:06,339-Speed 9171.15 samples/sec Loss 6.0736 LearningRate 0.0307 Epoch: 8 Global Step: 148960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:07,405-Speed 9608.25 samples/sec Loss 6.1965 LearningRate 0.0307 Epoch: 8 Global Step: 148970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:08,470-Speed 9628.97 samples/sec Loss 6.1947 LearningRate 0.0307 Epoch: 8 Global Step: 148980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:09,497-Speed 9977.91 samples/sec Loss 6.1434 LearningRate 0.0307 Epoch: 8 Global Step: 148990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:10,581-Speed 9451.16 samples/sec Loss 6.1587 LearningRate 0.0307 Epoch: 8 Global Step: 149000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:11,640-Speed 9671.91 samples/sec Loss 6.1661 LearningRate 0.0306 Epoch: 8 Global Step: 149010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:12,688-Speed 9781.23 samples/sec Loss 6.1007 LearningRate 0.0306 Epoch: 8 Global Step: 149020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:13,760-Speed 9558.84 samples/sec Loss 6.1519 LearningRate 0.0306 Epoch: 8 Global Step: 149030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:14,849-Speed 9405.25 samples/sec Loss 6.2062 LearningRate 0.0306 Epoch: 8 Global Step: 149040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:15,911-Speed 9645.77 samples/sec Loss 6.2199 LearningRate 0.0306 Epoch: 8 Global Step: 149050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:32:16,980-Speed 9590.45 samples/sec Loss 6.1953 LearningRate 0.0306 Epoch: 8 Global Step: 149060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:18,099-Speed 9149.07 samples/sec Loss 6.2265 LearningRate 0.0306 Epoch: 8 Global Step: 149070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:32:19,229-Speed 9073.79 samples/sec Loss 6.2178 LearningRate 0.0306 Epoch: 8 Global Step: 149080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:20,343-Speed 9198.82 samples/sec Loss 6.1574 LearningRate 0.0306 Epoch: 8 Global Step: 149090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:21,398-Speed 9712.01 samples/sec Loss 6.3089 LearningRate 0.0306 Epoch: 8 Global Step: 149100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:22,493-Speed 9351.33 samples/sec Loss 6.2730 LearningRate 0.0306 Epoch: 8 Global Step: 149110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:23,589-Speed 9349.41 samples/sec Loss 6.2059 LearningRate 0.0306 Epoch: 8 Global Step: 149120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:24,685-Speed 9349.88 samples/sec Loss 6.2411 LearningRate 0.0306 Epoch: 8 Global Step: 149130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:25,772-Speed 9433.93 samples/sec Loss 6.2622 LearningRate 0.0306 Epoch: 8 Global Step: 149140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:26,839-Speed 9607.36 samples/sec Loss 6.2171 LearningRate 0.0306 Epoch: 8 Global Step: 149150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:27,930-Speed 9387.15 samples/sec Loss 6.2385 LearningRate 0.0306 Epoch: 8 Global Step: 149160 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:32:28,992-Speed 9650.56 samples/sec Loss 6.1848 LearningRate 0.0306 Epoch: 8 Global Step: 149170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:32:30,031-Speed 9854.95 samples/sec Loss 6.2246 LearningRate 0.0306 Epoch: 8 Global Step: 149180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:31,146-Speed 9190.07 samples/sec Loss 6.1964 LearningRate 0.0306 Epoch: 8 Global Step: 149190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:32,218-Speed 9562.53 samples/sec Loss 
6.1316 LearningRate 0.0306 Epoch: 8 Global Step: 149200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:33,276-Speed 9677.71 samples/sec Loss 6.1904 LearningRate 0.0306 Epoch: 8 Global Step: 149210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:34,352-Speed 9521.83 samples/sec Loss 6.1977 LearningRate 0.0306 Epoch: 8 Global Step: 149220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:35,415-Speed 9644.87 samples/sec Loss 6.1790 LearningRate 0.0306 Epoch: 8 Global Step: 149230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:36,492-Speed 9509.69 samples/sec Loss 6.1435 LearningRate 0.0306 Epoch: 8 Global Step: 149240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:37,618-Speed 9097.49 samples/sec Loss 6.0379 LearningRate 0.0306 Epoch: 8 Global Step: 149250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:38,704-Speed 9437.29 samples/sec Loss 6.0291 LearningRate 0.0306 Epoch: 8 Global Step: 149260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:39,786-Speed 9472.57 samples/sec Loss 6.1694 LearningRate 0.0306 Epoch: 8 Global Step: 149270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:40,836-Speed 9760.07 samples/sec Loss 6.1868 LearningRate 0.0306 Epoch: 8 Global Step: 149280 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:32:41,919-Speed 9458.53 samples/sec Loss 6.1767 LearningRate 0.0306 Epoch: 8 Global Step: 149290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:42,994-Speed 9529.99 samples/sec Loss 6.1993 LearningRate 0.0306 Epoch: 8 Global Step: 149300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:44,089-Speed 9361.10 samples/sec Loss 6.2555 LearningRate 0.0306 Epoch: 8 Global Step: 149310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:45,169-Speed 9483.45 samples/sec Loss 6.1359 LearningRate 0.0305 Epoch: 8 Global 
Step: 149320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:46,226-Speed 9699.50 samples/sec Loss 6.1465 LearningRate 0.0305 Epoch: 8 Global Step: 149330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:47,326-Speed 9308.60 samples/sec Loss 6.2795 LearningRate 0.0305 Epoch: 8 Global Step: 149340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:48,372-Speed 9796.60 samples/sec Loss 6.1838 LearningRate 0.0305 Epoch: 8 Global Step: 149350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:49,505-Speed 9045.06 samples/sec Loss 6.2521 LearningRate 0.0305 Epoch: 8 Global Step: 149360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:50,591-Speed 9442.94 samples/sec Loss 6.1199 LearningRate 0.0305 Epoch: 8 Global Step: 149370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:51,655-Speed 9626.36 samples/sec Loss 6.2149 LearningRate 0.0305 Epoch: 8 Global Step: 149380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:52,720-Speed 9620.64 samples/sec Loss 6.2190 LearningRate 0.0305 Epoch: 8 Global Step: 149390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:32:53,784-Speed 9629.24 samples/sec Loss 6.1341 LearningRate 0.0305 Epoch: 8 Global Step: 149400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:54,890-Speed 9266.05 samples/sec Loss 6.1742 LearningRate 0.0305 Epoch: 8 Global Step: 149410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:55,989-Speed 9324.07 samples/sec Loss 6.2115 LearningRate 0.0305 Epoch: 8 Global Step: 149420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:57,089-Speed 9315.02 samples/sec Loss 6.2214 LearningRate 0.0305 Epoch: 8 Global Step: 149430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:32:58,176-Speed 9424.45 samples/sec Loss 6.2227 LearningRate 0.0305 Epoch: 8 Global Step: 149440 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 17:32:59,247-Speed 9569.20 samples/sec Loss 6.0894 LearningRate 0.0305 Epoch: 8 Global Step: 149450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:00,308-Speed 9658.83 samples/sec Loss 6.1408 LearningRate 0.0305 Epoch: 8 Global Step: 149460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:01,401-Speed 9374.58 samples/sec Loss 6.1001 LearningRate 0.0305 Epoch: 8 Global Step: 149470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:02,505-Speed 9276.79 samples/sec Loss 6.1323 LearningRate 0.0305 Epoch: 8 Global Step: 149480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:03,580-Speed 9534.82 samples/sec Loss 6.2222 LearningRate 0.0305 Epoch: 8 Global Step: 149490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:04,642-Speed 9647.79 samples/sec Loss 6.0966 LearningRate 0.0305 Epoch: 8 Global Step: 149500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:33:05,717-Speed 9529.96 samples/sec Loss 6.2740 LearningRate 0.0305 Epoch: 8 Global Step: 149510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:33:06,789-Speed 9556.55 samples/sec Loss 6.1911 LearningRate 0.0305 Epoch: 8 Global Step: 149520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:33:07,913-Speed 9117.54 samples/sec Loss 6.1953 LearningRate 0.0305 Epoch: 8 Global Step: 149530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:09,023-Speed 9231.89 samples/sec Loss 6.1070 LearningRate 0.0305 Epoch: 8 Global Step: 149540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:10,100-Speed 9512.88 samples/sec Loss 6.1212 LearningRate 0.0305 Epoch: 8 Global Step: 149550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:11,185-Speed 9451.17 samples/sec Loss 6.0806 LearningRate 0.0305 Epoch: 8 Global Step: 149560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:33:12,241-Speed 9708.16 samples/sec Loss 6.0598 LearningRate 0.0305 Epoch: 8 Global Step: 149570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:13,346-Speed 9271.28 samples/sec Loss 6.1965 LearningRate 0.0305 Epoch: 8 Global Step: 149580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:14,407-Speed 9650.41 samples/sec Loss 6.2009 LearningRate 0.0305 Epoch: 8 Global Step: 149590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:15,506-Speed 9327.13 samples/sec Loss 6.1198 LearningRate 0.0305 Epoch: 8 Global Step: 149600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:16,559-Speed 9730.12 samples/sec Loss 6.1454 LearningRate 0.0305 Epoch: 8 Global Step: 149610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:17,649-Speed 9395.45 samples/sec Loss 6.1138 LearningRate 0.0304 Epoch: 8 Global Step: 149620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:18,711-Speed 9649.60 samples/sec Loss 6.2959 LearningRate 0.0304 Epoch: 8 Global Step: 149630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:33:19,778-Speed 9599.12 samples/sec Loss 6.2280 LearningRate 0.0304 Epoch: 8 Global Step: 149640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:20,868-Speed 9405.09 samples/sec Loss 6.0109 LearningRate 0.0304 Epoch: 8 Global Step: 149650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:21,927-Speed 9676.44 samples/sec Loss 6.1492 LearningRate 0.0304 Epoch: 8 Global Step: 149660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:22,996-Speed 9591.22 samples/sec Loss 6.1957 LearningRate 0.0304 Epoch: 8 Global Step: 149670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:24,082-Speed 9437.60 samples/sec Loss 6.2589 LearningRate 0.0304 Epoch: 8 Global Step: 149680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:25,170-Speed 9413.72 samples/sec Loss 
6.1531 LearningRate 0.0304 Epoch: 8 Global Step: 149690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:26,279-Speed 9240.55 samples/sec Loss 6.0835 LearningRate 0.0304 Epoch: 8 Global Step: 149700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:27,354-Speed 9533.74 samples/sec Loss 6.1960 LearningRate 0.0304 Epoch: 8 Global Step: 149710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:28,455-Speed 9306.76 samples/sec Loss 6.1891 LearningRate 0.0304 Epoch: 8 Global Step: 149720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:29,513-Speed 9687.39 samples/sec Loss 6.2004 LearningRate 0.0304 Epoch: 8 Global Step: 149730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:30,563-Speed 9749.50 samples/sec Loss 6.1775 LearningRate 0.0304 Epoch: 8 Global Step: 149740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:33:31,640-Speed 9513.53 samples/sec Loss 6.0919 LearningRate 0.0304 Epoch: 8 Global Step: 149750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:33:32,748-Speed 9250.37 samples/sec Loss 6.2012 LearningRate 0.0304 Epoch: 8 Global Step: 149760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:33:33,824-Speed 9521.47 samples/sec Loss 6.1405 LearningRate 0.0304 Epoch: 8 Global Step: 149770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:34,883-Speed 9676.99 samples/sec Loss 6.0932 LearningRate 0.0304 Epoch: 8 Global Step: 149780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:35,969-Speed 9431.63 samples/sec Loss 6.1064 LearningRate 0.0304 Epoch: 8 Global Step: 149790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:37,072-Speed 9288.09 samples/sec Loss 6.2032 LearningRate 0.0304 Epoch: 8 Global Step: 149800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:38,150-Speed 9506.38 samples/sec Loss 6.2483 LearningRate 0.0304 Epoch: 8 Global 
Step: 149810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:39,215-Speed 9620.15 samples/sec Loss 6.2060 LearningRate 0.0304 Epoch: 8 Global Step: 149820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:40,307-Speed 9386.03 samples/sec Loss 6.2217 LearningRate 0.0304 Epoch: 8 Global Step: 149830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:41,378-Speed 9569.83 samples/sec Loss 6.3418 LearningRate 0.0304 Epoch: 8 Global Step: 149840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:42,434-Speed 9701.46 samples/sec Loss 6.2346 LearningRate 0.0304 Epoch: 8 Global Step: 149850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:43,515-Speed 9481.96 samples/sec Loss 6.2115 LearningRate 0.0304 Epoch: 8 Global Step: 149860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:44,608-Speed 9372.04 samples/sec Loss 6.2283 LearningRate 0.0304 Epoch: 8 Global Step: 149870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:45,677-Speed 9582.03 samples/sec Loss 6.0609 LearningRate 0.0304 Epoch: 8 Global Step: 149880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:46,806-Speed 9074.30 samples/sec Loss 6.1872 LearningRate 0.0304 Epoch: 8 Global Step: 149890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:47,888-Speed 9469.25 samples/sec Loss 6.1767 LearningRate 0.0304 Epoch: 8 Global Step: 149900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:48,989-Speed 9304.66 samples/sec Loss 6.2221 LearningRate 0.0304 Epoch: 8 Global Step: 149910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:50,063-Speed 9549.34 samples/sec Loss 6.0724 LearningRate 0.0303 Epoch: 8 Global Step: 149920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:51,144-Speed 9476.33 samples/sec Loss 6.1622 LearningRate 0.0303 Epoch: 8 Global Step: 149930 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 17:33:52,213-Speed 9586.67 samples/sec Loss 6.1694 LearningRate 0.0303 Epoch: 8 Global Step: 149940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:53,274-Speed 9654.41 samples/sec Loss 6.2272 LearningRate 0.0303 Epoch: 8 Global Step: 149950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:33:54,382-Speed 9246.10 samples/sec Loss 6.0969 LearningRate 0.0303 Epoch: 8 Global Step: 149960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:55,457-Speed 9531.58 samples/sec Loss 6.1849 LearningRate 0.0303 Epoch: 8 Global Step: 149970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:56,514-Speed 9690.01 samples/sec Loss 6.1566 LearningRate 0.0303 Epoch: 8 Global Step: 149980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:57,587-Speed 9556.13 samples/sec Loss 6.1493 LearningRate 0.0303 Epoch: 8 Global Step: 149990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:33:58,723-Speed 9015.02 samples/sec Loss 6.1378 LearningRate 0.0303 Epoch: 8 Global Step: 150000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:34:20,662-[lfw][150000]XNorm: 10.094884 Training: 2022-04-11 17:34:20,663-[lfw][150000]Accuracy-Flip: 0.99650+-0.00283 Training: 2022-04-11 17:34:20,663-[lfw][150000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:34:46,006-[cfp_fp][150000]XNorm: 8.577838 Training: 2022-04-11 17:34:46,007-[cfp_fp][150000]Accuracy-Flip: 0.95886+-0.00966 Training: 2022-04-11 17:34:46,007-[cfp_fp][150000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:35:07,852-[agedb_30][150000]XNorm: 9.737256 Training: 2022-04-11 17:35:07,853-[agedb_30][150000]Accuracy-Flip: 0.96567+-0.00676 Training: 2022-04-11 17:35:07,853-[agedb_30][150000]Accuracy-Highest: 0.96783 Training: 2022-04-11 17:35:08,985-Speed 145.74 samples/sec Loss 6.1997 LearningRate 0.0303 Epoch: 8 Global Step: 150010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:35:10,022-Speed 9876.72 samples/sec Loss 6.0972 LearningRate 0.0303 Epoch: 8 Global Step: 150020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:11,076-Speed 9719.98 samples/sec Loss 6.2037 LearningRate 0.0303 Epoch: 8 Global Step: 150030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:12,156-Speed 9491.26 samples/sec Loss 6.0296 LearningRate 0.0303 Epoch: 8 Global Step: 150040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:13,252-Speed 9344.10 samples/sec Loss 6.2588 LearningRate 0.0303 Epoch: 8 Global Step: 150050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:14,317-Speed 9619.81 samples/sec Loss 6.1438 LearningRate 0.0303 Epoch: 8 Global Step: 150060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:35:15,399-Speed 9478.49 samples/sec Loss 6.1572 LearningRate 0.0303 Epoch: 8 Global Step: 150070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:16,476-Speed 9511.15 samples/sec Loss 6.2452 LearningRate 0.0303 Epoch: 8 Global Step: 150080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:17,587-Speed 9221.50 samples/sec Loss 6.2671 LearningRate 0.0303 Epoch: 8 Global Step: 150090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:18,683-Speed 9345.40 samples/sec Loss 6.2647 LearningRate 0.0303 Epoch: 8 Global Step: 150100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:19,760-Speed 9516.80 samples/sec Loss 6.2585 LearningRate 0.0303 Epoch: 8 Global Step: 150110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:20,830-Speed 9570.28 samples/sec Loss 6.1465 LearningRate 0.0303 Epoch: 8 Global Step: 150120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:21,904-Speed 9540.45 samples/sec Loss 6.0186 LearningRate 0.0303 Epoch: 8 Global Step: 150130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:22,969-Speed 9625.98 samples/sec Loss 
6.0589 LearningRate 0.0303 Epoch: 8 Global Step: 150140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:24,024-Speed 9709.48 samples/sec Loss 6.1551 LearningRate 0.0303 Epoch: 8 Global Step: 150150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:25,154-Speed 9063.59 samples/sec Loss 6.1579 LearningRate 0.0303 Epoch: 8 Global Step: 150160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:26,243-Speed 9410.79 samples/sec Loss 6.1888 LearningRate 0.0303 Epoch: 8 Global Step: 150170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:35:27,307-Speed 9624.96 samples/sec Loss 6.1795 LearningRate 0.0303 Epoch: 8 Global Step: 150180 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:35:28,400-Speed 9377.54 samples/sec Loss 6.2236 LearningRate 0.0303 Epoch: 8 Global Step: 150190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:29,461-Speed 9662.02 samples/sec Loss 6.1764 LearningRate 0.0303 Epoch: 8 Global Step: 150200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:35:30,779-Speed 7774.79 samples/sec Loss 6.1648 LearningRate 0.0303 Epoch: 8 Global Step: 150210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:00,098-Speed 349.27 samples/sec Loss 6.0630 LearningRate 0.0302 Epoch: 9 Global Step: 150220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:02,338-Speed 4575.05 samples/sec Loss 5.4139 LearningRate 0.0302 Epoch: 9 Global Step: 150230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:04,245-Speed 5373.38 samples/sec Loss 5.3798 LearningRate 0.0302 Epoch: 9 Global Step: 150240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:05,550-Speed 7847.79 samples/sec Loss 5.4377 LearningRate 0.0302 Epoch: 9 Global Step: 150250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:06,635-Speed 9440.83 samples/sec Loss 5.3727 LearningRate 0.0302 Epoch: 9 Global 
Step: 150260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:08,347-Speed 5986.79 samples/sec Loss 5.3082 LearningRate 0.0302 Epoch: 9 Global Step: 150270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:09,617-Speed 8069.09 samples/sec Loss 5.3834 LearningRate 0.0302 Epoch: 9 Global Step: 150280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:10,743-Speed 9093.83 samples/sec Loss 5.4596 LearningRate 0.0302 Epoch: 9 Global Step: 150290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:11,874-Speed 9062.26 samples/sec Loss 5.4681 LearningRate 0.0302 Epoch: 9 Global Step: 150300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:13,039-Speed 8794.37 samples/sec Loss 5.4692 LearningRate 0.0302 Epoch: 9 Global Step: 150310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:14,125-Speed 9439.30 samples/sec Loss 5.4152 LearningRate 0.0302 Epoch: 9 Global Step: 150320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:15,250-Speed 9102.72 samples/sec Loss 5.3779 LearningRate 0.0302 Epoch: 9 Global Step: 150330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:16,354-Speed 9277.53 samples/sec Loss 5.3618 LearningRate 0.0302 Epoch: 9 Global Step: 150340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:17,485-Speed 9062.45 samples/sec Loss 5.3853 LearningRate 0.0302 Epoch: 9 Global Step: 150350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:18,559-Speed 9538.20 samples/sec Loss 5.4133 LearningRate 0.0302 Epoch: 9 Global Step: 150360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:19,689-Speed 9068.36 samples/sec Loss 5.3533 LearningRate 0.0302 Epoch: 9 Global Step: 150370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:20,801-Speed 9218.49 samples/sec Loss 5.4325 LearningRate 0.0302 Epoch: 9 Global Step: 150380 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 17:36:21,944-Speed 8961.73 samples/sec Loss 5.3775 LearningRate 0.0302 Epoch: 9 Global Step: 150390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:23,047-Speed 9291.67 samples/sec Loss 5.3122 LearningRate 0.0302 Epoch: 9 Global Step: 150400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:24,145-Speed 9334.17 samples/sec Loss 5.3432 LearningRate 0.0302 Epoch: 9 Global Step: 150410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:36:25,297-Speed 8892.61 samples/sec Loss 5.4207 LearningRate 0.0302 Epoch: 9 Global Step: 150420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:26,419-Speed 9132.14 samples/sec Loss 5.4448 LearningRate 0.0302 Epoch: 9 Global Step: 150430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:27,547-Speed 9078.70 samples/sec Loss 5.4171 LearningRate 0.0302 Epoch: 9 Global Step: 150440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:28,674-Speed 9088.58 samples/sec Loss 5.4746 LearningRate 0.0302 Epoch: 9 Global Step: 150450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:29,832-Speed 8847.49 samples/sec Loss 5.5185 LearningRate 0.0302 Epoch: 9 Global Step: 150460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:30,961-Speed 9075.52 samples/sec Loss 5.4657 LearningRate 0.0302 Epoch: 9 Global Step: 150470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:32,086-Speed 9107.66 samples/sec Loss 5.4316 LearningRate 0.0302 Epoch: 9 Global Step: 150480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:33,239-Speed 8885.66 samples/sec Loss 5.2612 LearningRate 0.0302 Epoch: 9 Global Step: 150490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:34,350-Speed 9230.45 samples/sec Loss 5.3448 LearningRate 0.0302 Epoch: 9 Global Step: 150500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:35,415-Speed 
9627.19 samples/sec Loss 5.4667 LearningRate 0.0302 Epoch: 9 Global Step: 150510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:36,539-Speed 9113.70 samples/sec Loss 5.5063 LearningRate 0.0302 Epoch: 9 Global Step: 150520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:36:37,626-Speed 9426.93 samples/sec Loss 5.4556 LearningRate 0.0301 Epoch: 9 Global Step: 150530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:38,741-Speed 9189.16 samples/sec Loss 5.4500 LearningRate 0.0301 Epoch: 9 Global Step: 150540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:39,791-Speed 9760.83 samples/sec Loss 5.4070 LearningRate 0.0301 Epoch: 9 Global Step: 150550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:40,873-Speed 9468.75 samples/sec Loss 5.4377 LearningRate 0.0301 Epoch: 9 Global Step: 150560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:41,959-Speed 9436.93 samples/sec Loss 5.3901 LearningRate 0.0301 Epoch: 9 Global Step: 150570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:43,109-Speed 8908.59 samples/sec Loss 5.3807 LearningRate 0.0301 Epoch: 9 Global Step: 150580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:44,232-Speed 9119.57 samples/sec Loss 5.4172 LearningRate 0.0301 Epoch: 9 Global Step: 150590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:45,309-Speed 9513.78 samples/sec Loss 5.3809 LearningRate 0.0301 Epoch: 9 Global Step: 150600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:46,374-Speed 9625.54 samples/sec Loss 5.5130 LearningRate 0.0301 Epoch: 9 Global Step: 150610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:47,441-Speed 9604.82 samples/sec Loss 5.5298 LearningRate 0.0301 Epoch: 9 Global Step: 150620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:48,501-Speed 9661.31 samples/sec Loss 5.4709 
LearningRate 0.0301 Epoch: 9 Global Step: 150630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:49,642-Speed 8978.00 samples/sec Loss 5.5057 LearningRate 0.0301 Epoch: 9 Global Step: 150640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:50,738-Speed 9353.91 samples/sec Loss 5.5233 LearningRate 0.0301 Epoch: 9 Global Step: 150650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:51,872-Speed 9036.79 samples/sec Loss 5.4363 LearningRate 0.0301 Epoch: 9 Global Step: 150660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:52,981-Speed 9238.82 samples/sec Loss 5.5406 LearningRate 0.0301 Epoch: 9 Global Step: 150670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:54,048-Speed 9598.99 samples/sec Loss 5.4025 LearningRate 0.0301 Epoch: 9 Global Step: 150680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:55,108-Speed 9666.54 samples/sec Loss 5.4426 LearningRate 0.0301 Epoch: 9 Global Step: 150690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:56,173-Speed 9620.70 samples/sec Loss 5.4489 LearningRate 0.0301 Epoch: 9 Global Step: 150700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:57,285-Speed 9214.93 samples/sec Loss 5.4858 LearningRate 0.0301 Epoch: 9 Global Step: 150710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:58,359-Speed 9536.13 samples/sec Loss 5.4539 LearningRate 0.0301 Epoch: 9 Global Step: 150720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:36:59,589-Speed 8331.67 samples/sec Loss 5.4516 LearningRate 0.0301 Epoch: 9 Global Step: 150730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:37:00,656-Speed 9602.11 samples/sec Loss 5.5425 LearningRate 0.0301 Epoch: 9 Global Step: 150740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:01,751-Speed 9358.72 samples/sec Loss 5.5433 LearningRate 0.0301 Epoch: 9 Global Step: 
150750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:02,832-Speed 9473.75 samples/sec Loss 5.5326 LearningRate 0.0301 Epoch: 9 Global Step: 150760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:03,926-Speed 9368.10 samples/sec Loss 5.4254 LearningRate 0.0301 Epoch: 9 Global Step: 150770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:05,035-Speed 9236.79 samples/sec Loss 5.5012 LearningRate 0.0301 Epoch: 9 Global Step: 150780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:06,149-Speed 9198.31 samples/sec Loss 5.5760 LearningRate 0.0301 Epoch: 9 Global Step: 150790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:07,311-Speed 8819.42 samples/sec Loss 5.4466 LearningRate 0.0301 Epoch: 9 Global Step: 150800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:08,416-Speed 9271.33 samples/sec Loss 5.4837 LearningRate 0.0301 Epoch: 9 Global Step: 150810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:09,515-Speed 9320.78 samples/sec Loss 5.5947 LearningRate 0.0301 Epoch: 9 Global Step: 150820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:10,598-Speed 9460.35 samples/sec Loss 5.5421 LearningRate 0.0300 Epoch: 9 Global Step: 150830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:11,686-Speed 9419.42 samples/sec Loss 5.5425 LearningRate 0.0300 Epoch: 9 Global Step: 150840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:12,774-Speed 9421.79 samples/sec Loss 5.4956 LearningRate 0.0300 Epoch: 9 Global Step: 150850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:13,857-Speed 9465.86 samples/sec Loss 5.5874 LearningRate 0.0300 Epoch: 9 Global Step: 150860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:15,391-Speed 6677.02 samples/sec Loss 5.5778 LearningRate 0.0300 Epoch: 9 Global Step: 150870 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 17:37:16,652-Speed 8129.59 samples/sec Loss 5.5778 LearningRate 0.0300 Epoch: 9 Global Step: 150880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:17,907-Speed 8159.66 samples/sec Loss 5.5394 LearningRate 0.0300 Epoch: 9 Global Step: 150890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:18,966-Speed 9679.31 samples/sec Loss 5.5212 LearningRate 0.0300 Epoch: 9 Global Step: 150900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:20,251-Speed 7971.75 samples/sec Loss 5.6131 LearningRate 0.0300 Epoch: 9 Global Step: 150910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:21,349-Speed 9333.61 samples/sec Loss 5.5344 LearningRate 0.0300 Epoch: 9 Global Step: 150920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:22,491-Speed 8968.17 samples/sec Loss 5.4429 LearningRate 0.0300 Epoch: 9 Global Step: 150930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:23,793-Speed 7870.37 samples/sec Loss 5.5605 LearningRate 0.0300 Epoch: 9 Global Step: 150940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:24,881-Speed 9413.19 samples/sec Loss 5.4143 LearningRate 0.0300 Epoch: 9 Global Step: 150950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:25,952-Speed 9574.74 samples/sec Loss 5.5971 LearningRate 0.0300 Epoch: 9 Global Step: 150960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:27,038-Speed 9433.08 samples/sec Loss 5.5256 LearningRate 0.0300 Epoch: 9 Global Step: 150970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:28,118-Speed 9480.60 samples/sec Loss 5.4241 LearningRate 0.0300 Epoch: 9 Global Step: 150980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:29,252-Speed 9036.55 samples/sec Loss 5.5391 LearningRate 0.0300 Epoch: 9 Global Step: 150990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:30,345-Speed 
9376.58 samples/sec Loss 5.5402 LearningRate 0.0300 Epoch: 9 Global Step: 151000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:31,478-Speed 9042.87 samples/sec Loss 5.5542 LearningRate 0.0300 Epoch: 9 Global Step: 151010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:32,557-Speed 9496.91 samples/sec Loss 5.6175 LearningRate 0.0300 Epoch: 9 Global Step: 151020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:33,668-Speed 9226.41 samples/sec Loss 5.4324 LearningRate 0.0300 Epoch: 9 Global Step: 151030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:34,777-Speed 9240.00 samples/sec Loss 5.5813 LearningRate 0.0300 Epoch: 9 Global Step: 151040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:35,919-Speed 8971.67 samples/sec Loss 5.5682 LearningRate 0.0300 Epoch: 9 Global Step: 151050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:37,019-Speed 9315.71 samples/sec Loss 5.5581 LearningRate 0.0300 Epoch: 9 Global Step: 151060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:38,102-Speed 9461.60 samples/sec Loss 5.5166 LearningRate 0.0300 Epoch: 9 Global Step: 151070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:39,217-Speed 9186.05 samples/sec Loss 5.6788 LearningRate 0.0300 Epoch: 9 Global Step: 151080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:40,293-Speed 9521.02 samples/sec Loss 5.5725 LearningRate 0.0300 Epoch: 9 Global Step: 151090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:41,355-Speed 9648.74 samples/sec Loss 5.5432 LearningRate 0.0300 Epoch: 9 Global Step: 151100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:42,417-Speed 9652.17 samples/sec Loss 5.5271 LearningRate 0.0300 Epoch: 9 Global Step: 151110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:43,529-Speed 9214.03 samples/sec Loss 5.4796 LearningRate 
0.0300 Epoch: 9 Global Step: 151120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:44,606-Speed 9516.01 samples/sec Loss 5.5500 LearningRate 0.0300 Epoch: 9 Global Step: 151130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:45,745-Speed 8994.39 samples/sec Loss 5.6272 LearningRate 0.0299 Epoch: 9 Global Step: 151140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:46,849-Speed 9279.46 samples/sec Loss 5.5479 LearningRate 0.0299 Epoch: 9 Global Step: 151150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:47,919-Speed 9574.19 samples/sec Loss 5.5992 LearningRate 0.0299 Epoch: 9 Global Step: 151160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:49,022-Speed 9286.03 samples/sec Loss 5.5172 LearningRate 0.0299 Epoch: 9 Global Step: 151170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:50,107-Speed 9447.77 samples/sec Loss 5.6389 LearningRate 0.0299 Epoch: 9 Global Step: 151180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:51,216-Speed 9244.31 samples/sec Loss 5.5207 LearningRate 0.0299 Epoch: 9 Global Step: 151190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:52,312-Speed 9345.35 samples/sec Loss 5.6363 LearningRate 0.0299 Epoch: 9 Global Step: 151200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:53,398-Speed 9438.80 samples/sec Loss 5.5180 LearningRate 0.0299 Epoch: 9 Global Step: 151210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:54,554-Speed 8856.41 samples/sec Loss 5.6633 LearningRate 0.0299 Epoch: 9 Global Step: 151220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:55,673-Speed 9156.47 samples/sec Loss 5.6301 LearningRate 0.0299 Epoch: 9 Global Step: 151230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:37:56,777-Speed 9282.39 samples/sec Loss 5.4974 LearningRate 0.0299 Epoch: 9 Global Step: 151240 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:57,836-Speed 9676.96 samples/sec Loss 5.5128 LearningRate 0.0299 Epoch: 9 Global Step: 151250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:37:58,936-Speed 9314.94 samples/sec Loss 5.6660 LearningRate 0.0299 Epoch: 9 Global Step: 151260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:00,048-Speed 9218.81 samples/sec Loss 5.4727 LearningRate 0.0299 Epoch: 9 Global Step: 151270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:01,152-Speed 9280.43 samples/sec Loss 5.5228 LearningRate 0.0299 Epoch: 9 Global Step: 151280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:02,218-Speed 9603.90 samples/sec Loss 5.6064 LearningRate 0.0299 Epoch: 9 Global Step: 151290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:03,380-Speed 8821.99 samples/sec Loss 5.5925 LearningRate 0.0299 Epoch: 9 Global Step: 151300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:04,535-Speed 8865.00 samples/sec Loss 5.5437 LearningRate 0.0299 Epoch: 9 Global Step: 151310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:05,649-Speed 9201.22 samples/sec Loss 5.5735 LearningRate 0.0299 Epoch: 9 Global Step: 151320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:06,773-Speed 9116.70 samples/sec Loss 5.6522 LearningRate 0.0299 Epoch: 9 Global Step: 151330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:38:07,862-Speed 9409.67 samples/sec Loss 5.5525 LearningRate 0.0299 Epoch: 9 Global Step: 151340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:08,992-Speed 9069.17 samples/sec Loss 5.5101 LearningRate 0.0299 Epoch: 9 Global Step: 151350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:10,075-Speed 9464.65 samples/sec Loss 5.6729 LearningRate 0.0299 Epoch: 9 Global Step: 151360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:38:11,199-Speed 9110.44 samples/sec Loss 5.6250 LearningRate 0.0299 Epoch: 9 Global Step: 151370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:12,305-Speed 9267.74 samples/sec Loss 5.4947 LearningRate 0.0299 Epoch: 9 Global Step: 151380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:13,403-Speed 9333.33 samples/sec Loss 5.5384 LearningRate 0.0299 Epoch: 9 Global Step: 151390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:14,515-Speed 9210.83 samples/sec Loss 5.5828 LearningRate 0.0299 Epoch: 9 Global Step: 151400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:15,613-Speed 9334.45 samples/sec Loss 5.5025 LearningRate 0.0299 Epoch: 9 Global Step: 151410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:16,720-Speed 9254.69 samples/sec Loss 5.5566 LearningRate 0.0299 Epoch: 9 Global Step: 151420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:17,793-Speed 9546.90 samples/sec Loss 5.5255 LearningRate 0.0299 Epoch: 9 Global Step: 151430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:18,888-Speed 9362.38 samples/sec Loss 5.6088 LearningRate 0.0298 Epoch: 9 Global Step: 151440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:38:20,032-Speed 8953.20 samples/sec Loss 5.5319 LearningRate 0.0298 Epoch: 9 Global Step: 151450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:21,122-Speed 9403.54 samples/sec Loss 5.5602 LearningRate 0.0298 Epoch: 9 Global Step: 151460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:22,270-Speed 8924.53 samples/sec Loss 5.6007 LearningRate 0.0298 Epoch: 9 Global Step: 151470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:23,350-Speed 9487.22 samples/sec Loss 5.5628 LearningRate 0.0298 Epoch: 9 Global Step: 151480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:24,489-Speed 8989.66 samples/sec Loss 
5.5723 LearningRate 0.0298 Epoch: 9 Global Step: 151490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:25,574-Speed 9451.38 samples/sec Loss 5.6442 LearningRate 0.0298 Epoch: 9 Global Step: 151500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:26,684-Speed 9239.14 samples/sec Loss 5.5662 LearningRate 0.0298 Epoch: 9 Global Step: 151510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:27,794-Speed 9228.64 samples/sec Loss 5.4624 LearningRate 0.0298 Epoch: 9 Global Step: 151520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:28,930-Speed 9019.95 samples/sec Loss 5.6658 LearningRate 0.0298 Epoch: 9 Global Step: 151530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:29,994-Speed 9635.34 samples/sec Loss 5.6606 LearningRate 0.0298 Epoch: 9 Global Step: 151540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:31,090-Speed 9342.91 samples/sec Loss 5.5510 LearningRate 0.0298 Epoch: 9 Global Step: 151550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:38:32,172-Speed 9471.86 samples/sec Loss 5.6174 LearningRate 0.0298 Epoch: 9 Global Step: 151560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:33,284-Speed 9209.17 samples/sec Loss 5.6426 LearningRate 0.0298 Epoch: 9 Global Step: 151570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:34,380-Speed 9350.98 samples/sec Loss 5.5438 LearningRate 0.0298 Epoch: 9 Global Step: 151580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:35,454-Speed 9540.61 samples/sec Loss 5.5402 LearningRate 0.0298 Epoch: 9 Global Step: 151590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:36,541-Speed 9424.72 samples/sec Loss 5.5722 LearningRate 0.0298 Epoch: 9 Global Step: 151600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:37,609-Speed 9592.70 samples/sec Loss 5.6233 LearningRate 0.0298 Epoch: 9 Global 
Step: 151610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:38,696-Speed 9428.04 samples/sec Loss 5.6403 LearningRate 0.0298 Epoch: 9 Global Step: 151620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:39,745-Speed 9767.72 samples/sec Loss 5.6512 LearningRate 0.0298 Epoch: 9 Global Step: 151630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:40,796-Speed 9750.25 samples/sec Loss 5.5991 LearningRate 0.0298 Epoch: 9 Global Step: 151640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:41,847-Speed 9745.29 samples/sec Loss 5.6581 LearningRate 0.0298 Epoch: 9 Global Step: 151650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:42,974-Speed 9088.70 samples/sec Loss 5.6164 LearningRate 0.0298 Epoch: 9 Global Step: 151660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:38:44,116-Speed 8978.42 samples/sec Loss 5.6443 LearningRate 0.0298 Epoch: 9 Global Step: 151670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:38:45,191-Speed 9532.16 samples/sec Loss 5.6351 LearningRate 0.0298 Epoch: 9 Global Step: 151680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:38:46,299-Speed 9255.74 samples/sec Loss 5.6403 LearningRate 0.0298 Epoch: 9 Global Step: 151690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:38:47,398-Speed 9315.68 samples/sec Loss 5.5469 LearningRate 0.0298 Epoch: 9 Global Step: 151700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:38:48,498-Speed 9314.52 samples/sec Loss 5.5902 LearningRate 0.0298 Epoch: 9 Global Step: 151710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:49,558-Speed 9674.75 samples/sec Loss 5.5858 LearningRate 0.0298 Epoch: 9 Global Step: 151720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:50,697-Speed 8988.45 samples/sec Loss 5.6268 LearningRate 0.0298 Epoch: 9 Global Step: 151730 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 17:38:51,804-Speed 9257.52 samples/sec Loss 5.5681 LearningRate 0.0298 Epoch: 9 Global Step: 151740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:52,891-Speed 9422.90 samples/sec Loss 5.6623 LearningRate 0.0297 Epoch: 9 Global Step: 151750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:54,041-Speed 8909.77 samples/sec Loss 5.6893 LearningRate 0.0297 Epoch: 9 Global Step: 151760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:55,113-Speed 9561.03 samples/sec Loss 5.7100 LearningRate 0.0297 Epoch: 9 Global Step: 151770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:56,223-Speed 9236.62 samples/sec Loss 5.5553 LearningRate 0.0297 Epoch: 9 Global Step: 151780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:57,334-Speed 9217.48 samples/sec Loss 5.5864 LearningRate 0.0297 Epoch: 9 Global Step: 151790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:58,415-Speed 9480.64 samples/sec Loss 5.6573 LearningRate 0.0297 Epoch: 9 Global Step: 151800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:38:59,503-Speed 9415.28 samples/sec Loss 5.7141 LearningRate 0.0297 Epoch: 9 Global Step: 151810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:39:00,560-Speed 9694.60 samples/sec Loss 5.6704 LearningRate 0.0297 Epoch: 9 Global Step: 151820 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:39:01,633-Speed 9552.93 samples/sec Loss 5.6138 LearningRate 0.0297 Epoch: 9 Global Step: 151830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:39:02,757-Speed 9113.87 samples/sec Loss 5.6284 LearningRate 0.0297 Epoch: 9 Global Step: 151840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:39:03,896-Speed 8999.56 samples/sec Loss 5.7053 LearningRate 0.0297 Epoch: 9 Global Step: 151850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:39:04,989-Speed 9377.62 samples/sec Loss 5.5859 LearningRate 0.0297 Epoch: 9 Global Step: 151860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:06,079-Speed 9394.50 samples/sec Loss 5.6435 LearningRate 0.0297 Epoch: 9 Global Step: 151870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:07,178-Speed 9325.86 samples/sec Loss 5.5388 LearningRate 0.0297 Epoch: 9 Global Step: 151880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:08,263-Speed 9447.90 samples/sec Loss 5.5950 LearningRate 0.0297 Epoch: 9 Global Step: 151890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:09,380-Speed 9168.08 samples/sec Loss 5.6305 LearningRate 0.0297 Epoch: 9 Global Step: 151900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:10,441-Speed 9660.56 samples/sec Loss 5.6578 LearningRate 0.0297 Epoch: 9 Global Step: 151910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:11,565-Speed 9111.93 samples/sec Loss 5.6461 LearningRate 0.0297 Epoch: 9 Global Step: 151920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:12,628-Speed 9637.80 samples/sec Loss 5.6973 LearningRate 0.0297 Epoch: 9 Global Step: 151930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:13,703-Speed 9536.78 samples/sec Loss 5.6267 LearningRate 0.0297 Epoch: 9 Global Step: 151940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:14,773-Speed 9578.48 samples/sec Loss 5.6743 LearningRate 0.0297 Epoch: 9 Global Step: 151950 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:39:15,851-Speed 9502.05 samples/sec Loss 5.6548 LearningRate 0.0297 Epoch: 9 Global Step: 151960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:39:16,929-Speed 9508.14 samples/sec Loss 5.7430 LearningRate 0.0297 Epoch: 9 Global Step: 151970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:39:18,050-Speed 9133.68 samples/sec Loss 
5.7389 LearningRate 0.0297 Epoch: 9 Global Step: 151980 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:39:19,104-Speed 9725.88 samples/sec Loss 5.5751 LearningRate 0.0297 Epoch: 9 Global Step: 151990 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:39:20,221-Speed 9171.19 samples/sec Loss 5.5875 LearningRate 0.0297 Epoch: 9 Global Step: 152000 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:39:42,250-[lfw][152000]XNorm: 10.094522
Training: 2022-04-11 17:39:42,251-[lfw][152000]Accuracy-Flip: 0.99617+-0.00236
Training: 2022-04-11 17:39:42,251-[lfw][152000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:40:07,643-[cfp_fp][152000]XNorm: 8.675968
Training: 2022-04-11 17:40:07,644-[cfp_fp][152000]Accuracy-Flip: 0.96314+-0.00912
Training: 2022-04-11 17:40:07,644-[cfp_fp][152000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:40:29,539-[agedb_30][152000]XNorm: 9.791307
Training: 2022-04-11 17:40:29,540-[agedb_30][152000]Accuracy-Flip: 0.96700+-0.00819
Training: 2022-04-11 17:40:29,540-[agedb_30][152000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:40:30,643-Speed 145.41 samples/sec Loss 5.5529 LearningRate 0.0297 Epoch: 9 Global Step: 152010 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:40:31,732-Speed 9407.80 samples/sec Loss 5.7041 LearningRate 0.0297 Epoch: 9 Global Step: 152020 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:40:32,813-Speed 9479.96 samples/sec Loss 5.5241 LearningRate 0.0297 Epoch: 9 Global Step: 152030 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:40:33,914-Speed 9298.90 samples/sec Loss 5.6040 LearningRate 0.0297 Epoch: 9 Global Step: 152040 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:40:34,990-Speed 9525.27 samples/sec Loss 5.6037 LearningRate 0.0296 Epoch: 9 Global Step: 152050 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:40:36,074-Speed 9450.67 samples/sec Loss 5.7864 LearningRate 0.0296
Epoch: 9 Global Step: 152060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:37,192-Speed 9167.40 samples/sec Loss 5.6711 LearningRate 0.0296 Epoch: 9 Global Step: 152070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:38,283-Speed 9397.43 samples/sec Loss 5.7700 LearningRate 0.0296 Epoch: 9 Global Step: 152080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:39,372-Speed 9403.86 samples/sec Loss 5.7427 LearningRate 0.0296 Epoch: 9 Global Step: 152090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:40,513-Speed 8982.47 samples/sec Loss 5.6514 LearningRate 0.0296 Epoch: 9 Global Step: 152100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:41,588-Speed 9524.00 samples/sec Loss 5.7275 LearningRate 0.0296 Epoch: 9 Global Step: 152110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:42,669-Speed 9479.51 samples/sec Loss 5.6288 LearningRate 0.0296 Epoch: 9 Global Step: 152120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:43,771-Speed 9297.97 samples/sec Loss 5.6721 LearningRate 0.0296 Epoch: 9 Global Step: 152130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:44,870-Speed 9319.79 samples/sec Loss 5.6936 LearningRate 0.0296 Epoch: 9 Global Step: 152140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:45,953-Speed 9460.44 samples/sec Loss 5.6776 LearningRate 0.0296 Epoch: 9 Global Step: 152150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:47,023-Speed 9581.77 samples/sec Loss 5.8072 LearningRate 0.0296 Epoch: 9 Global Step: 152160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:48,110-Speed 9426.29 samples/sec Loss 5.6729 LearningRate 0.0296 Epoch: 9 Global Step: 152170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:40:49,187-Speed 9509.41 samples/sec Loss 5.7403 LearningRate 0.0296 Epoch: 9 Global Step: 152180 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:50,298-Speed 9227.82 samples/sec Loss 5.6967 LearningRate 0.0296 Epoch: 9 Global Step: 152190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:51,404-Speed 9258.13 samples/sec Loss 5.7227 LearningRate 0.0296 Epoch: 9 Global Step: 152200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:52,507-Speed 9299.93 samples/sec Loss 5.7318 LearningRate 0.0296 Epoch: 9 Global Step: 152210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:53,622-Speed 9194.41 samples/sec Loss 5.6977 LearningRate 0.0296 Epoch: 9 Global Step: 152220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:54,736-Speed 9192.44 samples/sec Loss 5.6784 LearningRate 0.0296 Epoch: 9 Global Step: 152230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:55,826-Speed 9401.14 samples/sec Loss 5.8298 LearningRate 0.0296 Epoch: 9 Global Step: 152240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:56,885-Speed 9681.89 samples/sec Loss 5.6298 LearningRate 0.0296 Epoch: 9 Global Step: 152250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:57,990-Speed 9270.29 samples/sec Loss 5.6406 LearningRate 0.0296 Epoch: 9 Global Step: 152260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:40:59,063-Speed 9543.98 samples/sec Loss 5.7621 LearningRate 0.0296 Epoch: 9 Global Step: 152270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:00,178-Speed 9189.71 samples/sec Loss 5.8145 LearningRate 0.0296 Epoch: 9 Global Step: 152280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:01,267-Speed 9417.19 samples/sec Loss 5.6694 LearningRate 0.0296 Epoch: 9 Global Step: 152290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:02,388-Speed 9139.03 samples/sec Loss 5.7120 LearningRate 0.0296 Epoch: 9 Global Step: 152300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 17:41:03,489-Speed 9300.99 samples/sec Loss 5.7040 LearningRate 0.0296 Epoch: 9 Global Step: 152310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:04,575-Speed 9440.57 samples/sec Loss 5.7143 LearningRate 0.0296 Epoch: 9 Global Step: 152320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:05,636-Speed 9651.45 samples/sec Loss 5.7191 LearningRate 0.0296 Epoch: 9 Global Step: 152330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:06,709-Speed 9554.50 samples/sec Loss 5.6301 LearningRate 0.0296 Epoch: 9 Global Step: 152340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:07,819-Speed 9228.18 samples/sec Loss 5.5611 LearningRate 0.0296 Epoch: 9 Global Step: 152350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:08,923-Speed 9277.47 samples/sec Loss 5.6450 LearningRate 0.0295 Epoch: 9 Global Step: 152360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:10,006-Speed 9459.59 samples/sec Loss 5.7490 LearningRate 0.0295 Epoch: 9 Global Step: 152370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:11,096-Speed 9407.56 samples/sec Loss 5.7318 LearningRate 0.0295 Epoch: 9 Global Step: 152380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:12,163-Speed 9601.99 samples/sec Loss 5.7816 LearningRate 0.0295 Epoch: 9 Global Step: 152390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:13,289-Speed 9099.58 samples/sec Loss 5.7347 LearningRate 0.0295 Epoch: 9 Global Step: 152400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:14,360-Speed 9562.01 samples/sec Loss 5.6910 LearningRate 0.0295 Epoch: 9 Global Step: 152410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:15,470-Speed 9230.93 samples/sec Loss 5.6718 LearningRate 0.0295 Epoch: 9 Global Step: 152420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:16,552-Speed 9474.50 
samples/sec Loss 5.7799 LearningRate 0.0295 Epoch: 9 Global Step: 152430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:17,629-Speed 9512.64 samples/sec Loss 5.7461 LearningRate 0.0295 Epoch: 9 Global Step: 152440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:18,717-Speed 9415.98 samples/sec Loss 5.7086 LearningRate 0.0295 Epoch: 9 Global Step: 152450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:19,819-Speed 9298.98 samples/sec Loss 5.6948 LearningRate 0.0295 Epoch: 9 Global Step: 152460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:20,886-Speed 9606.14 samples/sec Loss 5.8271 LearningRate 0.0295 Epoch: 9 Global Step: 152470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:21,946-Speed 9663.46 samples/sec Loss 5.7098 LearningRate 0.0295 Epoch: 9 Global Step: 152480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:23,052-Speed 9263.73 samples/sec Loss 5.7136 LearningRate 0.0295 Epoch: 9 Global Step: 152490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:24,143-Speed 9390.24 samples/sec Loss 5.6789 LearningRate 0.0295 Epoch: 9 Global Step: 152500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:25,239-Speed 9355.04 samples/sec Loss 5.6373 LearningRate 0.0295 Epoch: 9 Global Step: 152510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:26,308-Speed 9584.65 samples/sec Loss 5.7515 LearningRate 0.0295 Epoch: 9 Global Step: 152520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:27,409-Speed 9302.25 samples/sec Loss 5.6613 LearningRate 0.0295 Epoch: 9 Global Step: 152530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:28,513-Speed 9282.52 samples/sec Loss 5.6690 LearningRate 0.0295 Epoch: 9 Global Step: 152540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:29,639-Speed 9097.40 samples/sec Loss 5.6555 LearningRate 0.0295 
Epoch: 9 Global Step: 152550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:30,695-Speed 9710.32 samples/sec Loss 5.6493 LearningRate 0.0295 Epoch: 9 Global Step: 152560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:31,769-Speed 9534.86 samples/sec Loss 5.7608 LearningRate 0.0295 Epoch: 9 Global Step: 152570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:32,825-Speed 9707.44 samples/sec Loss 5.6914 LearningRate 0.0295 Epoch: 9 Global Step: 152580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:33,928-Speed 9290.72 samples/sec Loss 5.7064 LearningRate 0.0295 Epoch: 9 Global Step: 152590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:35,018-Speed 9396.81 samples/sec Loss 5.7386 LearningRate 0.0295 Epoch: 9 Global Step: 152600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:36,095-Speed 9519.61 samples/sec Loss 5.7110 LearningRate 0.0295 Epoch: 9 Global Step: 152610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:37,217-Speed 9130.80 samples/sec Loss 5.7550 LearningRate 0.0295 Epoch: 9 Global Step: 152620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:38,317-Speed 9311.84 samples/sec Loss 5.7292 LearningRate 0.0295 Epoch: 9 Global Step: 152630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:39,426-Speed 9235.25 samples/sec Loss 5.7134 LearningRate 0.0295 Epoch: 9 Global Step: 152640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:40,495-Speed 9588.01 samples/sec Loss 5.7040 LearningRate 0.0295 Epoch: 9 Global Step: 152650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:41:41,571-Speed 9518.65 samples/sec Loss 5.8050 LearningRate 0.0295 Epoch: 9 Global Step: 152660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:42,679-Speed 9249.01 samples/sec Loss 5.7296 LearningRate 0.0294 Epoch: 9 Global Step: 152670 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:43,758-Speed 9494.41 samples/sec Loss 5.6368 LearningRate 0.0294 Epoch: 9 Global Step: 152680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:44,864-Speed 9261.89 samples/sec Loss 5.7033 LearningRate 0.0294 Epoch: 9 Global Step: 152690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:45,929-Speed 9621.67 samples/sec Loss 5.6934 LearningRate 0.0294 Epoch: 9 Global Step: 152700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:47,080-Speed 8906.86 samples/sec Loss 5.7253 LearningRate 0.0294 Epoch: 9 Global Step: 152710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:48,184-Speed 9281.21 samples/sec Loss 5.8396 LearningRate 0.0294 Epoch: 9 Global Step: 152720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:49,295-Speed 9226.51 samples/sec Loss 5.7792 LearningRate 0.0294 Epoch: 9 Global Step: 152730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:50,379-Speed 9449.26 samples/sec Loss 5.6408 LearningRate 0.0294 Epoch: 9 Global Step: 152740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:51,502-Speed 9124.26 samples/sec Loss 5.7982 LearningRate 0.0294 Epoch: 9 Global Step: 152750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:52,572-Speed 9573.77 samples/sec Loss 5.8437 LearningRate 0.0294 Epoch: 9 Global Step: 152760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:53,696-Speed 9120.68 samples/sec Loss 5.8456 LearningRate 0.0294 Epoch: 9 Global Step: 152770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:54,818-Speed 9131.67 samples/sec Loss 5.8131 LearningRate 0.0294 Epoch: 9 Global Step: 152780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:55,924-Speed 9262.95 samples/sec Loss 5.7889 LearningRate 0.0294 Epoch: 9 Global Step: 152790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 17:41:57,026-Speed 9298.08 samples/sec Loss 5.6899 LearningRate 0.0294 Epoch: 9 Global Step: 152800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:58,133-Speed 9256.81 samples/sec Loss 5.7947 LearningRate 0.0294 Epoch: 9 Global Step: 152810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:41:59,222-Speed 9402.18 samples/sec Loss 5.7655 LearningRate 0.0294 Epoch: 9 Global Step: 152820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:00,293-Speed 9570.84 samples/sec Loss 5.6281 LearningRate 0.0294 Epoch: 9 Global Step: 152830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:01,427-Speed 9030.94 samples/sec Loss 5.6861 LearningRate 0.0294 Epoch: 9 Global Step: 152840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:02,531-Speed 9286.86 samples/sec Loss 5.7774 LearningRate 0.0294 Epoch: 9 Global Step: 152850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:03,654-Speed 9118.37 samples/sec Loss 5.7355 LearningRate 0.0294 Epoch: 9 Global Step: 152860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:04,743-Speed 9414.02 samples/sec Loss 5.6555 LearningRate 0.0294 Epoch: 9 Global Step: 152870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:05,908-Speed 8796.36 samples/sec Loss 5.6971 LearningRate 0.0294 Epoch: 9 Global Step: 152880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:06,958-Speed 9749.83 samples/sec Loss 5.7501 LearningRate 0.0294 Epoch: 9 Global Step: 152890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:08,065-Speed 9259.06 samples/sec Loss 5.6979 LearningRate 0.0294 Epoch: 9 Global Step: 152900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:09,195-Speed 9073.02 samples/sec Loss 5.6185 LearningRate 0.0294 Epoch: 9 Global Step: 152910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:10,255-Speed 9664.77 
samples/sec Loss 5.7257 LearningRate 0.0294 Epoch: 9 Global Step: 152920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:11,369-Speed 9198.17 samples/sec Loss 5.7477 LearningRate 0.0294 Epoch: 9 Global Step: 152930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:12,419-Speed 9754.04 samples/sec Loss 5.8802 LearningRate 0.0294 Epoch: 9 Global Step: 152940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:13,477-Speed 9681.86 samples/sec Loss 5.6835 LearningRate 0.0294 Epoch: 9 Global Step: 152950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:14,576-Speed 9321.63 samples/sec Loss 5.7687 LearningRate 0.0294 Epoch: 9 Global Step: 152960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:42:15,630-Speed 9726.18 samples/sec Loss 5.7161 LearningRate 0.0294 Epoch: 9 Global Step: 152970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:42:16,759-Speed 9075.65 samples/sec Loss 5.6706 LearningRate 0.0293 Epoch: 9 Global Step: 152980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:42:17,820-Speed 9659.28 samples/sec Loss 5.7834 LearningRate 0.0293 Epoch: 9 Global Step: 152990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:18,946-Speed 9098.42 samples/sec Loss 5.7700 LearningRate 0.0293 Epoch: 9 Global Step: 153000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:20,039-Speed 9372.14 samples/sec Loss 5.8325 LearningRate 0.0293 Epoch: 9 Global Step: 153010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:21,144-Speed 9279.85 samples/sec Loss 5.7389 LearningRate 0.0293 Epoch: 9 Global Step: 153020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:22,225-Speed 9471.53 samples/sec Loss 5.7688 LearningRate 0.0293 Epoch: 9 Global Step: 153030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:23,308-Speed 9466.57 samples/sec Loss 5.8700 LearningRate 0.0293 
Epoch: 9 Global Step: 153040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:24,391-Speed 9460.02 samples/sec Loss 5.7792 LearningRate 0.0293 Epoch: 9 Global Step: 153050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:25,501-Speed 9229.52 samples/sec Loss 5.6517 LearningRate 0.0293 Epoch: 9 Global Step: 153060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:26,637-Speed 9020.71 samples/sec Loss 5.6214 LearningRate 0.0293 Epoch: 9 Global Step: 153070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:27,713-Speed 9520.35 samples/sec Loss 5.8853 LearningRate 0.0293 Epoch: 9 Global Step: 153080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:28,759-Speed 9795.61 samples/sec Loss 5.7285 LearningRate 0.0293 Epoch: 9 Global Step: 153090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:29,846-Speed 9428.27 samples/sec Loss 5.7563 LearningRate 0.0293 Epoch: 9 Global Step: 153100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:30,958-Speed 9217.05 samples/sec Loss 5.7082 LearningRate 0.0293 Epoch: 9 Global Step: 153110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:32,105-Speed 8930.83 samples/sec Loss 5.7653 LearningRate 0.0293 Epoch: 9 Global Step: 153120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:33,205-Speed 9315.29 samples/sec Loss 5.8095 LearningRate 0.0293 Epoch: 9 Global Step: 153130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:34,294-Speed 9402.61 samples/sec Loss 5.8661 LearningRate 0.0293 Epoch: 9 Global Step: 153140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:35,404-Speed 9238.39 samples/sec Loss 5.7940 LearningRate 0.0293 Epoch: 9 Global Step: 153150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:36,506-Speed 9299.90 samples/sec Loss 5.7438 LearningRate 0.0293 Epoch: 9 Global Step: 153160 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:37,632-Speed 9095.30 samples/sec Loss 5.8178 LearningRate 0.0293 Epoch: 9 Global Step: 153170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:38,688-Speed 9705.93 samples/sec Loss 5.7320 LearningRate 0.0293 Epoch: 9 Global Step: 153180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:39,795-Speed 9249.61 samples/sec Loss 5.7870 LearningRate 0.0293 Epoch: 9 Global Step: 153190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:42:40,851-Speed 9708.73 samples/sec Loss 5.8124 LearningRate 0.0293 Epoch: 9 Global Step: 153200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:42:41,955-Speed 9279.58 samples/sec Loss 5.8177 LearningRate 0.0293 Epoch: 9 Global Step: 153210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:43,053-Speed 9329.80 samples/sec Loss 5.8581 LearningRate 0.0293 Epoch: 9 Global Step: 153220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:44,176-Speed 9125.15 samples/sec Loss 5.7827 LearningRate 0.0293 Epoch: 9 Global Step: 153230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:45,270-Speed 9371.14 samples/sec Loss 5.8539 LearningRate 0.0293 Epoch: 9 Global Step: 153240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:46,346-Speed 9533.80 samples/sec Loss 5.8950 LearningRate 0.0293 Epoch: 9 Global Step: 153250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:47,479-Speed 9038.60 samples/sec Loss 5.8235 LearningRate 0.0293 Epoch: 9 Global Step: 153260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:48,585-Speed 9264.17 samples/sec Loss 5.7067 LearningRate 0.0293 Epoch: 9 Global Step: 153270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:49,702-Speed 9175.68 samples/sec Loss 5.7109 LearningRate 0.0292 Epoch: 9 Global Step: 153280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 17:42:50,782-Speed 9488.91 samples/sec Loss 5.8280 LearningRate 0.0292 Epoch: 9 Global Step: 153290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:51,875-Speed 9374.86 samples/sec Loss 5.6855 LearningRate 0.0292 Epoch: 9 Global Step: 153300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:52,947-Speed 9560.03 samples/sec Loss 5.8239 LearningRate 0.0292 Epoch: 9 Global Step: 153310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:42:54,027-Speed 9491.49 samples/sec Loss 5.8094 LearningRate 0.0292 Epoch: 9 Global Step: 153320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:42:55,183-Speed 8856.99 samples/sec Loss 5.7346 LearningRate 0.0292 Epoch: 9 Global Step: 153330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:56,313-Speed 9067.79 samples/sec Loss 5.6801 LearningRate 0.0292 Epoch: 9 Global Step: 153340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:57,389-Speed 9525.83 samples/sec Loss 5.8148 LearningRate 0.0292 Epoch: 9 Global Step: 153350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:58,479-Speed 9404.56 samples/sec Loss 5.7287 LearningRate 0.0292 Epoch: 9 Global Step: 153360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:42:59,545-Speed 9610.35 samples/sec Loss 5.7760 LearningRate 0.0292 Epoch: 9 Global Step: 153370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:00,668-Speed 9124.14 samples/sec Loss 5.8331 LearningRate 0.0292 Epoch: 9 Global Step: 153380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:01,772-Speed 9281.27 samples/sec Loss 5.6935 LearningRate 0.0292 Epoch: 9 Global Step: 153390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:02,884-Speed 9208.90 samples/sec Loss 5.7883 LearningRate 0.0292 Epoch: 9 Global Step: 153400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:04,027-Speed 8968.64 
samples/sec Loss 5.7200 LearningRate 0.0292 Epoch: 9 Global Step: 153410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:05,129-Speed 9295.54 samples/sec Loss 5.7521 LearningRate 0.0292 Epoch: 9 Global Step: 153420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:06,227-Speed 9328.65 samples/sec Loss 5.8764 LearningRate 0.0292 Epoch: 9 Global Step: 153430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:07,317-Speed 9400.68 samples/sec Loss 5.8015 LearningRate 0.0292 Epoch: 9 Global Step: 153440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:08,393-Speed 9521.53 samples/sec Loss 5.7722 LearningRate 0.0292 Epoch: 9 Global Step: 153450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:09,465-Speed 9557.21 samples/sec Loss 5.8070 LearningRate 0.0292 Epoch: 9 Global Step: 153460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:10,567-Speed 9307.55 samples/sec Loss 5.9128 LearningRate 0.0292 Epoch: 9 Global Step: 153470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:11,659-Speed 9374.85 samples/sec Loss 5.8072 LearningRate 0.0292 Epoch: 9 Global Step: 153480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:12,728-Speed 9589.01 samples/sec Loss 5.8706 LearningRate 0.0292 Epoch: 9 Global Step: 153490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:13,832-Speed 9280.90 samples/sec Loss 5.6899 LearningRate 0.0292 Epoch: 9 Global Step: 153500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:14,976-Speed 8958.20 samples/sec Loss 5.8704 LearningRate 0.0292 Epoch: 9 Global Step: 153510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:16,085-Speed 9243.67 samples/sec Loss 5.7165 LearningRate 0.0292 Epoch: 9 Global Step: 153520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:17,193-Speed 9242.77 samples/sec Loss 5.7495 LearningRate 0.0292 
Epoch: 9 Global Step: 153530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:43:18,334-Speed 8982.47 samples/sec Loss 5.8223 LearningRate 0.0292 Epoch: 9 Global Step: 153540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:19,424-Speed 9395.70 samples/sec Loss 5.8080 LearningRate 0.0292 Epoch: 9 Global Step: 153550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:20,542-Speed 9172.50 samples/sec Loss 5.7626 LearningRate 0.0292 Epoch: 9 Global Step: 153560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:21,616-Speed 9533.41 samples/sec Loss 5.7486 LearningRate 0.0292 Epoch: 9 Global Step: 153570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:22,730-Speed 9202.34 samples/sec Loss 5.8458 LearningRate 0.0292 Epoch: 9 Global Step: 153580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:23,828-Speed 9336.68 samples/sec Loss 5.8131 LearningRate 0.0291 Epoch: 9 Global Step: 153590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:24,943-Speed 9190.07 samples/sec Loss 5.7726 LearningRate 0.0291 Epoch: 9 Global Step: 153600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:26,028-Speed 9444.99 samples/sec Loss 5.8362 LearningRate 0.0291 Epoch: 9 Global Step: 153610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:27,113-Speed 9437.29 samples/sec Loss 5.6675 LearningRate 0.0291 Epoch: 9 Global Step: 153620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:28,237-Speed 9115.03 samples/sec Loss 5.9753 LearningRate 0.0291 Epoch: 9 Global Step: 153630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:29,309-Speed 9567.09 samples/sec Loss 5.8984 LearningRate 0.0291 Epoch: 9 Global Step: 153640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:30,484-Speed 8717.17 samples/sec Loss 5.8111 LearningRate 0.0291 Epoch: 9 Global Step: 153650 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 17:43:31,584-Speed 9316.62 samples/sec Loss 5.8269 LearningRate 0.0291 Epoch: 9 Global Step: 153660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:32,668-Speed 9452.79 samples/sec Loss 5.7798 LearningRate 0.0291 Epoch: 9 Global Step: 153670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:33,747-Speed 9490.00 samples/sec Loss 5.7966 LearningRate 0.0291 Epoch: 9 Global Step: 153680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:34,860-Speed 9208.16 samples/sec Loss 5.7830 LearningRate 0.0291 Epoch: 9 Global Step: 153690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:35,980-Speed 9145.22 samples/sec Loss 5.8357 LearningRate 0.0291 Epoch: 9 Global Step: 153700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:37,084-Speed 9281.02 samples/sec Loss 5.8412 LearningRate 0.0291 Epoch: 9 Global Step: 153710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:38,160-Speed 9521.53 samples/sec Loss 5.8080 LearningRate 0.0291 Epoch: 9 Global Step: 153720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:39,239-Speed 9493.85 samples/sec Loss 5.8421 LearningRate 0.0291 Epoch: 9 Global Step: 153730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:40,308-Speed 9590.23 samples/sec Loss 5.8152 LearningRate 0.0291 Epoch: 9 Global Step: 153740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:43:41,362-Speed 9721.24 samples/sec Loss 5.7722 LearningRate 0.0291 Epoch: 9 Global Step: 153750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:42,449-Speed 9428.45 samples/sec Loss 5.6515 LearningRate 0.0291 Epoch: 9 Global Step: 153760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:43,535-Speed 9434.66 samples/sec Loss 5.7290 LearningRate 0.0291 Epoch: 9 Global Step: 153770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
17:43:44,617-Speed 9469.85 samples/sec Loss 5.7306 LearningRate 0.0291 Epoch: 9 Global Step: 153780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:45,686-Speed 9587.42 samples/sec Loss 5.8424 LearningRate 0.0291 Epoch: 9 Global Step: 153790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:46,780-Speed 9359.74 samples/sec Loss 5.8327 LearningRate 0.0291 Epoch: 9 Global Step: 153800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:47,885-Speed 9275.21 samples/sec Loss 5.8919 LearningRate 0.0291 Epoch: 9 Global Step: 153810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:48,956-Speed 9569.51 samples/sec Loss 5.7970 LearningRate 0.0291 Epoch: 9 Global Step: 153820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:50,061-Speed 9266.30 samples/sec Loss 5.7215 LearningRate 0.0291 Epoch: 9 Global Step: 153830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:51,188-Speed 9092.44 samples/sec Loss 5.8240 LearningRate 0.0291 Epoch: 9 Global Step: 153840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:52,239-Speed 9748.09 samples/sec Loss 5.8048 LearningRate 0.0291 Epoch: 9 Global Step: 153850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:43:53,323-Speed 9454.87 samples/sec Loss 5.7826 LearningRate 0.0291 Epoch: 9 Global Step: 153860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:54,490-Speed 8780.35 samples/sec Loss 5.7572 LearningRate 0.0291 Epoch: 9 Global Step: 153870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:55,596-Speed 9264.97 samples/sec Loss 5.8051 LearningRate 0.0291 Epoch: 9 Global Step: 153880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:56,751-Speed 8871.96 samples/sec Loss 5.8812 LearningRate 0.0291 Epoch: 9 Global Step: 153890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:43:57,855-Speed 9278.67 samples/sec Loss 5.8653 
LearningRate 0.0290 Epoch: 9 Global Step: 153900 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:43:58,959-Speed 9281.83 samples/sec Loss 5.8162 LearningRate 0.0290 Epoch: 9 Global Step: 153910 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:44:00,081-Speed 9133.67 samples/sec Loss 5.8967 LearningRate 0.0290 Epoch: 9 Global Step: 153920 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:44:01,193-Speed 9213.23 samples/sec Loss 5.7743 LearningRate 0.0290 Epoch: 9 Global Step: 153930 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:44:02,295-Speed 9295.09 samples/sec Loss 5.8543 LearningRate 0.0290 Epoch: 9 Global Step: 153940 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:44:03,363-Speed 9596.50 samples/sec Loss 5.8052 LearningRate 0.0290 Epoch: 9 Global Step: 153950 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:44:04,427-Speed 9627.28 samples/sec Loss 5.8302 LearningRate 0.0290 Epoch: 9 Global Step: 153960 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-04-11 17:44:05,508-Speed 9480.61 samples/sec Loss 5.7312 LearningRate 0.0290 Epoch: 9 Global Step: 153970 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-04-11 17:44:06,658-Speed 8910.88 samples/sec Loss 5.7303 LearningRate 0.0290 Epoch: 9 Global Step: 153980 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-04-11 17:44:07,754-Speed 9343.64 samples/sec Loss 5.8524 LearningRate 0.0290 Epoch: 9 Global Step: 153990 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:44:08,889-Speed 9031.07 samples/sec Loss 5.7550 LearningRate 0.0290 Epoch: 9 Global Step: 154000 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:44:31,219-[lfw][154000]XNorm: 9.903580
Training: 2022-04-11 17:44:31,220-[lfw][154000]Accuracy-Flip: 0.99600+-0.00327
Training: 2022-04-11 17:44:31,220-[lfw][154000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:44:56,692-[cfp_fp][154000]XNorm: 8.532174
Training: 2022-04-11 17:44:56,693-[cfp_fp][154000]Accuracy-Flip: 0.95914+-0.01094
Training: 2022-04-11 17:44:56,693-[cfp_fp][154000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:45:18,679-[agedb_30][154000]XNorm: 9.655456
Training: 2022-04-11 17:45:18,680-[agedb_30][154000]Accuracy-Flip: 0.96283+-0.00860
Training: 2022-04-11 17:45:18,680-[agedb_30][154000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:45:19,770-Speed 144.47 samples/sec Loss 5.8909 LearningRate 0.0290 Epoch: 9 Global Step: 154010 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:20,845-Speed 9533.03 samples/sec Loss 5.8693 LearningRate 0.0290 Epoch: 9 Global Step: 154020 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:21,907-Speed 9650.61 samples/sec Loss 5.8160 LearningRate 0.0290 Epoch: 9 Global Step: 154030 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:22,969-Speed 9654.35 samples/sec Loss 5.8544 LearningRate 0.0290 Epoch: 9 Global Step: 154040 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:24,058-Speed 9407.92 samples/sec Loss 5.9875 LearningRate 0.0290 Epoch: 9 Global Step: 154050 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:25,208-Speed 8910.61 samples/sec Loss 5.8262 LearningRate 0.0290 Epoch: 9 Global Step: 154060 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:26,284-Speed 9522.66 samples/sec Loss 5.7821 LearningRate 0.0290 Epoch: 9 Global Step: 154070 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:27,341-Speed 9697.58 samples/sec Loss 5.7567 LearningRate 0.0290 Epoch: 9 Global Step: 154080 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-11 17:45:28,434-Speed 9369.40 samples/sec Loss 5.8289 LearningRate 0.0290 Epoch: 9 Global Step: 154090 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-04-11 17:45:29,485-Speed 9750.32 samples/sec Loss 5.8356 LearningRate 0.0290 Epoch: 9 Global Step: 154100 Fp16 Grad 
Scale: 262144 Required: 7 hours Training: 2022-04-11 17:45:30,583-Speed 9330.63 samples/sec Loss 5.8900 LearningRate 0.0290 Epoch: 9 Global Step: 154110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:45:31,664-Speed 9478.52 samples/sec Loss 5.8658 LearningRate 0.0290 Epoch: 9 Global Step: 154120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:32,772-Speed 9245.54 samples/sec Loss 5.8804 LearningRate 0.0290 Epoch: 9 Global Step: 154130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:33,876-Speed 9282.65 samples/sec Loss 5.7719 LearningRate 0.0290 Epoch: 9 Global Step: 154140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:34,938-Speed 9647.89 samples/sec Loss 5.7537 LearningRate 0.0290 Epoch: 9 Global Step: 154150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:36,031-Speed 9374.05 samples/sec Loss 5.6986 LearningRate 0.0290 Epoch: 9 Global Step: 154160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:37,111-Speed 9490.37 samples/sec Loss 5.7488 LearningRate 0.0290 Epoch: 9 Global Step: 154170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:38,193-Speed 9467.50 samples/sec Loss 5.9000 LearningRate 0.0290 Epoch: 9 Global Step: 154180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:39,365-Speed 8743.19 samples/sec Loss 5.8434 LearningRate 0.0290 Epoch: 9 Global Step: 154190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:40,446-Speed 9475.07 samples/sec Loss 5.7774 LearningRate 0.0290 Epoch: 9 Global Step: 154200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:41,508-Speed 9645.85 samples/sec Loss 5.7966 LearningRate 0.0289 Epoch: 9 Global Step: 154210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:42,605-Speed 9340.63 samples/sec Loss 5.7885 LearningRate 0.0289 Epoch: 9 Global Step: 154220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 
2022-04-11 17:45:43,700-Speed 9357.77 samples/sec Loss 5.7708 LearningRate 0.0289 Epoch: 9 Global Step: 154230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:44,764-Speed 9627.81 samples/sec Loss 5.8694 LearningRate 0.0289 Epoch: 9 Global Step: 154240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:45,904-Speed 8988.38 samples/sec Loss 5.7989 LearningRate 0.0289 Epoch: 9 Global Step: 154250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:47,000-Speed 9346.75 samples/sec Loss 5.7635 LearningRate 0.0289 Epoch: 9 Global Step: 154260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:48,094-Speed 9366.37 samples/sec Loss 5.8929 LearningRate 0.0289 Epoch: 9 Global Step: 154270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:49,198-Speed 9282.70 samples/sec Loss 5.7762 LearningRate 0.0289 Epoch: 9 Global Step: 154280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:50,311-Speed 9202.57 samples/sec Loss 5.7885 LearningRate 0.0289 Epoch: 9 Global Step: 154290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:51,427-Speed 9187.60 samples/sec Loss 5.9400 LearningRate 0.0289 Epoch: 9 Global Step: 154300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:52,543-Speed 9184.94 samples/sec Loss 5.9152 LearningRate 0.0289 Epoch: 9 Global Step: 154310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:53,733-Speed 8610.74 samples/sec Loss 5.8514 LearningRate 0.0289 Epoch: 9 Global Step: 154320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:54,798-Speed 9617.99 samples/sec Loss 5.8584 LearningRate 0.0289 Epoch: 9 Global Step: 154330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:45:55,887-Speed 9411.41 samples/sec Loss 5.7739 LearningRate 0.0289 Epoch: 9 Global Step: 154340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:45:56,940-Speed 9736.92 
samples/sec Loss 5.8396 LearningRate 0.0289 Epoch: 9 Global Step: 154350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:58,044-Speed 9274.87 samples/sec Loss 5.8409 LearningRate 0.0289 Epoch: 9 Global Step: 154360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:45:59,129-Speed 9441.59 samples/sec Loss 5.8517 LearningRate 0.0289 Epoch: 9 Global Step: 154370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:00,233-Speed 9283.54 samples/sec Loss 5.8074 LearningRate 0.0289 Epoch: 9 Global Step: 154380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:01,302-Speed 9586.78 samples/sec Loss 5.8234 LearningRate 0.0289 Epoch: 9 Global Step: 154390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:02,395-Speed 9377.90 samples/sec Loss 5.7122 LearningRate 0.0289 Epoch: 9 Global Step: 154400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:03,503-Speed 9244.48 samples/sec Loss 5.8810 LearningRate 0.0289 Epoch: 9 Global Step: 154410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:04,593-Speed 9402.65 samples/sec Loss 5.7808 LearningRate 0.0289 Epoch: 9 Global Step: 154420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:05,713-Speed 9143.06 samples/sec Loss 5.9039 LearningRate 0.0289 Epoch: 9 Global Step: 154430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:06,788-Speed 9535.11 samples/sec Loss 5.8284 LearningRate 0.0289 Epoch: 9 Global Step: 154440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:07,890-Speed 9297.09 samples/sec Loss 5.7450 LearningRate 0.0289 Epoch: 9 Global Step: 154450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:46:09,004-Speed 9200.80 samples/sec Loss 5.8703 LearningRate 0.0289 Epoch: 9 Global Step: 154460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:10,106-Speed 9297.76 samples/sec Loss 5.8192 LearningRate 0.0289 
Epoch: 9 Global Step: 154470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:11,212-Speed 9266.65 samples/sec Loss 5.7919 LearningRate 0.0289 Epoch: 9 Global Step: 154480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:12,335-Speed 9122.11 samples/sec Loss 5.7803 LearningRate 0.0289 Epoch: 9 Global Step: 154490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:13,476-Speed 8982.27 samples/sec Loss 5.8540 LearningRate 0.0289 Epoch: 9 Global Step: 154500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:14,522-Speed 9791.57 samples/sec Loss 5.8877 LearningRate 0.0289 Epoch: 9 Global Step: 154510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:15,586-Speed 9632.65 samples/sec Loss 5.8408 LearningRate 0.0288 Epoch: 9 Global Step: 154520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:16,705-Speed 9152.76 samples/sec Loss 5.7728 LearningRate 0.0288 Epoch: 9 Global Step: 154530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:17,781-Speed 9521.39 samples/sec Loss 5.9508 LearningRate 0.0288 Epoch: 9 Global Step: 154540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:18,869-Speed 9417.71 samples/sec Loss 5.9015 LearningRate 0.0288 Epoch: 9 Global Step: 154550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:19,948-Speed 9491.87 samples/sec Loss 5.7942 LearningRate 0.0288 Epoch: 9 Global Step: 154560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:46:21,007-Speed 9687.30 samples/sec Loss 5.8584 LearningRate 0.0288 Epoch: 9 Global Step: 154570 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:46:22,062-Speed 9708.19 samples/sec Loss 5.9983 LearningRate 0.0288 Epoch: 9 Global Step: 154580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:23,279-Speed 8420.15 samples/sec Loss 5.8452 LearningRate 0.0288 Epoch: 9 Global Step: 154590 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:24,421-Speed 8971.10 samples/sec Loss 5.8536 LearningRate 0.0288 Epoch: 9 Global Step: 154600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:25,500-Speed 9491.70 samples/sec Loss 5.7740 LearningRate 0.0288 Epoch: 9 Global Step: 154610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:26,614-Speed 9197.77 samples/sec Loss 5.8237 LearningRate 0.0288 Epoch: 9 Global Step: 154620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:27,773-Speed 8843.48 samples/sec Loss 5.9271 LearningRate 0.0288 Epoch: 9 Global Step: 154630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:28,867-Speed 9359.58 samples/sec Loss 5.8733 LearningRate 0.0288 Epoch: 9 Global Step: 154640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:29,981-Speed 9201.59 samples/sec Loss 5.8453 LearningRate 0.0288 Epoch: 9 Global Step: 154650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:31,059-Speed 9507.68 samples/sec Loss 5.8554 LearningRate 0.0288 Epoch: 9 Global Step: 154660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:32,191-Speed 9051.42 samples/sec Loss 5.9398 LearningRate 0.0288 Epoch: 9 Global Step: 154670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:33,310-Speed 9157.95 samples/sec Loss 5.8239 LearningRate 0.0288 Epoch: 9 Global Step: 154680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:34,390-Speed 9484.81 samples/sec Loss 5.8720 LearningRate 0.0288 Epoch: 9 Global Step: 154690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:35,468-Speed 9505.17 samples/sec Loss 5.8301 LearningRate 0.0288 Epoch: 9 Global Step: 154700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:36,536-Speed 9591.72 samples/sec Loss 5.8930 LearningRate 0.0288 Epoch: 9 Global Step: 154710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 17:46:37,669-Speed 9040.05 samples/sec Loss 5.8752 LearningRate 0.0288 Epoch: 9 Global Step: 154720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:38,770-Speed 9315.67 samples/sec Loss 5.7252 LearningRate 0.0288 Epoch: 9 Global Step: 154730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:39,901-Speed 9055.26 samples/sec Loss 5.8349 LearningRate 0.0288 Epoch: 9 Global Step: 154740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:40,969-Speed 9596.97 samples/sec Loss 5.8749 LearningRate 0.0288 Epoch: 9 Global Step: 154750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:42,051-Speed 9467.06 samples/sec Loss 5.8610 LearningRate 0.0288 Epoch: 9 Global Step: 154760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:43,112-Speed 9657.65 samples/sec Loss 5.7617 LearningRate 0.0288 Epoch: 9 Global Step: 154770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:44,200-Speed 9411.71 samples/sec Loss 5.8614 LearningRate 0.0288 Epoch: 9 Global Step: 154780 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:46:45,340-Speed 8992.47 samples/sec Loss 5.9583 LearningRate 0.0288 Epoch: 9 Global Step: 154790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:46:46,421-Speed 9483.66 samples/sec Loss 5.8446 LearningRate 0.0288 Epoch: 9 Global Step: 154800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:47,544-Speed 9117.26 samples/sec Loss 5.8948 LearningRate 0.0288 Epoch: 9 Global Step: 154810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:48,685-Speed 8983.47 samples/sec Loss 5.8499 LearningRate 0.0288 Epoch: 9 Global Step: 154820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:49,756-Speed 9564.15 samples/sec Loss 5.8702 LearningRate 0.0287 Epoch: 9 Global Step: 154830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:50,861-Speed 9275.98 
samples/sec Loss 5.8659 LearningRate 0.0287 Epoch: 9 Global Step: 154840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:51,932-Speed 9565.51 samples/sec Loss 5.9159 LearningRate 0.0287 Epoch: 9 Global Step: 154850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:53,062-Speed 9068.89 samples/sec Loss 5.8120 LearningRate 0.0287 Epoch: 9 Global Step: 154860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:54,133-Speed 9563.84 samples/sec Loss 5.9382 LearningRate 0.0287 Epoch: 9 Global Step: 154870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:55,230-Speed 9338.63 samples/sec Loss 5.8851 LearningRate 0.0287 Epoch: 9 Global Step: 154880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:56,380-Speed 8915.75 samples/sec Loss 5.8270 LearningRate 0.0287 Epoch: 9 Global Step: 154890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:57,478-Speed 9328.29 samples/sec Loss 5.9388 LearningRate 0.0287 Epoch: 9 Global Step: 154900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:46:58,587-Speed 9239.25 samples/sec Loss 5.8340 LearningRate 0.0287 Epoch: 9 Global Step: 154910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:46:59,653-Speed 9609.25 samples/sec Loss 5.8582 LearningRate 0.0287 Epoch: 9 Global Step: 154920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:00,771-Speed 9163.10 samples/sec Loss 5.8008 LearningRate 0.0287 Epoch: 9 Global Step: 154930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:01,842-Speed 9571.01 samples/sec Loss 5.8457 LearningRate 0.0287 Epoch: 9 Global Step: 154940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:02,933-Speed 9387.61 samples/sec Loss 5.9683 LearningRate 0.0287 Epoch: 9 Global Step: 154950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:04,039-Speed 9270.27 samples/sec Loss 5.8472 LearningRate 0.0287 
Epoch: 9 Global Step: 154960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:05,122-Speed 9462.17 samples/sec Loss 5.7208 LearningRate 0.0287 Epoch: 9 Global Step: 154970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:06,201-Speed 9493.04 samples/sec Loss 5.8468 LearningRate 0.0287 Epoch: 9 Global Step: 154980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:07,260-Speed 9677.49 samples/sec Loss 5.8621 LearningRate 0.0287 Epoch: 9 Global Step: 154990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:08,335-Speed 9525.58 samples/sec Loss 5.9174 LearningRate 0.0287 Epoch: 9 Global Step: 155000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:09,460-Speed 9111.47 samples/sec Loss 5.8916 LearningRate 0.0287 Epoch: 9 Global Step: 155010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:10,555-Speed 9360.77 samples/sec Loss 5.9144 LearningRate 0.0287 Epoch: 9 Global Step: 155020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:11,627-Speed 9553.85 samples/sec Loss 5.8649 LearningRate 0.0287 Epoch: 9 Global Step: 155030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:12,715-Speed 9422.00 samples/sec Loss 5.8111 LearningRate 0.0287 Epoch: 9 Global Step: 155040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:13,808-Speed 9369.50 samples/sec Loss 5.9330 LearningRate 0.0287 Epoch: 9 Global Step: 155050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:14,901-Speed 9379.26 samples/sec Loss 5.8672 LearningRate 0.0287 Epoch: 9 Global Step: 155060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:16,011-Speed 9229.99 samples/sec Loss 5.7692 LearningRate 0.0287 Epoch: 9 Global Step: 155070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:17,184-Speed 8733.35 samples/sec Loss 5.8211 LearningRate 0.0287 Epoch: 9 Global Step: 155080 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 17:47:18,309-Speed 9112.12 samples/sec Loss 5.8605 LearningRate 0.0287 Epoch: 9 Global Step: 155090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:19,436-Speed 9089.70 samples/sec Loss 5.9269 LearningRate 0.0287 Epoch: 9 Global Step: 155100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:20,525-Speed 9403.11 samples/sec Loss 5.8299 LearningRate 0.0287 Epoch: 9 Global Step: 155110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:21,642-Speed 9174.03 samples/sec Loss 5.8053 LearningRate 0.0287 Epoch: 9 Global Step: 155120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:22,785-Speed 8966.36 samples/sec Loss 5.9617 LearningRate 0.0287 Epoch: 9 Global Step: 155130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:47:23,868-Speed 9463.96 samples/sec Loss 6.0034 LearningRate 0.0287 Epoch: 9 Global Step: 155140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:24,917-Speed 9763.19 samples/sec Loss 5.8621 LearningRate 0.0286 Epoch: 9 Global Step: 155150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:26,026-Speed 9245.51 samples/sec Loss 5.8653 LearningRate 0.0286 Epoch: 9 Global Step: 155160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:27,171-Speed 8945.36 samples/sec Loss 5.7687 LearningRate 0.0286 Epoch: 9 Global Step: 155170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:28,303-Speed 9050.74 samples/sec Loss 5.9192 LearningRate 0.0286 Epoch: 9 Global Step: 155180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:29,374-Speed 9565.09 samples/sec Loss 5.7760 LearningRate 0.0286 Epoch: 9 Global Step: 155190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:30,477-Speed 9291.18 samples/sec Loss 5.8544 LearningRate 0.0286 Epoch: 9 Global Step: 155200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:47:31,564-Speed 9421.38 samples/sec Loss 5.9489 LearningRate 0.0286 Epoch: 9 Global Step: 155210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:32,657-Speed 9382.47 samples/sec Loss 5.8638 LearningRate 0.0286 Epoch: 9 Global Step: 155220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:33,829-Speed 8737.05 samples/sec Loss 5.8497 LearningRate 0.0286 Epoch: 9 Global Step: 155230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:34,942-Speed 9210.41 samples/sec Loss 5.8753 LearningRate 0.0286 Epoch: 9 Global Step: 155240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:47:36,106-Speed 8796.82 samples/sec Loss 5.7669 LearningRate 0.0286 Epoch: 9 Global Step: 155250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:37,244-Speed 9004.43 samples/sec Loss 5.8812 LearningRate 0.0286 Epoch: 9 Global Step: 155260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:38,370-Speed 9103.43 samples/sec Loss 5.9139 LearningRate 0.0286 Epoch: 9 Global Step: 155270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:39,492-Speed 9131.44 samples/sec Loss 5.8717 LearningRate 0.0286 Epoch: 9 Global Step: 155280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:40,579-Speed 9429.74 samples/sec Loss 5.8872 LearningRate 0.0286 Epoch: 9 Global Step: 155290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:41,657-Speed 9507.43 samples/sec Loss 5.8622 LearningRate 0.0286 Epoch: 9 Global Step: 155300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:42,768-Speed 9221.91 samples/sec Loss 5.9048 LearningRate 0.0286 Epoch: 9 Global Step: 155310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:43,862-Speed 9369.20 samples/sec Loss 5.8271 LearningRate 0.0286 Epoch: 9 Global Step: 155320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:44,996-Speed 9031.46 samples/sec Loss 
5.8436 LearningRate 0.0286 Epoch: 9 Global Step: 155330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:46,128-Speed 9052.83 samples/sec Loss 5.9260 LearningRate 0.0286 Epoch: 9 Global Step: 155340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:47,213-Speed 9447.66 samples/sec Loss 5.8838 LearningRate 0.0286 Epoch: 9 Global Step: 155350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:48,354-Speed 8978.03 samples/sec Loss 5.9160 LearningRate 0.0286 Epoch: 9 Global Step: 155360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:49,543-Speed 8615.64 samples/sec Loss 5.9714 LearningRate 0.0286 Epoch: 9 Global Step: 155370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:50,631-Speed 9415.29 samples/sec Loss 5.8367 LearningRate 0.0286 Epoch: 9 Global Step: 155380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:51,847-Speed 8425.70 samples/sec Loss 5.9285 LearningRate 0.0286 Epoch: 9 Global Step: 155390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:52,940-Speed 9375.54 samples/sec Loss 5.6923 LearningRate 0.0286 Epoch: 9 Global Step: 155400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:54,078-Speed 9003.23 samples/sec Loss 6.0165 LearningRate 0.0286 Epoch: 9 Global Step: 155410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:55,152-Speed 9538.34 samples/sec Loss 5.9419 LearningRate 0.0286 Epoch: 9 Global Step: 155420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:56,268-Speed 9182.56 samples/sec Loss 5.9526 LearningRate 0.0286 Epoch: 9 Global Step: 155430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:57,369-Speed 9304.45 samples/sec Loss 5.8318 LearningRate 0.0286 Epoch: 9 Global Step: 155440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:47:58,502-Speed 9044.11 samples/sec Loss 5.8544 LearningRate 0.0286 Epoch: 9 Global 
Step: 155450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:47:59,597-Speed 9356.40 samples/sec Loss 5.8897 LearningRate 0.0285 Epoch: 9 Global Step: 155460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:48:00,701-Speed 9281.75 samples/sec Loss 5.8490 LearningRate 0.0285 Epoch: 9 Global Step: 155470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:48:01,852-Speed 8904.04 samples/sec Loss 5.9104 LearningRate 0.0285 Epoch: 9 Global Step: 155480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:48:02,935-Speed 9461.98 samples/sec Loss 5.9612 LearningRate 0.0285 Epoch: 9 Global Step: 155490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:04,052-Speed 9175.89 samples/sec Loss 5.9001 LearningRate 0.0285 Epoch: 9 Global Step: 155500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:05,146-Speed 9365.49 samples/sec Loss 5.8905 LearningRate 0.0285 Epoch: 9 Global Step: 155510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:06,261-Speed 9188.17 samples/sec Loss 5.8645 LearningRate 0.0285 Epoch: 9 Global Step: 155520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:07,359-Speed 9333.12 samples/sec Loss 5.8072 LearningRate 0.0285 Epoch: 9 Global Step: 155530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:08,474-Speed 9190.18 samples/sec Loss 5.9049 LearningRate 0.0285 Epoch: 9 Global Step: 155540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:09,544-Speed 9573.66 samples/sec Loss 5.7435 LearningRate 0.0285 Epoch: 9 Global Step: 155550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:10,605-Speed 9660.12 samples/sec Loss 5.7573 LearningRate 0.0285 Epoch: 9 Global Step: 155560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:11,701-Speed 9344.99 samples/sec Loss 5.8354 LearningRate 0.0285 Epoch: 9 Global Step: 155570 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 17:48:12,845-Speed 8955.99 samples/sec Loss 5.9483 LearningRate 0.0285 Epoch: 9 Global Step: 155580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:13,923-Speed 9506.70 samples/sec Loss 5.8531 LearningRate 0.0285 Epoch: 9 Global Step: 155590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:14,984-Speed 9656.28 samples/sec Loss 5.9112 LearningRate 0.0285 Epoch: 9 Global Step: 155600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:16,067-Speed 9458.98 samples/sec Loss 5.8524 LearningRate 0.0285 Epoch: 9 Global Step: 155610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:17,163-Speed 9350.25 samples/sec Loss 5.9198 LearningRate 0.0285 Epoch: 9 Global Step: 155620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:18,271-Speed 9249.90 samples/sec Loss 5.8092 LearningRate 0.0285 Epoch: 9 Global Step: 155630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:19,378-Speed 9248.72 samples/sec Loss 5.9226 LearningRate 0.0285 Epoch: 9 Global Step: 155640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:20,448-Speed 9582.02 samples/sec Loss 5.9072 LearningRate 0.0285 Epoch: 9 Global Step: 155650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:21,507-Speed 9673.43 samples/sec Loss 6.0097 LearningRate 0.0285 Epoch: 9 Global Step: 155660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:22,602-Speed 9369.35 samples/sec Loss 5.8969 LearningRate 0.0285 Epoch: 9 Global Step: 155670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:23,700-Speed 9324.60 samples/sec Loss 5.8610 LearningRate 0.0285 Epoch: 9 Global Step: 155680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:24,800-Speed 9314.43 samples/sec Loss 5.9073 LearningRate 0.0285 Epoch: 9 Global Step: 155690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 
17:48:25,883-Speed 9462.48 samples/sec Loss 5.8262 LearningRate 0.0285 Epoch: 9 Global Step: 155700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:48:26,959-Speed 9526.20 samples/sec Loss 5.7722 LearningRate 0.0285 Epoch: 9 Global Step: 155710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:28,086-Speed 9089.72 samples/sec Loss 5.8442 LearningRate 0.0285 Epoch: 9 Global Step: 155720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:29,166-Speed 9489.04 samples/sec Loss 5.8532 LearningRate 0.0285 Epoch: 9 Global Step: 155730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:30,298-Speed 9045.05 samples/sec Loss 5.9258 LearningRate 0.0285 Epoch: 9 Global Step: 155740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:31,376-Speed 9506.89 samples/sec Loss 5.8747 LearningRate 0.0285 Epoch: 9 Global Step: 155750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:32,504-Speed 9087.82 samples/sec Loss 5.9433 LearningRate 0.0285 Epoch: 9 Global Step: 155760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:33,586-Speed 9468.07 samples/sec Loss 5.8261 LearningRate 0.0284 Epoch: 9 Global Step: 155770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:34,699-Speed 9201.43 samples/sec Loss 5.9039 LearningRate 0.0284 Epoch: 9 Global Step: 155780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:35,763-Speed 9632.91 samples/sec Loss 5.8010 LearningRate 0.0284 Epoch: 9 Global Step: 155790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:36,825-Speed 9649.80 samples/sec Loss 5.9953 LearningRate 0.0284 Epoch: 9 Global Step: 155800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:37,867-Speed 9834.20 samples/sec Loss 5.8612 LearningRate 0.0284 Epoch: 9 Global Step: 155810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:38,962-Speed 9364.22 samples/sec Loss 
5.9017 LearningRate 0.0284 Epoch: 9 Global Step: 155820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:40,070-Speed 9242.07 samples/sec Loss 6.0365 LearningRate 0.0284 Epoch: 9 Global Step: 155830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:41,144-Speed 9544.36 samples/sec Loss 5.8519 LearningRate 0.0284 Epoch: 9 Global Step: 155840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:42,240-Speed 9350.20 samples/sec Loss 5.8726 LearningRate 0.0284 Epoch: 9 Global Step: 155850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:43,342-Speed 9295.80 samples/sec Loss 5.9037 LearningRate 0.0284 Epoch: 9 Global Step: 155860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:44,393-Speed 9745.51 samples/sec Loss 5.7417 LearningRate 0.0284 Epoch: 9 Global Step: 155870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:45,493-Speed 9313.76 samples/sec Loss 5.8127 LearningRate 0.0284 Epoch: 9 Global Step: 155880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:46,594-Speed 9303.33 samples/sec Loss 5.8331 LearningRate 0.0284 Epoch: 9 Global Step: 155890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:47,705-Speed 9227.75 samples/sec Loss 6.0111 LearningRate 0.0284 Epoch: 9 Global Step: 155900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:48,808-Speed 9287.42 samples/sec Loss 5.9193 LearningRate 0.0284 Epoch: 9 Global Step: 155910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:48:49,909-Speed 9304.75 samples/sec Loss 5.9428 LearningRate 0.0284 Epoch: 9 Global Step: 155920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:50,993-Speed 9452.39 samples/sec Loss 5.8710 LearningRate 0.0284 Epoch: 9 Global Step: 155930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:52,157-Speed 8805.26 samples/sec Loss 5.8271 LearningRate 0.0284 Epoch: 9 Global 
Step: 155940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:53,288-Speed 9057.38 samples/sec Loss 5.8870 LearningRate 0.0284 Epoch: 9 Global Step: 155950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:54,372-Speed 9453.51 samples/sec Loss 5.8946 LearningRate 0.0284 Epoch: 9 Global Step: 155960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:55,527-Speed 8865.34 samples/sec Loss 5.9954 LearningRate 0.0284 Epoch: 9 Global Step: 155970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:56,677-Speed 8915.64 samples/sec Loss 5.7752 LearningRate 0.0284 Epoch: 9 Global Step: 155980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:57,850-Speed 8730.03 samples/sec Loss 5.8288 LearningRate 0.0284 Epoch: 9 Global Step: 155990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:48:58,940-Speed 9405.13 samples/sec Loss 5.8820 LearningRate 0.0284 Epoch: 9 Global Step: 156000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:49:20,817-[lfw][156000]XNorm: 9.713624 Training: 2022-04-11 17:49:22,968-[lfw][156000]Accuracy-Flip: 0.99667+-0.00279 Training: 2022-04-11 17:49:22,969-[lfw][156000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:49:48,296-[cfp_fp][156000]XNorm: 8.345846 Training: 2022-04-11 17:49:48,296-[cfp_fp][156000]Accuracy-Flip: 0.95943+-0.01094 Training: 2022-04-11 17:49:48,297-[cfp_fp][156000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:50:10,002-[agedb_30][156000]XNorm: 9.458214 Training: 2022-04-11 17:50:10,003-[agedb_30][156000]Accuracy-Flip: 0.96667+-0.00969 Training: 2022-04-11 17:50:10,003-[agedb_30][156000]Accuracy-Highest: 0.96783 Training: 2022-04-11 17:50:11,085-Speed 141.94 samples/sec Loss 5.9465 LearningRate 0.0284 Epoch: 9 Global Step: 156010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:12,220-Speed 9021.00 samples/sec Loss 5.8702 LearningRate 0.0284 Epoch: 9 Global Step: 156020 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 17:50:13,357-Speed 9016.21 samples/sec Loss 5.8714 LearningRate 0.0284 Epoch: 9 Global Step: 156030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:14,439-Speed 9466.36 samples/sec Loss 5.8165 LearningRate 0.0284 Epoch: 9 Global Step: 156040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:15,532-Speed 9375.60 samples/sec Loss 5.9611 LearningRate 0.0284 Epoch: 9 Global Step: 156050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:16,644-Speed 9212.75 samples/sec Loss 5.8173 LearningRate 0.0284 Epoch: 9 Global Step: 156060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:17,721-Speed 9513.55 samples/sec Loss 5.9116 LearningRate 0.0284 Epoch: 9 Global Step: 156070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:18,822-Speed 9307.27 samples/sec Loss 5.9637 LearningRate 0.0283 Epoch: 9 Global Step: 156080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:19,928-Speed 9258.39 samples/sec Loss 5.8948 LearningRate 0.0283 Epoch: 9 Global Step: 156090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:20,983-Speed 9715.11 samples/sec Loss 5.9115 LearningRate 0.0283 Epoch: 9 Global Step: 156100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:22,053-Speed 9576.58 samples/sec Loss 5.9016 LearningRate 0.0283 Epoch: 9 Global Step: 156110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:23,143-Speed 9396.15 samples/sec Loss 5.8822 LearningRate 0.0283 Epoch: 9 Global Step: 156120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:24,253-Speed 9233.33 samples/sec Loss 5.9721 LearningRate 0.0283 Epoch: 9 Global Step: 156130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:25,326-Speed 9551.94 samples/sec Loss 5.8867 LearningRate 0.0283 Epoch: 9 Global Step: 156140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:50:26,448-Speed 9131.78 samples/sec Loss 5.9092 LearningRate 0.0283 Epoch: 9 Global Step: 156150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:28,349-Speed 5388.15 samples/sec Loss 5.8540 LearningRate 0.0283 Epoch: 9 Global Step: 156160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:29,451-Speed 9296.57 samples/sec Loss 5.9026 LearningRate 0.0283 Epoch: 9 Global Step: 156170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:30,534-Speed 9460.94 samples/sec Loss 5.9065 LearningRate 0.0283 Epoch: 9 Global Step: 156180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:31,618-Speed 9457.22 samples/sec Loss 5.8891 LearningRate 0.0283 Epoch: 9 Global Step: 156190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:32,726-Speed 9246.73 samples/sec Loss 5.8493 LearningRate 0.0283 Epoch: 9 Global Step: 156200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:50:33,839-Speed 9207.23 samples/sec Loss 5.9235 LearningRate 0.0283 Epoch: 9 Global Step: 156210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:50:34,913-Speed 9541.35 samples/sec Loss 6.0010 LearningRate 0.0283 Epoch: 9 Global Step: 156220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:50:35,973-Speed 9665.09 samples/sec Loss 5.9126 LearningRate 0.0283 Epoch: 9 Global Step: 156230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:37,074-Speed 9307.82 samples/sec Loss 5.8124 LearningRate 0.0283 Epoch: 9 Global Step: 156240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:38,174-Speed 9307.74 samples/sec Loss 5.9093 LearningRate 0.0283 Epoch: 9 Global Step: 156250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:39,232-Speed 9687.60 samples/sec Loss 5.8943 LearningRate 0.0283 Epoch: 9 Global Step: 156260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:40,325-Speed 9374.84 samples/sec Loss 
5.8247 LearningRate 0.0283 Epoch: 9 Global Step: 156270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:41,429-Speed 9280.22 samples/sec Loss 5.9039 LearningRate 0.0283 Epoch: 9 Global Step: 156280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:42,566-Speed 9012.85 samples/sec Loss 5.8606 LearningRate 0.0283 Epoch: 9 Global Step: 156290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:43,682-Speed 9183.00 samples/sec Loss 5.8785 LearningRate 0.0283 Epoch: 9 Global Step: 156300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:44,771-Speed 9404.78 samples/sec Loss 5.9240 LearningRate 0.0283 Epoch: 9 Global Step: 156310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:45,850-Speed 9494.32 samples/sec Loss 5.8905 LearningRate 0.0283 Epoch: 9 Global Step: 156320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:50:46,955-Speed 9271.78 samples/sec Loss 5.9716 LearningRate 0.0283 Epoch: 9 Global Step: 156330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:48,082-Speed 9088.46 samples/sec Loss 5.8732 LearningRate 0.0283 Epoch: 9 Global Step: 156340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:49,202-Speed 9154.87 samples/sec Loss 5.9663 LearningRate 0.0283 Epoch: 9 Global Step: 156350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:50,297-Speed 9356.57 samples/sec Loss 5.9125 LearningRate 0.0283 Epoch: 9 Global Step: 156360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:51,436-Speed 8989.05 samples/sec Loss 5.8029 LearningRate 0.0283 Epoch: 9 Global Step: 156370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:52,513-Speed 9523.61 samples/sec Loss 5.9286 LearningRate 0.0283 Epoch: 9 Global Step: 156380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:53,685-Speed 8734.98 samples/sec Loss 5.9391 LearningRate 0.0283 Epoch: 9 Global Step: 
156390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:54,801-Speed 9181.52 samples/sec Loss 5.9697 LearningRate 0.0282 Epoch: 9 Global Step: 156400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:55,875-Speed 9539.54 samples/sec Loss 5.7671 LearningRate 0.0282 Epoch: 9 Global Step: 156410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:56,955-Speed 9488.63 samples/sec Loss 5.9357 LearningRate 0.0282 Epoch: 9 Global Step: 156420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:50:58,048-Speed 9375.56 samples/sec Loss 5.8279 LearningRate 0.0282 Epoch: 9 Global Step: 156430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:50:59,133-Speed 9450.09 samples/sec Loss 5.7900 LearningRate 0.0282 Epoch: 9 Global Step: 156440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:00,272-Speed 8991.61 samples/sec Loss 5.9051 LearningRate 0.0282 Epoch: 9 Global Step: 156450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:01,353-Speed 9480.97 samples/sec Loss 5.8438 LearningRate 0.0282 Epoch: 9 Global Step: 156460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:02,503-Speed 8908.76 samples/sec Loss 5.8411 LearningRate 0.0282 Epoch: 9 Global Step: 156470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:03,603-Speed 9313.90 samples/sec Loss 5.7655 LearningRate 0.0282 Epoch: 9 Global Step: 156480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:04,682-Speed 9499.41 samples/sec Loss 5.8603 LearningRate 0.0282 Epoch: 9 Global Step: 156490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:05,785-Speed 9292.23 samples/sec Loss 5.9192 LearningRate 0.0282 Epoch: 9 Global Step: 156500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:06,876-Speed 9386.53 samples/sec Loss 5.8654 LearningRate 0.0282 Epoch: 9 Global Step: 156510 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-11 17:51:07,987-Speed 9219.67 samples/sec Loss 5.9038 LearningRate 0.0282 Epoch: 9 Global Step: 156520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:09,045-Speed 9687.30 samples/sec Loss 5.9080 LearningRate 0.0282 Epoch: 9 Global Step: 156530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:10,176-Speed 9061.20 samples/sec Loss 5.9538 LearningRate 0.0282 Epoch: 9 Global Step: 156540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:51:11,265-Speed 9412.22 samples/sec Loss 5.9350 LearningRate 0.0282 Epoch: 9 Global Step: 156550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:51:12,391-Speed 9096.97 samples/sec Loss 5.8453 LearningRate 0.0282 Epoch: 9 Global Step: 156560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:13,490-Speed 9319.03 samples/sec Loss 5.9571 LearningRate 0.0282 Epoch: 9 Global Step: 156570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:14,546-Speed 9708.89 samples/sec Loss 5.8314 LearningRate 0.0282 Epoch: 9 Global Step: 156580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:15,649-Speed 9292.32 samples/sec Loss 5.9765 LearningRate 0.0282 Epoch: 9 Global Step: 156590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:16,768-Speed 9154.04 samples/sec Loss 5.9113 LearningRate 0.0282 Epoch: 9 Global Step: 156600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:17,898-Speed 9067.00 samples/sec Loss 5.8634 LearningRate 0.0282 Epoch: 9 Global Step: 156610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:19,024-Speed 9101.90 samples/sec Loss 5.8954 LearningRate 0.0282 Epoch: 9 Global Step: 156620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:20,171-Speed 8931.28 samples/sec Loss 5.9169 LearningRate 0.0282 Epoch: 9 Global Step: 156630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:21,254-Speed 
9460.78 samples/sec Loss 5.8991 LearningRate 0.0282 Epoch: 9 Global Step: 156640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:22,328-Speed 9537.89 samples/sec Loss 5.8092 LearningRate 0.0282 Epoch: 9 Global Step: 156650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:23,440-Speed 9220.21 samples/sec Loss 5.9533 LearningRate 0.0282 Epoch: 9 Global Step: 156660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:24,563-Speed 9125.46 samples/sec Loss 6.0474 LearningRate 0.0282 Epoch: 9 Global Step: 156670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:25,721-Speed 8844.86 samples/sec Loss 5.7505 LearningRate 0.0282 Epoch: 9 Global Step: 156680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:26,811-Speed 9400.08 samples/sec Loss 5.8844 LearningRate 0.0282 Epoch: 9 Global Step: 156690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:27,944-Speed 9038.50 samples/sec Loss 6.0465 LearningRate 0.0282 Epoch: 9 Global Step: 156700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:51:29,074-Speed 9067.24 samples/sec Loss 5.9100 LearningRate 0.0281 Epoch: 9 Global Step: 156710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:30,135-Speed 9657.36 samples/sec Loss 5.8981 LearningRate 0.0281 Epoch: 9 Global Step: 156720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:31,197-Speed 9647.18 samples/sec Loss 5.8971 LearningRate 0.0281 Epoch: 9 Global Step: 156730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:32,274-Speed 9521.15 samples/sec Loss 5.8976 LearningRate 0.0281 Epoch: 9 Global Step: 156740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:33,417-Speed 8966.33 samples/sec Loss 5.9003 LearningRate 0.0281 Epoch: 9 Global Step: 156750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:34,538-Speed 9135.63 samples/sec Loss 5.8149 LearningRate 
0.0281 Epoch: 9 Global Step: 156760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:35,636-Speed 9332.76 samples/sec Loss 5.9598 LearningRate 0.0281 Epoch: 9 Global Step: 156770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:36,728-Speed 9387.62 samples/sec Loss 5.9075 LearningRate 0.0281 Epoch: 9 Global Step: 156780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:37,815-Speed 9425.64 samples/sec Loss 5.8698 LearningRate 0.0281 Epoch: 9 Global Step: 156790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:38,920-Speed 9268.30 samples/sec Loss 5.9658 LearningRate 0.0281 Epoch: 9 Global Step: 156800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:40,059-Speed 8993.95 samples/sec Loss 6.0604 LearningRate 0.0281 Epoch: 9 Global Step: 156810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:51:41,143-Speed 9460.67 samples/sec Loss 5.8889 LearningRate 0.0281 Epoch: 9 Global Step: 156820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:42,315-Speed 8743.51 samples/sec Loss 6.0112 LearningRate 0.0281 Epoch: 9 Global Step: 156830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:43,435-Speed 9144.61 samples/sec Loss 5.8655 LearningRate 0.0281 Epoch: 9 Global Step: 156840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:44,578-Speed 8962.30 samples/sec Loss 5.7895 LearningRate 0.0281 Epoch: 9 Global Step: 156850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:45,628-Speed 9760.07 samples/sec Loss 5.8863 LearningRate 0.0281 Epoch: 9 Global Step: 156860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:46,720-Speed 9387.42 samples/sec Loss 5.9172 LearningRate 0.0281 Epoch: 9 Global Step: 156870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:47,802-Speed 9469.55 samples/sec Loss 5.9983 LearningRate 0.0281 Epoch: 9 Global Step: 156880 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:48,894-Speed 9385.21 samples/sec Loss 5.9281 LearningRate 0.0281 Epoch: 9 Global Step: 156890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:49,988-Speed 9357.27 samples/sec Loss 5.9388 LearningRate 0.0281 Epoch: 9 Global Step: 156900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:51,062-Speed 9549.90 samples/sec Loss 5.9442 LearningRate 0.0281 Epoch: 9 Global Step: 156910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:52,168-Speed 9261.10 samples/sec Loss 5.8548 LearningRate 0.0281 Epoch: 9 Global Step: 156920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:53,253-Speed 9441.92 samples/sec Loss 5.8229 LearningRate 0.0281 Epoch: 9 Global Step: 156930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:54,347-Speed 9366.23 samples/sec Loss 5.9431 LearningRate 0.0281 Epoch: 9 Global Step: 156940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:55,468-Speed 9142.54 samples/sec Loss 5.8217 LearningRate 0.0281 Epoch: 9 Global Step: 156950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:56,599-Speed 9059.17 samples/sec Loss 5.8440 LearningRate 0.0281 Epoch: 9 Global Step: 156960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:57,724-Speed 9104.61 samples/sec Loss 5.9347 LearningRate 0.0281 Epoch: 9 Global Step: 156970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:58,856-Speed 9053.66 samples/sec Loss 5.7996 LearningRate 0.0281 Epoch: 9 Global Step: 156980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:51:59,983-Speed 9093.73 samples/sec Loss 5.8664 LearningRate 0.0281 Epoch: 9 Global Step: 156990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:01,141-Speed 8844.24 samples/sec Loss 6.0549 LearningRate 0.0281 Epoch: 9 Global Step: 157000 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 17:52:02,258-Speed 9172.80 samples/sec Loss 5.9178 LearningRate 0.0281 Epoch: 9 Global Step: 157010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:03,361-Speed 9291.01 samples/sec Loss 5.8184 LearningRate 0.0281 Epoch: 9 Global Step: 157020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:52:04,433-Speed 9564.75 samples/sec Loss 5.8238 LearningRate 0.0280 Epoch: 9 Global Step: 157030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:05,528-Speed 9350.32 samples/sec Loss 5.9026 LearningRate 0.0280 Epoch: 9 Global Step: 157040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:06,601-Speed 9555.13 samples/sec Loss 5.8211 LearningRate 0.0280 Epoch: 9 Global Step: 157050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:07,683-Speed 9461.62 samples/sec Loss 5.9199 LearningRate 0.0280 Epoch: 9 Global Step: 157060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:08,811-Speed 9088.49 samples/sec Loss 5.9123 LearningRate 0.0280 Epoch: 9 Global Step: 157070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:09,895-Speed 9453.09 samples/sec Loss 5.8972 LearningRate 0.0280 Epoch: 9 Global Step: 157080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:11,012-Speed 9166.14 samples/sec Loss 5.9579 LearningRate 0.0280 Epoch: 9 Global Step: 157090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:12,111-Speed 9328.40 samples/sec Loss 6.0105 LearningRate 0.0280 Epoch: 9 Global Step: 157100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:13,203-Speed 9381.50 samples/sec Loss 5.9831 LearningRate 0.0280 Epoch: 9 Global Step: 157110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:14,308-Speed 9274.65 samples/sec Loss 6.0215 LearningRate 0.0280 Epoch: 9 Global Step: 157120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:15,430-Speed 9129.06 
samples/sec Loss 5.9081 LearningRate 0.0280 Epoch: 9 Global Step: 157130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:16,544-Speed 9199.62 samples/sec Loss 5.8884 LearningRate 0.0280 Epoch: 9 Global Step: 157140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:17,721-Speed 8706.55 samples/sec Loss 6.0490 LearningRate 0.0280 Epoch: 9 Global Step: 157150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:52:18,795-Speed 9537.61 samples/sec Loss 5.9570 LearningRate 0.0280 Epoch: 9 Global Step: 157160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:19,883-Speed 9411.57 samples/sec Loss 5.8771 LearningRate 0.0280 Epoch: 9 Global Step: 157170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:20,994-Speed 9223.04 samples/sec Loss 5.9229 LearningRate 0.0280 Epoch: 9 Global Step: 157180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:22,102-Speed 9256.32 samples/sec Loss 6.0146 LearningRate 0.0280 Epoch: 9 Global Step: 157190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:23,190-Speed 9416.86 samples/sec Loss 5.8578 LearningRate 0.0280 Epoch: 9 Global Step: 157200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:24,293-Speed 9291.56 samples/sec Loss 5.8549 LearningRate 0.0280 Epoch: 9 Global Step: 157210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:25,396-Speed 9293.51 samples/sec Loss 5.7404 LearningRate 0.0280 Epoch: 9 Global Step: 157220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:26,512-Speed 9177.55 samples/sec Loss 5.9113 LearningRate 0.0280 Epoch: 9 Global Step: 157230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:27,617-Speed 9271.59 samples/sec Loss 5.9845 LearningRate 0.0280 Epoch: 9 Global Step: 157240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:28,745-Speed 9083.37 samples/sec Loss 5.9556 LearningRate 0.0280 
Epoch: 9 Global Step: 157250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:29,827-Speed 9478.27 samples/sec Loss 5.8488 LearningRate 0.0280 Epoch: 9 Global Step: 157260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:52:30,946-Speed 9153.16 samples/sec Loss 5.8995 LearningRate 0.0280 Epoch: 9 Global Step: 157270 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:52:32,064-Speed 9166.67 samples/sec Loss 5.9873 LearningRate 0.0280 Epoch: 9 Global Step: 157280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:33,143-Speed 9488.30 samples/sec Loss 5.9442 LearningRate 0.0280 Epoch: 9 Global Step: 157290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:34,241-Speed 9334.57 samples/sec Loss 5.9183 LearningRate 0.0280 Epoch: 9 Global Step: 157300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:35,337-Speed 9355.84 samples/sec Loss 5.9564 LearningRate 0.0280 Epoch: 9 Global Step: 157310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:36,461-Speed 9114.10 samples/sec Loss 5.8904 LearningRate 0.0280 Epoch: 9 Global Step: 157320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:37,583-Speed 9132.29 samples/sec Loss 5.8867 LearningRate 0.0280 Epoch: 9 Global Step: 157330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:38,711-Speed 9079.88 samples/sec Loss 5.9181 LearningRate 0.0279 Epoch: 9 Global Step: 157340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:39,811-Speed 9317.26 samples/sec Loss 5.8933 LearningRate 0.0279 Epoch: 9 Global Step: 157350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:40,932-Speed 9136.57 samples/sec Loss 5.8917 LearningRate 0.0279 Epoch: 9 Global Step: 157360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:42,041-Speed 9249.51 samples/sec Loss 5.8715 LearningRate 0.0279 Epoch: 9 Global Step: 157370 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:43,069-Speed 9958.77 samples/sec Loss 5.9122 LearningRate 0.0279 Epoch: 9 Global Step: 157380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:44,145-Speed 9525.50 samples/sec Loss 5.9069 LearningRate 0.0279 Epoch: 9 Global Step: 157390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:45,229-Speed 9455.50 samples/sec Loss 5.8179 LearningRate 0.0279 Epoch: 9 Global Step: 157400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:46,311-Speed 9467.88 samples/sec Loss 5.9097 LearningRate 0.0279 Epoch: 9 Global Step: 157410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:47,402-Speed 9390.08 samples/sec Loss 5.9228 LearningRate 0.0279 Epoch: 9 Global Step: 157420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:48,527-Speed 9103.83 samples/sec Loss 5.9143 LearningRate 0.0279 Epoch: 9 Global Step: 157430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:49,638-Speed 9225.10 samples/sec Loss 5.8767 LearningRate 0.0279 Epoch: 9 Global Step: 157440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:50,721-Speed 9457.63 samples/sec Loss 5.9320 LearningRate 0.0279 Epoch: 9 Global Step: 157450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:51,836-Speed 9193.67 samples/sec Loss 6.0009 LearningRate 0.0279 Epoch: 9 Global Step: 157460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:52,945-Speed 9243.04 samples/sec Loss 5.8988 LearningRate 0.0279 Epoch: 9 Global Step: 157470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:54,057-Speed 9214.10 samples/sec Loss 5.9625 LearningRate 0.0279 Epoch: 9 Global Step: 157480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:52:55,201-Speed 8956.10 samples/sec Loss 5.8525 LearningRate 0.0279 Epoch: 9 Global Step: 157490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 
2022-04-11 17:52:56,281-Speed 9486.18 samples/sec Loss 5.9136 LearningRate 0.0279 Epoch: 9 Global Step: 157500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:57,340-Speed 9674.64 samples/sec Loss 5.8458 LearningRate 0.0279 Epoch: 9 Global Step: 157510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:58,454-Speed 9192.73 samples/sec Loss 5.8647 LearningRate 0.0279 Epoch: 9 Global Step: 157520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:52:59,536-Speed 9475.42 samples/sec Loss 5.8551 LearningRate 0.0279 Epoch: 9 Global Step: 157530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:00,648-Speed 9217.31 samples/sec Loss 5.9117 LearningRate 0.0279 Epoch: 9 Global Step: 157540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:01,724-Speed 9519.39 samples/sec Loss 5.9545 LearningRate 0.0279 Epoch: 9 Global Step: 157550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:02,814-Speed 9404.78 samples/sec Loss 6.0121 LearningRate 0.0279 Epoch: 9 Global Step: 157560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:03,958-Speed 8955.74 samples/sec Loss 5.9317 LearningRate 0.0279 Epoch: 9 Global Step: 157570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:05,023-Speed 9618.90 samples/sec Loss 5.8907 LearningRate 0.0279 Epoch: 9 Global Step: 157580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:06,116-Speed 9372.01 samples/sec Loss 6.0768 LearningRate 0.0279 Epoch: 9 Global Step: 157590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:07,222-Speed 9265.20 samples/sec Loss 5.9078 LearningRate 0.0279 Epoch: 9 Global Step: 157600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:53:08,340-Speed 9163.98 samples/sec Loss 6.0379 LearningRate 0.0279 Epoch: 9 Global Step: 157610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:53:09,435-Speed 9359.14 
samples/sec Loss 5.9205 LearningRate 0.0279 Epoch: 9 Global Step: 157620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:53:10,533-Speed 9323.96 samples/sec Loss 5.9642 LearningRate 0.0279 Epoch: 9 Global Step: 157630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:53:11,642-Speed 9244.88 samples/sec Loss 5.9349 LearningRate 0.0279 Epoch: 9 Global Step: 157640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:53:12,760-Speed 9166.11 samples/sec Loss 5.8554 LearningRate 0.0279 Epoch: 9 Global Step: 157650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:13,857-Speed 9337.54 samples/sec Loss 5.8610 LearningRate 0.0278 Epoch: 9 Global Step: 157660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:14,941-Speed 9452.77 samples/sec Loss 5.8989 LearningRate 0.0278 Epoch: 9 Global Step: 157670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:16,014-Speed 9546.65 samples/sec Loss 5.9488 LearningRate 0.0278 Epoch: 9 Global Step: 157680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:17,137-Speed 9127.32 samples/sec Loss 5.8894 LearningRate 0.0278 Epoch: 9 Global Step: 157690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:18,224-Speed 9429.40 samples/sec Loss 5.9570 LearningRate 0.0278 Epoch: 9 Global Step: 157700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:19,291-Speed 9596.56 samples/sec Loss 5.9251 LearningRate 0.0278 Epoch: 9 Global Step: 157710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:20,386-Speed 9359.35 samples/sec Loss 5.9496 LearningRate 0.0278 Epoch: 9 Global Step: 157720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:21,486-Speed 9316.71 samples/sec Loss 5.9784 LearningRate 0.0278 Epoch: 9 Global Step: 157730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:22,618-Speed 9053.06 samples/sec Loss 5.9676 LearningRate 0.0278 
Epoch: 9 Global Step: 157740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:23,695-Speed 9512.56 samples/sec Loss 5.8894 LearningRate 0.0278 Epoch: 9 Global Step: 157750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:53:24,808-Speed 9208.00 samples/sec Loss 5.9902 LearningRate 0.0278 Epoch: 9 Global Step: 157760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:53:25,909-Speed 9302.84 samples/sec Loss 5.8570 LearningRate 0.0278 Epoch: 9 Global Step: 157770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:26,976-Speed 9603.87 samples/sec Loss 6.0142 LearningRate 0.0278 Epoch: 9 Global Step: 157780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:28,071-Speed 9359.19 samples/sec Loss 5.9498 LearningRate 0.0278 Epoch: 9 Global Step: 157790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:29,197-Speed 9097.70 samples/sec Loss 6.0132 LearningRate 0.0278 Epoch: 9 Global Step: 157800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:30,313-Speed 9179.26 samples/sec Loss 5.8921 LearningRate 0.0278 Epoch: 9 Global Step: 157810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:31,436-Speed 9124.23 samples/sec Loss 5.9053 LearningRate 0.0278 Epoch: 9 Global Step: 157820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:32,521-Speed 9446.92 samples/sec Loss 5.9147 LearningRate 0.0278 Epoch: 9 Global Step: 157830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:33,602-Speed 9479.23 samples/sec Loss 5.9756 LearningRate 0.0278 Epoch: 9 Global Step: 157840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:34,691-Speed 9401.26 samples/sec Loss 5.8806 LearningRate 0.0278 Epoch: 9 Global Step: 157850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:35,777-Speed 9438.01 samples/sec Loss 6.0452 LearningRate 0.0278 Epoch: 9 Global Step: 157860 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 17:53:36,867-Speed 9397.69 samples/sec Loss 5.9240 LearningRate 0.0278 Epoch: 9 Global Step: 157870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:37,956-Speed 9418.13 samples/sec Loss 5.9440 LearningRate 0.0278 Epoch: 9 Global Step: 157880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:39,042-Speed 9429.85 samples/sec Loss 5.9796 LearningRate 0.0278 Epoch: 9 Global Step: 157890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:40,145-Speed 9296.12 samples/sec Loss 5.8964 LearningRate 0.0278 Epoch: 9 Global Step: 157900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:41,256-Speed 9221.33 samples/sec Loss 5.8510 LearningRate 0.0278 Epoch: 9 Global Step: 157910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:42,344-Speed 9416.76 samples/sec Loss 6.0100 LearningRate 0.0278 Epoch: 9 Global Step: 157920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:43,419-Speed 9532.85 samples/sec Loss 5.9421 LearningRate 0.0278 Epoch: 9 Global Step: 157930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:44,501-Speed 9469.92 samples/sec Loss 5.8777 LearningRate 0.0278 Epoch: 9 Global Step: 157940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:45,581-Speed 9484.75 samples/sec Loss 5.9997 LearningRate 0.0278 Epoch: 9 Global Step: 157950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:46,667-Speed 9433.13 samples/sec Loss 5.8389 LearningRate 0.0278 Epoch: 9 Global Step: 157960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:47,754-Speed 9423.50 samples/sec Loss 5.7857 LearningRate 0.0277 Epoch: 9 Global Step: 157970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:53:48,856-Speed 9298.60 samples/sec Loss 5.9654 LearningRate 0.0277 Epoch: 9 Global Step: 157980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
17:53:49,940-Speed 9452.86 samples/sec Loss 5.8832 LearningRate 0.0277 Epoch: 9 Global Step: 157990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:53:51,006-Speed 9614.19 samples/sec Loss 5.9040 LearningRate 0.0277 Epoch: 9 Global Step: 158000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:54:13,014-[lfw][158000]XNorm: 9.898848 Training: 2022-04-11 17:54:13,015-[lfw][158000]Accuracy-Flip: 0.99683+-0.00320 Training: 2022-04-11 17:54:13,015-[lfw][158000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:54:38,441-[cfp_fp][158000]XNorm: 8.480223 Training: 2022-04-11 17:54:38,442-[cfp_fp][158000]Accuracy-Flip: 0.96086+-0.00819 Training: 2022-04-11 17:54:38,442-[cfp_fp][158000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:55:00,351-[agedb_30][158000]XNorm: 9.613047 Training: 2022-04-11 17:55:00,352-[agedb_30][158000]Accuracy-Flip: 0.96667+-0.00937 Training: 2022-04-11 17:55:00,353-[agedb_30][158000]Accuracy-Highest: 0.96783 Training: 2022-04-11 17:55:01,474-Speed 145.31 samples/sec Loss 5.9447 LearningRate 0.0277 Epoch: 9 Global Step: 158010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:02,576-Speed 9303.58 samples/sec Loss 5.8995 LearningRate 0.0277 Epoch: 9 Global Step: 158020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:03,733-Speed 8851.23 samples/sec Loss 5.9123 LearningRate 0.0277 Epoch: 9 Global Step: 158030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:04,819-Speed 9440.63 samples/sec Loss 5.9192 LearningRate 0.0277 Epoch: 9 Global Step: 158040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:05,911-Speed 9376.98 samples/sec Loss 5.9141 LearningRate 0.0277 Epoch: 9 Global Step: 158050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:06,988-Speed 9512.15 samples/sec Loss 5.9469 LearningRate 0.0277 Epoch: 9 Global Step: 158060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:08,071-Speed 9472.02 
samples/sec Loss 5.8988 LearningRate 0.0277 Epoch: 9 Global Step: 158070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:09,170-Speed 9317.67 samples/sec Loss 5.8958 LearningRate 0.0277 Epoch: 9 Global Step: 158080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:10,255-Speed 9448.45 samples/sec Loss 5.8716 LearningRate 0.0277 Epoch: 9 Global Step: 158090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:11,364-Speed 9239.22 samples/sec Loss 5.8386 LearningRate 0.0277 Epoch: 9 Global Step: 158100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:12,449-Speed 9441.24 samples/sec Loss 5.9266 LearningRate 0.0277 Epoch: 9 Global Step: 158110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:13,560-Speed 9223.97 samples/sec Loss 5.9345 LearningRate 0.0277 Epoch: 9 Global Step: 158120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:14,656-Speed 9349.56 samples/sec Loss 5.8590 LearningRate 0.0277 Epoch: 9 Global Step: 158130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:15,746-Speed 9398.64 samples/sec Loss 5.9098 LearningRate 0.0277 Epoch: 9 Global Step: 158140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:16,855-Speed 9239.18 samples/sec Loss 5.8890 LearningRate 0.0277 Epoch: 9 Global Step: 158150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:17,954-Speed 9321.32 samples/sec Loss 5.9714 LearningRate 0.0277 Epoch: 9 Global Step: 158160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:19,050-Speed 9352.06 samples/sec Loss 5.9765 LearningRate 0.0277 Epoch: 9 Global Step: 158170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:20,139-Speed 9404.67 samples/sec Loss 5.9547 LearningRate 0.0277 Epoch: 9 Global Step: 158180 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:55:21,241-Speed 9302.52 samples/sec Loss 6.0376 LearningRate 0.0277 
Epoch: 9 Global Step: 158190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:22,296-Speed 9708.38 samples/sec Loss 5.9317 LearningRate 0.0277 Epoch: 9 Global Step: 158200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:23,432-Speed 9020.42 samples/sec Loss 5.9226 LearningRate 0.0277 Epoch: 9 Global Step: 158210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:24,521-Speed 9407.60 samples/sec Loss 5.8582 LearningRate 0.0277 Epoch: 9 Global Step: 158220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:25,603-Speed 9471.83 samples/sec Loss 5.8937 LearningRate 0.0277 Epoch: 9 Global Step: 158230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:26,692-Speed 9413.15 samples/sec Loss 5.9833 LearningRate 0.0277 Epoch: 9 Global Step: 158240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:27,844-Speed 8893.23 samples/sec Loss 5.8818 LearningRate 0.0277 Epoch: 9 Global Step: 158250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:28,941-Speed 9339.60 samples/sec Loss 5.9243 LearningRate 0.0277 Epoch: 9 Global Step: 158260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:30,095-Speed 8879.01 samples/sec Loss 6.0617 LearningRate 0.0277 Epoch: 9 Global Step: 158270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:31,219-Speed 9121.40 samples/sec Loss 5.8729 LearningRate 0.0277 Epoch: 9 Global Step: 158280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:32,385-Speed 8781.98 samples/sec Loss 5.8859 LearningRate 0.0276 Epoch: 9 Global Step: 158290 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:55:33,458-Speed 9553.89 samples/sec Loss 5.9368 LearningRate 0.0276 Epoch: 9 Global Step: 158300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:34,576-Speed 9160.38 samples/sec Loss 5.8827 LearningRate 0.0276 Epoch: 9 Global Step: 158310 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:35,640-Speed 9637.96 samples/sec Loss 5.9557 LearningRate 0.0276 Epoch: 9 Global Step: 158320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:36,732-Speed 9380.54 samples/sec Loss 5.9703 LearningRate 0.0276 Epoch: 9 Global Step: 158330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:37,838-Speed 9263.84 samples/sec Loss 5.9940 LearningRate 0.0276 Epoch: 9 Global Step: 158340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:38,971-Speed 9037.70 samples/sec Loss 5.9224 LearningRate 0.0276 Epoch: 9 Global Step: 158350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:40,118-Speed 8938.98 samples/sec Loss 5.9229 LearningRate 0.0276 Epoch: 9 Global Step: 158360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:41,217-Speed 9317.09 samples/sec Loss 5.9126 LearningRate 0.0276 Epoch: 9 Global Step: 158370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:42,331-Speed 9195.51 samples/sec Loss 5.8487 LearningRate 0.0276 Epoch: 9 Global Step: 158380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:43,488-Speed 8865.33 samples/sec Loss 5.7950 LearningRate 0.0276 Epoch: 9 Global Step: 158390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:44,602-Speed 9194.62 samples/sec Loss 6.0188 LearningRate 0.0276 Epoch: 9 Global Step: 158400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:45,721-Speed 9153.60 samples/sec Loss 5.9555 LearningRate 0.0276 Epoch: 9 Global Step: 158410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:55:46,830-Speed 9243.04 samples/sec Loss 5.9145 LearningRate 0.0276 Epoch: 9 Global Step: 158420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:48,011-Speed 8675.62 samples/sec Loss 5.9344 LearningRate 0.0276 Epoch: 9 Global Step: 158430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
17:55:49,105-Speed 9368.89 samples/sec Loss 5.9602 LearningRate 0.0276 Epoch: 9 Global Step: 158440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:50,206-Speed 9301.23 samples/sec Loss 5.9890 LearningRate 0.0276 Epoch: 9 Global Step: 158450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:51,326-Speed 9150.65 samples/sec Loss 5.9595 LearningRate 0.0276 Epoch: 9 Global Step: 158460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:52,432-Speed 9263.74 samples/sec Loss 6.0480 LearningRate 0.0276 Epoch: 9 Global Step: 158470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:53,517-Speed 9444.46 samples/sec Loss 6.0005 LearningRate 0.0276 Epoch: 9 Global Step: 158480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:54,629-Speed 9212.31 samples/sec Loss 6.0151 LearningRate 0.0276 Epoch: 9 Global Step: 158490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:55,770-Speed 8976.87 samples/sec Loss 5.9863 LearningRate 0.0276 Epoch: 9 Global Step: 158500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:56,892-Speed 9137.28 samples/sec Loss 5.9941 LearningRate 0.0276 Epoch: 9 Global Step: 158510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:55:57,986-Speed 9364.01 samples/sec Loss 5.9450 LearningRate 0.0276 Epoch: 9 Global Step: 158520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:55:59,058-Speed 9564.31 samples/sec Loss 5.9072 LearningRate 0.0276 Epoch: 9 Global Step: 158530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:00,138-Speed 9483.13 samples/sec Loss 5.9586 LearningRate 0.0276 Epoch: 9 Global Step: 158540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:01,248-Speed 9230.08 samples/sec Loss 5.8859 LearningRate 0.0276 Epoch: 9 Global Step: 158550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:02,338-Speed 9407.30 samples/sec Loss 
5.9219 LearningRate 0.0276 Epoch: 9 Global Step: 158560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:03,448-Speed 9228.54 samples/sec Loss 5.8735 LearningRate 0.0276 Epoch: 9 Global Step: 158570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:04,543-Speed 9365.50 samples/sec Loss 5.9284 LearningRate 0.0276 Epoch: 9 Global Step: 158580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:05,663-Speed 9147.70 samples/sec Loss 5.8539 LearningRate 0.0276 Epoch: 9 Global Step: 158590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:06,759-Speed 9344.48 samples/sec Loss 5.8671 LearningRate 0.0276 Epoch: 9 Global Step: 158600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:09,467-Speed 3781.80 samples/sec Loss 5.8816 LearningRate 0.0275 Epoch: 9 Global Step: 158610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:10,576-Speed 9241.08 samples/sec Loss 5.8839 LearningRate 0.0275 Epoch: 9 Global Step: 158620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:11,673-Speed 9336.15 samples/sec Loss 5.8836 LearningRate 0.0275 Epoch: 9 Global Step: 158630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:12,783-Speed 9234.11 samples/sec Loss 5.8373 LearningRate 0.0275 Epoch: 9 Global Step: 158640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:13,847-Speed 9632.66 samples/sec Loss 5.9551 LearningRate 0.0275 Epoch: 9 Global Step: 158650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:14,928-Speed 9476.17 samples/sec Loss 5.9395 LearningRate 0.0275 Epoch: 9 Global Step: 158660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:16,066-Speed 8998.22 samples/sec Loss 6.0298 LearningRate 0.0275 Epoch: 9 Global Step: 158670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:17,173-Speed 9264.08 samples/sec Loss 5.9296 LearningRate 0.0275 Epoch: 9 Global 
Step: 158680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:18,262-Speed 9407.28 samples/sec Loss 5.9019 LearningRate 0.0275 Epoch: 9 Global Step: 158690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:19,370-Speed 9247.23 samples/sec Loss 5.9387 LearningRate 0.0275 Epoch: 9 Global Step: 158700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:20,493-Speed 9125.99 samples/sec Loss 5.9965 LearningRate 0.0275 Epoch: 9 Global Step: 158710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:21,593-Speed 9318.90 samples/sec Loss 5.8303 LearningRate 0.0275 Epoch: 9 Global Step: 158720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:22,674-Speed 9472.62 samples/sec Loss 5.9641 LearningRate 0.0275 Epoch: 9 Global Step: 158730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:56:23,808-Speed 9039.25 samples/sec Loss 5.9081 LearningRate 0.0275 Epoch: 9 Global Step: 158740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:24,936-Speed 9076.09 samples/sec Loss 5.9461 LearningRate 0.0275 Epoch: 9 Global Step: 158750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:26,033-Speed 9338.87 samples/sec Loss 5.9845 LearningRate 0.0275 Epoch: 9 Global Step: 158760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:27,108-Speed 9532.30 samples/sec Loss 5.9633 LearningRate 0.0275 Epoch: 9 Global Step: 158770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:28,196-Speed 9424.80 samples/sec Loss 5.9800 LearningRate 0.0275 Epoch: 9 Global Step: 158780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:29,315-Speed 9156.69 samples/sec Loss 5.8678 LearningRate 0.0275 Epoch: 9 Global Step: 158790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:30,438-Speed 9125.47 samples/sec Loss 5.9326 LearningRate 0.0275 Epoch: 9 Global Step: 158800 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 17:56:31,525-Speed 9423.58 samples/sec Loss 5.8642 LearningRate 0.0275 Epoch: 9 Global Step: 158810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:32,628-Speed 9289.17 samples/sec Loss 6.0334 LearningRate 0.0275 Epoch: 9 Global Step: 158820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:33,710-Speed 9469.35 samples/sec Loss 5.9380 LearningRate 0.0275 Epoch: 9 Global Step: 158830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:34,782-Speed 9561.53 samples/sec Loss 5.9628 LearningRate 0.0275 Epoch: 9 Global Step: 158840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:56:35,851-Speed 9584.65 samples/sec Loss 5.8071 LearningRate 0.0275 Epoch: 9 Global Step: 158850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:56:36,949-Speed 9328.51 samples/sec Loss 6.0611 LearningRate 0.0275 Epoch: 9 Global Step: 158860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:38,043-Speed 9367.52 samples/sec Loss 5.9411 LearningRate 0.0275 Epoch: 9 Global Step: 158870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:39,127-Speed 9452.77 samples/sec Loss 5.9687 LearningRate 0.0275 Epoch: 9 Global Step: 158880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:40,245-Speed 9171.25 samples/sec Loss 5.8347 LearningRate 0.0275 Epoch: 9 Global Step: 158890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:41,410-Speed 8790.33 samples/sec Loss 6.0019 LearningRate 0.0275 Epoch: 9 Global Step: 158900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:42,514-Speed 9278.70 samples/sec Loss 5.9118 LearningRate 0.0275 Epoch: 9 Global Step: 158910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:43,636-Speed 9133.50 samples/sec Loss 5.9392 LearningRate 0.0275 Epoch: 9 Global Step: 158920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
17:56:44,733-Speed 9336.99 samples/sec Loss 5.8272 LearningRate 0.0274 Epoch: 9 Global Step: 158930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:45,792-Speed 9678.53 samples/sec Loss 5.8309 LearningRate 0.0274 Epoch: 9 Global Step: 158940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:46,906-Speed 9200.77 samples/sec Loss 6.0329 LearningRate 0.0274 Epoch: 9 Global Step: 158950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:47,971-Speed 9617.40 samples/sec Loss 5.8824 LearningRate 0.0274 Epoch: 9 Global Step: 158960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:49,095-Speed 9116.56 samples/sec Loss 5.9708 LearningRate 0.0274 Epoch: 9 Global Step: 158970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:56:50,220-Speed 9107.07 samples/sec Loss 5.9777 LearningRate 0.0274 Epoch: 9 Global Step: 158980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:51,316-Speed 9349.52 samples/sec Loss 5.9076 LearningRate 0.0274 Epoch: 9 Global Step: 158990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:52,423-Speed 9256.58 samples/sec Loss 5.9045 LearningRate 0.0274 Epoch: 9 Global Step: 159000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:53,534-Speed 9221.21 samples/sec Loss 5.9043 LearningRate 0.0274 Epoch: 9 Global Step: 159010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:54,610-Speed 9515.97 samples/sec Loss 5.9495 LearningRate 0.0274 Epoch: 9 Global Step: 159020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:55,657-Speed 9791.07 samples/sec Loss 5.8283 LearningRate 0.0274 Epoch: 9 Global Step: 159030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:56,756-Speed 9320.59 samples/sec Loss 5.9642 LearningRate 0.0274 Epoch: 9 Global Step: 159040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:57,859-Speed 9296.07 samples/sec Loss 
5.8024 LearningRate 0.0274 Epoch: 9 Global Step: 159050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:56:58,961-Speed 9308.93 samples/sec Loss 5.9492 LearningRate 0.0274 Epoch: 9 Global Step: 159060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:00,032-Speed 9564.23 samples/sec Loss 5.9025 LearningRate 0.0274 Epoch: 9 Global Step: 159070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:01,135-Speed 9286.56 samples/sec Loss 5.8888 LearningRate 0.0274 Epoch: 9 Global Step: 159080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:57:02,245-Speed 9227.76 samples/sec Loss 5.9502 LearningRate 0.0274 Epoch: 9 Global Step: 159090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:03,371-Speed 9107.68 samples/sec Loss 5.9699 LearningRate 0.0274 Epoch: 9 Global Step: 159100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:04,511-Speed 8987.70 samples/sec Loss 5.9227 LearningRate 0.0274 Epoch: 9 Global Step: 159110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:05,618-Speed 9253.99 samples/sec Loss 5.8507 LearningRate 0.0274 Epoch: 9 Global Step: 159120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:06,731-Speed 9208.05 samples/sec Loss 6.0030 LearningRate 0.0274 Epoch: 9 Global Step: 159130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:07,821-Speed 9396.89 samples/sec Loss 5.8850 LearningRate 0.0274 Epoch: 9 Global Step: 159140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:08,889-Speed 9597.57 samples/sec Loss 5.9234 LearningRate 0.0274 Epoch: 9 Global Step: 159150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:09,980-Speed 9391.70 samples/sec Loss 5.8530 LearningRate 0.0274 Epoch: 9 Global Step: 159160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:11,066-Speed 9434.98 samples/sec Loss 5.9157 LearningRate 0.0274 Epoch: 9 Global Step: 
159170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:12,237-Speed 8746.94 samples/sec Loss 5.8919 LearningRate 0.0274 Epoch: 9 Global Step: 159180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:13,375-Speed 9009.01 samples/sec Loss 5.9414 LearningRate 0.0274 Epoch: 9 Global Step: 159190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:14,506-Speed 9056.48 samples/sec Loss 5.8085 LearningRate 0.0274 Epoch: 9 Global Step: 159200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:15,655-Speed 8921.36 samples/sec Loss 5.9389 LearningRate 0.0274 Epoch: 9 Global Step: 159210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:16,762-Speed 9258.88 samples/sec Loss 5.9328 LearningRate 0.0274 Epoch: 9 Global Step: 159220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:17,826-Speed 9638.90 samples/sec Loss 6.0833 LearningRate 0.0274 Epoch: 9 Global Step: 159230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:18,938-Speed 9208.99 samples/sec Loss 5.8935 LearningRate 0.0274 Epoch: 9 Global Step: 159240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:20,081-Speed 8963.99 samples/sec Loss 5.9416 LearningRate 0.0273 Epoch: 9 Global Step: 159250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:21,201-Speed 9149.65 samples/sec Loss 5.9913 LearningRate 0.0273 Epoch: 9 Global Step: 159260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:22,300-Speed 9326.11 samples/sec Loss 5.9865 LearningRate 0.0273 Epoch: 9 Global Step: 159270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:23,379-Speed 9497.35 samples/sec Loss 5.9678 LearningRate 0.0273 Epoch: 9 Global Step: 159280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:24,501-Speed 9133.77 samples/sec Loss 5.8669 LearningRate 0.0273 Epoch: 9 Global Step: 159290 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 17:57:25,606-Speed 9271.90 samples/sec Loss 5.9620 LearningRate 0.0273 Epoch: 9 Global Step: 159300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:26,776-Speed 8749.60 samples/sec Loss 6.0761 LearningRate 0.0273 Epoch: 9 Global Step: 159310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:27,862-Speed 9441.75 samples/sec Loss 5.9898 LearningRate 0.0273 Epoch: 9 Global Step: 159320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 17:57:28,952-Speed 9395.07 samples/sec Loss 6.0254 LearningRate 0.0273 Epoch: 9 Global Step: 159330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:30,081-Speed 9073.67 samples/sec Loss 5.8649 LearningRate 0.0273 Epoch: 9 Global Step: 159340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:31,186-Speed 9278.68 samples/sec Loss 5.9819 LearningRate 0.0273 Epoch: 9 Global Step: 159350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:33,313-Speed 4815.28 samples/sec Loss 5.9221 LearningRate 0.0273 Epoch: 9 Global Step: 159360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:34,394-Speed 9483.67 samples/sec Loss 5.8872 LearningRate 0.0273 Epoch: 9 Global Step: 159370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:35,480-Speed 9433.83 samples/sec Loss 5.9778 LearningRate 0.0273 Epoch: 9 Global Step: 159380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:37,618-Speed 4791.21 samples/sec Loss 5.8963 LearningRate 0.0273 Epoch: 9 Global Step: 159390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:38,693-Speed 9529.67 samples/sec Loss 5.8703 LearningRate 0.0273 Epoch: 9 Global Step: 159400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:39,791-Speed 9335.31 samples/sec Loss 5.8877 LearningRate 0.0273 Epoch: 9 Global Step: 159410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:40,863-Speed 
9558.27 samples/sec Loss 5.9371 LearningRate 0.0273 Epoch: 9 Global Step: 159420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:41,980-Speed 9170.78 samples/sec Loss 5.9302 LearningRate 0.0273 Epoch: 9 Global Step: 159430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:57:43,054-Speed 9545.90 samples/sec Loss 5.9594 LearningRate 0.0273 Epoch: 9 Global Step: 159440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:44,204-Speed 8907.07 samples/sec Loss 5.8202 LearningRate 0.0273 Epoch: 9 Global Step: 159450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:45,275-Speed 9565.25 samples/sec Loss 5.8808 LearningRate 0.0273 Epoch: 9 Global Step: 159460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:46,373-Speed 9331.57 samples/sec Loss 5.8861 LearningRate 0.0273 Epoch: 9 Global Step: 159470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:47,471-Speed 9328.48 samples/sec Loss 6.0327 LearningRate 0.0273 Epoch: 9 Global Step: 159480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:48,604-Speed 9047.93 samples/sec Loss 5.8787 LearningRate 0.0273 Epoch: 9 Global Step: 159490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:49,694-Speed 9397.87 samples/sec Loss 5.9619 LearningRate 0.0273 Epoch: 9 Global Step: 159500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:50,777-Speed 9463.08 samples/sec Loss 5.9455 LearningRate 0.0273 Epoch: 9 Global Step: 159510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:51,888-Speed 9221.89 samples/sec Loss 5.8150 LearningRate 0.0273 Epoch: 9 Global Step: 159520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:52,981-Speed 9377.94 samples/sec Loss 5.9696 LearningRate 0.0273 Epoch: 9 Global Step: 159530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:54,147-Speed 8786.31 samples/sec Loss 5.9708 
LearningRate 0.0273 Epoch: 9 Global Step: 159540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:57:55,236-Speed 9404.43 samples/sec Loss 5.9078 LearningRate 0.0273 Epoch: 9 Global Step: 159550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:56,340-Speed 9284.17 samples/sec Loss 5.9727 LearningRate 0.0273 Epoch: 9 Global Step: 159560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:57,480-Speed 8983.33 samples/sec Loss 5.8741 LearningRate 0.0272 Epoch: 9 Global Step: 159570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:58,557-Speed 9515.51 samples/sec Loss 5.9512 LearningRate 0.0272 Epoch: 9 Global Step: 159580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:57:59,628-Speed 9563.94 samples/sec Loss 6.0808 LearningRate 0.0272 Epoch: 9 Global Step: 159590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:00,719-Speed 9391.10 samples/sec Loss 5.9403 LearningRate 0.0272 Epoch: 9 Global Step: 159600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:01,856-Speed 9009.51 samples/sec Loss 5.9612 LearningRate 0.0272 Epoch: 9 Global Step: 159610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:02,952-Speed 9353.16 samples/sec Loss 5.8508 LearningRate 0.0272 Epoch: 9 Global Step: 159620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:04,047-Speed 9361.22 samples/sec Loss 5.9589 LearningRate 0.0272 Epoch: 9 Global Step: 159630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:05,125-Speed 9504.95 samples/sec Loss 5.9104 LearningRate 0.0272 Epoch: 9 Global Step: 159640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:06,225-Speed 9314.54 samples/sec Loss 6.0066 LearningRate 0.0272 Epoch: 9 Global Step: 159650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:58:07,360-Speed 9026.20 samples/sec Loss 5.9488 LearningRate 0.0272 Epoch: 9 Global Step: 
159660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:58:08,412-Speed 9735.99 samples/sec Loss 5.8730 LearningRate 0.0272 Epoch: 9 Global Step: 159670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:58:09,496-Speed 9452.57 samples/sec Loss 6.0368 LearningRate 0.0272 Epoch: 9 Global Step: 159680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:10,565-Speed 9590.88 samples/sec Loss 5.8799 LearningRate 0.0272 Epoch: 9 Global Step: 159690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:11,638-Speed 9550.53 samples/sec Loss 5.8175 LearningRate 0.0272 Epoch: 9 Global Step: 159700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:12,716-Speed 9501.02 samples/sec Loss 5.8941 LearningRate 0.0272 Epoch: 9 Global Step: 159710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:13,822-Speed 9266.54 samples/sec Loss 6.0406 LearningRate 0.0272 Epoch: 9 Global Step: 159720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:14,897-Speed 9533.09 samples/sec Loss 5.8889 LearningRate 0.0272 Epoch: 9 Global Step: 159730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:15,985-Speed 9415.29 samples/sec Loss 5.8154 LearningRate 0.0272 Epoch: 9 Global Step: 159740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:17,071-Speed 9430.27 samples/sec Loss 5.8810 LearningRate 0.0272 Epoch: 9 Global Step: 159750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:18,172-Speed 9304.62 samples/sec Loss 6.0255 LearningRate 0.0272 Epoch: 9 Global Step: 159760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:19,262-Speed 9405.36 samples/sec Loss 5.8867 LearningRate 0.0272 Epoch: 9 Global Step: 159770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:20,370-Speed 9242.01 samples/sec Loss 5.9761 LearningRate 0.0272 Epoch: 9 Global Step: 159780 Fp16 Grad Scale: 262144 Required: 7 
hours Training: 2022-04-11 17:58:21,468-Speed 9329.93 samples/sec Loss 5.9945 LearningRate 0.0272 Epoch: 9 Global Step: 159790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:58:22,554-Speed 9433.90 samples/sec Loss 5.8302 LearningRate 0.0272 Epoch: 9 Global Step: 159800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:23,639-Speed 9448.36 samples/sec Loss 5.8534 LearningRate 0.0272 Epoch: 9 Global Step: 159810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:24,729-Speed 9406.18 samples/sec Loss 5.9841 LearningRate 0.0272 Epoch: 9 Global Step: 159820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:25,840-Speed 9217.20 samples/sec Loss 5.9189 LearningRate 0.0272 Epoch: 9 Global Step: 159830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:26,978-Speed 9007.43 samples/sec Loss 5.8775 LearningRate 0.0272 Epoch: 9 Global Step: 159840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:28,094-Speed 9173.10 samples/sec Loss 5.9585 LearningRate 0.0272 Epoch: 9 Global Step: 159850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:29,204-Speed 9235.48 samples/sec Loss 5.8668 LearningRate 0.0272 Epoch: 9 Global Step: 159860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:30,309-Speed 9270.52 samples/sec Loss 6.0936 LearningRate 0.0272 Epoch: 9 Global Step: 159870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:31,416-Speed 9255.64 samples/sec Loss 5.9518 LearningRate 0.0272 Epoch: 9 Global Step: 159880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:32,576-Speed 8832.94 samples/sec Loss 5.9730 LearningRate 0.0271 Epoch: 9 Global Step: 159890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:33,709-Speed 9048.67 samples/sec Loss 5.9891 LearningRate 0.0271 Epoch: 9 Global Step: 159900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 
17:58:34,803-Speed 9371.16 samples/sec Loss 5.9969 LearningRate 0.0271 Epoch: 9 Global Step: 159910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:35,903-Speed 9311.50 samples/sec Loss 5.9711 LearningRate 0.0271 Epoch: 9 Global Step: 159920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:37,085-Speed 8667.74 samples/sec Loss 5.8456 LearningRate 0.0271 Epoch: 9 Global Step: 159930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:38,164-Speed 9494.28 samples/sec Loss 5.8630 LearningRate 0.0271 Epoch: 9 Global Step: 159940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:39,268-Speed 9279.19 samples/sec Loss 5.9433 LearningRate 0.0271 Epoch: 9 Global Step: 159950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:40,398-Speed 9074.26 samples/sec Loss 5.8462 LearningRate 0.0271 Epoch: 9 Global Step: 159960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:41,528-Speed 9061.64 samples/sec Loss 6.0143 LearningRate 0.0271 Epoch: 9 Global Step: 159970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:42,656-Speed 9084.40 samples/sec Loss 5.8203 LearningRate 0.0271 Epoch: 9 Global Step: 159980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:43,789-Speed 9042.41 samples/sec Loss 5.8705 LearningRate 0.0271 Epoch: 9 Global Step: 159990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:58:44,860-Speed 9566.80 samples/sec Loss 5.9423 LearningRate 0.0271 Epoch: 9 Global Step: 160000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:59:07,016-[lfw][160000]XNorm: 9.833157 Training: 2022-04-11 17:59:07,017-[lfw][160000]Accuracy-Flip: 0.99583+-0.00310 Training: 2022-04-11 17:59:07,017-[lfw][160000]Accuracy-Highest: 0.99683 Training: 2022-04-11 17:59:32,650-[cfp_fp][160000]XNorm: 8.339008 Training: 2022-04-11 17:59:32,652-[cfp_fp][160000]Accuracy-Flip: 0.96286+-0.01290 Training: 2022-04-11 
17:59:32,652-[cfp_fp][160000]Accuracy-Highest: 0.96500 Training: 2022-04-11 17:59:54,660-[agedb_30][160000]XNorm: 9.510938 Training: 2022-04-11 17:59:54,661-[agedb_30][160000]Accuracy-Flip: 0.96583+-0.01086 Training: 2022-04-11 17:59:54,661-[agedb_30][160000]Accuracy-Highest: 0.96783 Training: 2022-04-11 17:59:55,726-Speed 144.50 samples/sec Loss 5.9835 LearningRate 0.0271 Epoch: 9 Global Step: 160010 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 17:59:56,788-Speed 9648.65 samples/sec Loss 5.9552 LearningRate 0.0271 Epoch: 9 Global Step: 160020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:59:57,865-Speed 9513.56 samples/sec Loss 5.9343 LearningRate 0.0271 Epoch: 9 Global Step: 160030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 17:59:58,986-Speed 9145.35 samples/sec Loss 5.8518 LearningRate 0.0271 Epoch: 9 Global Step: 160040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:00,104-Speed 9158.63 samples/sec Loss 5.8979 LearningRate 0.0271 Epoch: 9 Global Step: 160050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:01,213-Speed 9239.72 samples/sec Loss 5.8463 LearningRate 0.0271 Epoch: 9 Global Step: 160060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:02,319-Speed 9271.54 samples/sec Loss 5.9491 LearningRate 0.0271 Epoch: 9 Global Step: 160070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:03,456-Speed 9010.75 samples/sec Loss 6.0675 LearningRate 0.0271 Epoch: 9 Global Step: 160080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:04,538-Speed 9466.42 samples/sec Loss 5.9846 LearningRate 0.0271 Epoch: 9 Global Step: 160090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:05,607-Speed 9585.60 samples/sec Loss 5.9111 LearningRate 0.0271 Epoch: 9 Global Step: 160100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:06,737-Speed 9067.92 samples/sec Loss 5.8427 LearningRate 
0.0271 Epoch: 9 Global Step: 160110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:07,869-Speed 9054.11 samples/sec Loss 5.9533 LearningRate 0.0271 Epoch: 9 Global Step: 160120 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:00:08,969-Speed 9311.84 samples/sec Loss 5.8713 LearningRate 0.0271 Epoch: 9 Global Step: 160130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:12,003-Speed 3376.80 samples/sec Loss 5.9451 LearningRate 0.0271 Epoch: 9 Global Step: 160140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:13,103-Speed 9314.81 samples/sec Loss 5.9724 LearningRate 0.0271 Epoch: 9 Global Step: 160150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:14,251-Speed 8927.79 samples/sec Loss 5.9078 LearningRate 0.0271 Epoch: 9 Global Step: 160160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:16,217-Speed 5210.51 samples/sec Loss 5.8344 LearningRate 0.0271 Epoch: 9 Global Step: 160170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:17,345-Speed 9081.16 samples/sec Loss 5.9749 LearningRate 0.0271 Epoch: 9 Global Step: 160180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:18,484-Speed 8992.84 samples/sec Loss 5.8728 LearningRate 0.0271 Epoch: 9 Global Step: 160190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:20,446-Speed 5222.13 samples/sec Loss 5.9465 LearningRate 0.0271 Epoch: 9 Global Step: 160200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:21,565-Speed 9157.80 samples/sec Loss 5.7677 LearningRate 0.0270 Epoch: 9 Global Step: 160210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:22,661-Speed 9344.06 samples/sec Loss 5.9722 LearningRate 0.0270 Epoch: 9 Global Step: 160220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:23,748-Speed 9426.45 samples/sec Loss 5.9939 LearningRate 0.0270 Epoch: 9 Global Step: 160230 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:24,820-Speed 9559.28 samples/sec Loss 5.9906 LearningRate 0.0270 Epoch: 9 Global Step: 160240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:25,907-Speed 9423.30 samples/sec Loss 5.9428 LearningRate 0.0270 Epoch: 9 Global Step: 160250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:27,013-Speed 9270.99 samples/sec Loss 5.9757 LearningRate 0.0270 Epoch: 9 Global Step: 160260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:28,129-Speed 9173.02 samples/sec Loss 5.9642 LearningRate 0.0270 Epoch: 9 Global Step: 160270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:29,205-Speed 9528.34 samples/sec Loss 6.0267 LearningRate 0.0270 Epoch: 9 Global Step: 160280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:30,289-Speed 9452.16 samples/sec Loss 5.8585 LearningRate 0.0270 Epoch: 9 Global Step: 160290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:31,387-Speed 9334.87 samples/sec Loss 5.8245 LearningRate 0.0270 Epoch: 9 Global Step: 160300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:32,497-Speed 9225.22 samples/sec Loss 5.9022 LearningRate 0.0270 Epoch: 9 Global Step: 160310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:33,601-Speed 9289.22 samples/sec Loss 5.9509 LearningRate 0.0270 Epoch: 9 Global Step: 160320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:34,680-Speed 9492.39 samples/sec Loss 5.9883 LearningRate 0.0270 Epoch: 9 Global Step: 160330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:00:35,763-Speed 9457.65 samples/sec Loss 5.9996 LearningRate 0.0270 Epoch: 9 Global Step: 160340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:36,837-Speed 9542.43 samples/sec Loss 5.9563 LearningRate 0.0270 Epoch: 9 Global Step: 160350 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 18:00:37,929-Speed 9385.65 samples/sec Loss 5.9223 LearningRate 0.0270 Epoch: 9 Global Step: 160360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:39,063-Speed 9028.12 samples/sec Loss 5.9930 LearningRate 0.0270 Epoch: 9 Global Step: 160370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:40,138-Speed 9533.92 samples/sec Loss 5.8638 LearningRate 0.0270 Epoch: 9 Global Step: 160380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:41,237-Speed 9325.60 samples/sec Loss 5.8926 LearningRate 0.0270 Epoch: 9 Global Step: 160390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:42,316-Speed 9491.27 samples/sec Loss 5.9085 LearningRate 0.0270 Epoch: 9 Global Step: 160400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:43,444-Speed 9087.76 samples/sec Loss 5.9380 LearningRate 0.0270 Epoch: 9 Global Step: 160410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:44,530-Speed 9436.84 samples/sec Loss 5.9313 LearningRate 0.0270 Epoch: 9 Global Step: 160420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:45,676-Speed 8938.52 samples/sec Loss 5.8535 LearningRate 0.0270 Epoch: 9 Global Step: 160430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:46,764-Speed 9420.26 samples/sec Loss 6.0042 LearningRate 0.0270 Epoch: 9 Global Step: 160440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:47,836-Speed 9557.18 samples/sec Loss 5.7801 LearningRate 0.0270 Epoch: 9 Global Step: 160450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:48,915-Speed 9499.98 samples/sec Loss 5.9026 LearningRate 0.0270 Epoch: 9 Global Step: 160460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:49,973-Speed 9686.05 samples/sec Loss 5.9380 LearningRate 0.0270 Epoch: 9 Global Step: 160470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:51,039-Speed 
9609.85 samples/sec Loss 5.8993 LearningRate 0.0270 Epoch: 9 Global Step: 160480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:52,099-Speed 9662.88 samples/sec Loss 5.8236 LearningRate 0.0270 Epoch: 9 Global Step: 160490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:53,185-Speed 9434.98 samples/sec Loss 5.8462 LearningRate 0.0270 Epoch: 9 Global Step: 160500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:54,276-Speed 9392.18 samples/sec Loss 5.8931 LearningRate 0.0270 Epoch: 9 Global Step: 160510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:55,402-Speed 9102.82 samples/sec Loss 5.8550 LearningRate 0.0270 Epoch: 9 Global Step: 160520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:56,477-Speed 9532.83 samples/sec Loss 5.8510 LearningRate 0.0269 Epoch: 9 Global Step: 160530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:57,563-Speed 9433.43 samples/sec Loss 6.0072 LearningRate 0.0269 Epoch: 9 Global Step: 160540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:58,637-Speed 9539.33 samples/sec Loss 5.9535 LearningRate 0.0269 Epoch: 9 Global Step: 160550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:00:59,745-Speed 9249.62 samples/sec Loss 5.9462 LearningRate 0.0269 Epoch: 9 Global Step: 160560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:00,831-Speed 9434.56 samples/sec Loss 5.8900 LearningRate 0.0269 Epoch: 9 Global Step: 160570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:01,940-Speed 9237.13 samples/sec Loss 5.8807 LearningRate 0.0269 Epoch: 9 Global Step: 160580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:02,979-Speed 9865.63 samples/sec Loss 5.8935 LearningRate 0.0269 Epoch: 9 Global Step: 160590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:04,077-Speed 9323.64 samples/sec Loss 5.7780 LearningRate 
0.0269 Epoch: 9 Global Step: 160600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:05,170-Speed 9381.05 samples/sec Loss 5.9942 LearningRate 0.0269 Epoch: 9 Global Step: 160610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:06,267-Speed 9337.32 samples/sec Loss 5.8786 LearningRate 0.0269 Epoch: 9 Global Step: 160620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:07,360-Speed 9373.11 samples/sec Loss 6.0266 LearningRate 0.0269 Epoch: 9 Global Step: 160630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:08,466-Speed 9268.82 samples/sec Loss 5.9228 LearningRate 0.0269 Epoch: 9 Global Step: 160640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:09,539-Speed 9551.02 samples/sec Loss 5.9943 LearningRate 0.0269 Epoch: 9 Global Step: 160650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:10,607-Speed 9595.69 samples/sec Loss 5.9533 LearningRate 0.0269 Epoch: 9 Global Step: 160660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:11,660-Speed 9726.36 samples/sec Loss 5.9569 LearningRate 0.0269 Epoch: 9 Global Step: 160670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:12,799-Speed 8994.35 samples/sec Loss 5.9729 LearningRate 0.0269 Epoch: 9 Global Step: 160680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:01:13,928-Speed 9076.73 samples/sec Loss 5.8819 LearningRate 0.0269 Epoch: 9 Global Step: 160690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:15,017-Speed 9410.99 samples/sec Loss 5.8751 LearningRate 0.0269 Epoch: 9 Global Step: 160700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:16,100-Speed 9462.37 samples/sec Loss 5.8964 LearningRate 0.0269 Epoch: 9 Global Step: 160710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:17,219-Speed 9155.44 samples/sec Loss 5.9711 LearningRate 0.0269 Epoch: 9 Global Step: 160720 Fp16 Grad Scale: 
131072 Required: 7 hours Training: 2022-04-11 18:01:18,313-Speed 9363.68 samples/sec Loss 5.9461 LearningRate 0.0269 Epoch: 9 Global Step: 160730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:19,419-Speed 9267.29 samples/sec Loss 5.9222 LearningRate 0.0269 Epoch: 9 Global Step: 160740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:20,538-Speed 9154.38 samples/sec Loss 5.9885 LearningRate 0.0269 Epoch: 9 Global Step: 160750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:21,679-Speed 8975.96 samples/sec Loss 6.0529 LearningRate 0.0269 Epoch: 9 Global Step: 160760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:22,783-Speed 9284.34 samples/sec Loss 5.9215 LearningRate 0.0269 Epoch: 9 Global Step: 160770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:23,898-Speed 9189.52 samples/sec Loss 5.8960 LearningRate 0.0269 Epoch: 9 Global Step: 160780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:24,984-Speed 9433.97 samples/sec Loss 6.0418 LearningRate 0.0269 Epoch: 9 Global Step: 160790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:01:26,048-Speed 9634.34 samples/sec Loss 5.8927 LearningRate 0.0269 Epoch: 9 Global Step: 160800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:01:27,133-Speed 9452.25 samples/sec Loss 5.9010 LearningRate 0.0269 Epoch: 9 Global Step: 160810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:28,219-Speed 9435.27 samples/sec Loss 5.8098 LearningRate 0.0269 Epoch: 9 Global Step: 160820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:29,320-Speed 9298.99 samples/sec Loss 5.9276 LearningRate 0.0269 Epoch: 9 Global Step: 160830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:30,444-Speed 9117.89 samples/sec Loss 5.7365 LearningRate 0.0269 Epoch: 9 Global Step: 160840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 18:01:31,543-Speed 9326.99 samples/sec Loss 5.8567 LearningRate 0.0268 Epoch: 9 Global Step: 160850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:32,639-Speed 9346.65 samples/sec Loss 5.9245 LearningRate 0.0268 Epoch: 9 Global Step: 160860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:33,772-Speed 9045.21 samples/sec Loss 6.0074 LearningRate 0.0268 Epoch: 9 Global Step: 160870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:34,866-Speed 9365.33 samples/sec Loss 5.8513 LearningRate 0.0268 Epoch: 9 Global Step: 160880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:35,968-Speed 9296.16 samples/sec Loss 6.0039 LearningRate 0.0268 Epoch: 9 Global Step: 160890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:37,090-Speed 9129.78 samples/sec Loss 5.9664 LearningRate 0.0268 Epoch: 9 Global Step: 160900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:38,202-Speed 9219.04 samples/sec Loss 6.0558 LearningRate 0.0268 Epoch: 9 Global Step: 160910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:01:39,259-Speed 9691.60 samples/sec Loss 5.9460 LearningRate 0.0268 Epoch: 9 Global Step: 160920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:40,320-Speed 9660.55 samples/sec Loss 5.8349 LearningRate 0.0268 Epoch: 9 Global Step: 160930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:41,440-Speed 9141.73 samples/sec Loss 5.8948 LearningRate 0.0268 Epoch: 9 Global Step: 160940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:42,539-Speed 9322.65 samples/sec Loss 5.9168 LearningRate 0.0268 Epoch: 9 Global Step: 160950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:43,664-Speed 9109.96 samples/sec Loss 5.9156 LearningRate 0.0268 Epoch: 9 Global Step: 160960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:44,748-Speed 9455.90 
samples/sec Loss 6.0211 LearningRate 0.0268 Epoch: 9 Global Step: 160970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:45,865-Speed 9166.08 samples/sec Loss 5.7559 LearningRate 0.0268 Epoch: 9 Global Step: 160980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:46,952-Speed 9434.80 samples/sec Loss 5.8521 LearningRate 0.0268 Epoch: 9 Global Step: 160990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:48,044-Speed 9382.21 samples/sec Loss 5.9291 LearningRate 0.0268 Epoch: 9 Global Step: 161000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:49,154-Speed 9234.22 samples/sec Loss 5.7871 LearningRate 0.0268 Epoch: 9 Global Step: 161010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:50,238-Speed 9448.65 samples/sec Loss 5.9291 LearningRate 0.0268 Epoch: 9 Global Step: 161020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:51,302-Speed 9634.40 samples/sec Loss 5.9816 LearningRate 0.0268 Epoch: 9 Global Step: 161030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:52,393-Speed 9390.68 samples/sec Loss 5.9825 LearningRate 0.0268 Epoch: 9 Global Step: 161040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:53,493-Speed 9314.13 samples/sec Loss 5.9572 LearningRate 0.0268 Epoch: 9 Global Step: 161050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:54,581-Speed 9422.38 samples/sec Loss 5.9017 LearningRate 0.0268 Epoch: 9 Global Step: 161060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:55,699-Speed 9164.10 samples/sec Loss 5.8804 LearningRate 0.0268 Epoch: 9 Global Step: 161070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:56,823-Speed 9112.63 samples/sec Loss 5.8317 LearningRate 0.0268 Epoch: 9 Global Step: 161080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:57,885-Speed 9653.14 samples/sec Loss 5.9277 LearningRate 0.0268 
Epoch: 9 Global Step: 161090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:01:58,949-Speed 9621.50 samples/sec Loss 5.9194 LearningRate 0.0268 Epoch: 9 Global Step: 161100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:00,023-Speed 9543.80 samples/sec Loss 5.9435 LearningRate 0.0268 Epoch: 9 Global Step: 161110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:01,118-Speed 9355.55 samples/sec Loss 5.9317 LearningRate 0.0268 Epoch: 9 Global Step: 161120 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:02:02,222-Speed 9277.42 samples/sec Loss 5.8796 LearningRate 0.0268 Epoch: 9 Global Step: 161130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:03,302-Speed 9486.21 samples/sec Loss 5.9732 LearningRate 0.0268 Epoch: 9 Global Step: 161140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:04,392-Speed 9401.08 samples/sec Loss 5.9863 LearningRate 0.0268 Epoch: 9 Global Step: 161150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:05,472-Speed 9486.37 samples/sec Loss 5.9003 LearningRate 0.0268 Epoch: 9 Global Step: 161160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:06,593-Speed 9140.85 samples/sec Loss 6.0191 LearningRate 0.0267 Epoch: 9 Global Step: 161170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:07,682-Speed 9415.19 samples/sec Loss 6.0327 LearningRate 0.0267 Epoch: 9 Global Step: 161180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:08,768-Speed 9433.96 samples/sec Loss 5.9055 LearningRate 0.0267 Epoch: 9 Global Step: 161190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:09,917-Speed 8920.66 samples/sec Loss 5.9615 LearningRate 0.0267 Epoch: 9 Global Step: 161200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:11,022-Speed 9269.70 samples/sec Loss 6.0171 LearningRate 0.0267 Epoch: 9 Global Step: 161210 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:12,131-Speed 9237.05 samples/sec Loss 5.8067 LearningRate 0.0267 Epoch: 9 Global Step: 161220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:13,233-Speed 9295.10 samples/sec Loss 5.8888 LearningRate 0.0267 Epoch: 9 Global Step: 161230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:02:14,351-Speed 9168.73 samples/sec Loss 5.8565 LearningRate 0.0267 Epoch: 9 Global Step: 161240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:02:15,416-Speed 9620.33 samples/sec Loss 5.9103 LearningRate 0.0267 Epoch: 9 Global Step: 161250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:16,540-Speed 9112.68 samples/sec Loss 5.8634 LearningRate 0.0267 Epoch: 9 Global Step: 161260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:17,635-Speed 9359.11 samples/sec Loss 5.9083 LearningRate 0.0267 Epoch: 9 Global Step: 161270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:18,719-Speed 9457.59 samples/sec Loss 5.9342 LearningRate 0.0267 Epoch: 9 Global Step: 161280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:19,846-Speed 9090.73 samples/sec Loss 6.0251 LearningRate 0.0267 Epoch: 9 Global Step: 161290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:20,964-Speed 9165.21 samples/sec Loss 5.9979 LearningRate 0.0267 Epoch: 9 Global Step: 161300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:22,018-Speed 9726.70 samples/sec Loss 5.9487 LearningRate 0.0267 Epoch: 9 Global Step: 161310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:23,103-Speed 9439.66 samples/sec Loss 5.9446 LearningRate 0.0267 Epoch: 9 Global Step: 161320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:24,218-Speed 9194.76 samples/sec Loss 6.0021 LearningRate 0.0267 Epoch: 9 Global Step: 161330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 18:02:25,373-Speed 8876.27 samples/sec Loss 5.9258 LearningRate 0.0267 Epoch: 9 Global Step: 161340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:26,505-Speed 9045.69 samples/sec Loss 5.9330 LearningRate 0.0267 Epoch: 9 Global Step: 161350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:02:27,582-Speed 9514.17 samples/sec Loss 5.8700 LearningRate 0.0267 Epoch: 9 Global Step: 161360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:28,664-Speed 9467.28 samples/sec Loss 5.8542 LearningRate 0.0267 Epoch: 9 Global Step: 161370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:29,755-Speed 9395.45 samples/sec Loss 5.9940 LearningRate 0.0267 Epoch: 9 Global Step: 161380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:30,832-Speed 9510.25 samples/sec Loss 5.9422 LearningRate 0.0267 Epoch: 9 Global Step: 161390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:31,909-Speed 9516.04 samples/sec Loss 5.9454 LearningRate 0.0267 Epoch: 9 Global Step: 161400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:33,026-Speed 9165.94 samples/sec Loss 5.8209 LearningRate 0.0267 Epoch: 9 Global Step: 161410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:34,107-Speed 9479.37 samples/sec Loss 5.9454 LearningRate 0.0267 Epoch: 9 Global Step: 161420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:35,199-Speed 9386.01 samples/sec Loss 6.0315 LearningRate 0.0267 Epoch: 9 Global Step: 161430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:36,318-Speed 9150.95 samples/sec Loss 6.0417 LearningRate 0.0267 Epoch: 9 Global Step: 161440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:37,423-Speed 9280.78 samples/sec Loss 5.8641 LearningRate 0.0267 Epoch: 9 Global Step: 161450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:38,510-Speed 9426.11 
samples/sec Loss 5.9268 LearningRate 0.0267 Epoch: 9 Global Step: 161460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:02:39,607-Speed 9335.14 samples/sec Loss 5.9969 LearningRate 0.0267 Epoch: 9 Global Step: 161470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:40,698-Speed 9391.35 samples/sec Loss 5.9171 LearningRate 0.0267 Epoch: 9 Global Step: 161480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:41,848-Speed 8906.04 samples/sec Loss 5.9058 LearningRate 0.0266 Epoch: 9 Global Step: 161490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:42,939-Speed 9401.96 samples/sec Loss 5.7764 LearningRate 0.0266 Epoch: 9 Global Step: 161500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:44,027-Speed 9419.27 samples/sec Loss 5.9116 LearningRate 0.0266 Epoch: 9 Global Step: 161510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:45,140-Speed 9205.05 samples/sec Loss 5.8765 LearningRate 0.0266 Epoch: 9 Global Step: 161520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:46,254-Speed 9192.19 samples/sec Loss 5.9152 LearningRate 0.0266 Epoch: 9 Global Step: 161530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:47,312-Speed 9683.72 samples/sec Loss 5.9773 LearningRate 0.0266 Epoch: 9 Global Step: 161540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:48,391-Speed 9500.59 samples/sec Loss 5.9319 LearningRate 0.0266 Epoch: 9 Global Step: 161550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:49,512-Speed 9139.55 samples/sec Loss 5.9956 LearningRate 0.0266 Epoch: 9 Global Step: 161560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:50,606-Speed 9368.76 samples/sec Loss 5.8890 LearningRate 0.0266 Epoch: 9 Global Step: 161570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:51,732-Speed 9099.31 samples/sec Loss 5.9669 LearningRate 0.0266 
Epoch: 9 Global Step: 161580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:52,802-Speed 9573.27 samples/sec Loss 5.9247 LearningRate 0.0266 Epoch: 9 Global Step: 161590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:53,884-Speed 9475.13 samples/sec Loss 5.8971 LearningRate 0.0266 Epoch: 9 Global Step: 161600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:54,984-Speed 9311.84 samples/sec Loss 5.9527 LearningRate 0.0266 Epoch: 9 Global Step: 161610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:56,081-Speed 9343.14 samples/sec Loss 5.9453 LearningRate 0.0266 Epoch: 9 Global Step: 161620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:57,189-Speed 9249.32 samples/sec Loss 5.9056 LearningRate 0.0266 Epoch: 9 Global Step: 161630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:58,297-Speed 9243.21 samples/sec Loss 5.9731 LearningRate 0.0266 Epoch: 9 Global Step: 161640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:02:59,381-Speed 9450.85 samples/sec Loss 5.8934 LearningRate 0.0266 Epoch: 9 Global Step: 161650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:00,476-Speed 9357.97 samples/sec Loss 5.9449 LearningRate 0.0266 Epoch: 9 Global Step: 161660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:01,583-Speed 9254.76 samples/sec Loss 5.8675 LearningRate 0.0266 Epoch: 9 Global Step: 161670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:03:02,725-Speed 8975.66 samples/sec Loss 5.8631 LearningRate 0.0266 Epoch: 9 Global Step: 161680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:03,793-Speed 9591.34 samples/sec Loss 5.8774 LearningRate 0.0266 Epoch: 9 Global Step: 161690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:04,882-Speed 9410.95 samples/sec Loss 5.8941 LearningRate 0.0266 Epoch: 9 Global Step: 161700 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:05,984-Speed 9296.92 samples/sec Loss 5.9351 LearningRate 0.0266 Epoch: 9 Global Step: 161710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:07,083-Speed 9323.93 samples/sec Loss 5.9437 LearningRate 0.0266 Epoch: 9 Global Step: 161720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:08,179-Speed 9351.21 samples/sec Loss 5.9278 LearningRate 0.0266 Epoch: 9 Global Step: 161730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:09,292-Speed 9207.26 samples/sec Loss 5.9146 LearningRate 0.0266 Epoch: 9 Global Step: 161740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:10,404-Speed 9212.07 samples/sec Loss 6.0268 LearningRate 0.0266 Epoch: 9 Global Step: 161750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:11,518-Speed 9194.21 samples/sec Loss 5.9035 LearningRate 0.0266 Epoch: 9 Global Step: 161760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:12,627-Speed 9243.52 samples/sec Loss 5.8779 LearningRate 0.0266 Epoch: 9 Global Step: 161770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:13,737-Speed 9229.61 samples/sec Loss 5.8326 LearningRate 0.0266 Epoch: 9 Global Step: 161780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:14,813-Speed 9522.89 samples/sec Loss 5.8743 LearningRate 0.0266 Epoch: 9 Global Step: 161790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:15,854-Speed 9833.15 samples/sec Loss 5.8610 LearningRate 0.0266 Epoch: 9 Global Step: 161800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:16,924-Speed 9580.44 samples/sec Loss 5.8944 LearningRate 0.0266 Epoch: 9 Global Step: 161810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:17,989-Speed 9620.54 samples/sec Loss 6.0305 LearningRate 0.0265 Epoch: 9 Global Step: 161820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:03:19,119-Speed 9066.16 samples/sec Loss 5.8668 LearningRate 0.0265 Epoch: 9 Global Step: 161830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:20,185-Speed 9612.26 samples/sec Loss 6.0458 LearningRate 0.0265 Epoch: 9 Global Step: 161840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:21,277-Speed 9398.11 samples/sec Loss 6.0179 LearningRate 0.0265 Epoch: 9 Global Step: 161850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:22,417-Speed 8992.02 samples/sec Loss 5.9173 LearningRate 0.0265 Epoch: 9 Global Step: 161860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:23,529-Speed 9211.81 samples/sec Loss 5.9105 LearningRate 0.0265 Epoch: 9 Global Step: 161870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:24,621-Speed 9383.44 samples/sec Loss 5.9646 LearningRate 0.0265 Epoch: 9 Global Step: 161880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:25,725-Speed 9280.72 samples/sec Loss 5.8873 LearningRate 0.0265 Epoch: 9 Global Step: 161890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:26,825-Speed 9314.54 samples/sec Loss 5.8948 LearningRate 0.0265 Epoch: 9 Global Step: 161900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:27,919-Speed 9366.66 samples/sec Loss 6.0278 LearningRate 0.0265 Epoch: 9 Global Step: 161910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:28,992-Speed 9547.73 samples/sec Loss 5.9108 LearningRate 0.0265 Epoch: 9 Global Step: 161920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:30,090-Speed 9329.21 samples/sec Loss 5.9426 LearningRate 0.0265 Epoch: 9 Global Step: 161930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:31,219-Speed 9074.07 samples/sec Loss 5.9139 LearningRate 0.0265 Epoch: 9 Global Step: 161940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:32,326-Speed 9260.11 samples/sec Loss 
5.9332 LearningRate 0.0265 Epoch: 9 Global Step: 161950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:33,396-Speed 9570.35 samples/sec Loss 6.0044 LearningRate 0.0265 Epoch: 9 Global Step: 161960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:34,494-Speed 9329.77 samples/sec Loss 5.8501 LearningRate 0.0265 Epoch: 9 Global Step: 161970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:35,646-Speed 8898.02 samples/sec Loss 5.9721 LearningRate 0.0265 Epoch: 9 Global Step: 161980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:36,739-Speed 9368.94 samples/sec Loss 6.0583 LearningRate 0.0265 Epoch: 9 Global Step: 161990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:03:37,858-Speed 9160.14 samples/sec Loss 5.9602 LearningRate 0.0265 Epoch: 9 Global Step: 162000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:03:59,866-[lfw][162000]XNorm: 9.783905 Training: 2022-04-11 18:03:59,868-[lfw][162000]Accuracy-Flip: 0.99600+-0.00260 Training: 2022-04-11 18:03:59,868-[lfw][162000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:04:25,707-[cfp_fp][162000]XNorm: 8.399252 Training: 2022-04-11 18:04:25,708-[cfp_fp][162000]Accuracy-Flip: 0.96300+-0.00993 Training: 2022-04-11 18:04:25,708-[cfp_fp][162000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:04:47,912-[agedb_30][162000]XNorm: 9.478708 Training: 2022-04-11 18:04:47,913-[agedb_30][162000]Accuracy-Flip: 0.96317+-0.00845 Training: 2022-04-11 18:04:47,913-[agedb_30][162000]Accuracy-Highest: 0.96783 Training: 2022-04-11 18:04:49,004-Speed 143.93 samples/sec Loss 5.9176 LearningRate 0.0265 Epoch: 9 Global Step: 162010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:50,122-Speed 9163.93 samples/sec Loss 5.8604 LearningRate 0.0265 Epoch: 9 Global Step: 162020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:51,237-Speed 9191.93 samples/sec Loss 5.9517 LearningRate 0.0265 
Epoch: 9 Global Step: 162030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:52,317-Speed 9484.87 samples/sec Loss 5.9049 LearningRate 0.0265 Epoch: 9 Global Step: 162040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:53,404-Speed 9422.03 samples/sec Loss 5.9357 LearningRate 0.0265 Epoch: 9 Global Step: 162050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:54,462-Speed 9683.66 samples/sec Loss 5.8775 LearningRate 0.0265 Epoch: 9 Global Step: 162060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:55,517-Speed 9720.09 samples/sec Loss 5.8768 LearningRate 0.0265 Epoch: 9 Global Step: 162070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:56,575-Speed 9683.48 samples/sec Loss 6.0787 LearningRate 0.0265 Epoch: 9 Global Step: 162080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:57,660-Speed 9442.68 samples/sec Loss 5.9439 LearningRate 0.0265 Epoch: 9 Global Step: 162090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:04:58,741-Speed 9474.03 samples/sec Loss 5.8880 LearningRate 0.0265 Epoch: 9 Global Step: 162100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:04:59,860-Speed 9158.10 samples/sec Loss 5.8809 LearningRate 0.0265 Epoch: 9 Global Step: 162110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:00,901-Speed 9843.74 samples/sec Loss 5.9844 LearningRate 0.0265 Epoch: 9 Global Step: 162120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:01,973-Speed 9555.64 samples/sec Loss 5.8558 LearningRate 0.0265 Epoch: 9 Global Step: 162130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:03,074-Speed 9308.95 samples/sec Loss 5.8502 LearningRate 0.0264 Epoch: 9 Global Step: 162140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:04,150-Speed 9522.48 samples/sec Loss 5.8693 LearningRate 0.0264 Epoch: 9 Global Step: 162150 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 18:05:05,204-Speed 9724.43 samples/sec Loss 5.8881 LearningRate 0.0264 Epoch: 9 Global Step: 162160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:06,247-Speed 9815.52 samples/sec Loss 5.8690 LearningRate 0.0264 Epoch: 9 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:07,309-Speed 9653.55 samples/sec Loss 5.9528 LearningRate 0.0264 Epoch: 9 Global Step: 162180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:08,380-Speed 9579.80 samples/sec Loss 5.9010 LearningRate 0.0264 Epoch: 9 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:09,421-Speed 9845.29 samples/sec Loss 5.8829 LearningRate 0.0264 Epoch: 9 Global Step: 162200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:10,503-Speed 9470.85 samples/sec Loss 5.9771 LearningRate 0.0264 Epoch: 9 Global Step: 162210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:11,593-Speed 9397.76 samples/sec Loss 5.9711 LearningRate 0.0264 Epoch: 9 Global Step: 162220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:12,700-Speed 9249.05 samples/sec Loss 5.8918 LearningRate 0.0264 Epoch: 9 Global Step: 162230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:13,774-Speed 9540.99 samples/sec Loss 5.9121 LearningRate 0.0264 Epoch: 9 Global Step: 162240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:14,852-Speed 9506.45 samples/sec Loss 5.9584 LearningRate 0.0264 Epoch: 9 Global Step: 162250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:15,934-Speed 9473.13 samples/sec Loss 5.9639 LearningRate 0.0264 Epoch: 9 Global Step: 162260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:17,005-Speed 9566.81 samples/sec Loss 5.9334 LearningRate 0.0264 Epoch: 9 Global Step: 162270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:05:18,110-Speed 9271.58 samples/sec Loss 5.8674 LearningRate 0.0264 Epoch: 9 Global Step: 162280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:19,193-Speed 9460.17 samples/sec Loss 5.9621 LearningRate 0.0264 Epoch: 9 Global Step: 162290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:20,276-Speed 9464.51 samples/sec Loss 5.7844 LearningRate 0.0264 Epoch: 9 Global Step: 162300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:05:21,344-Speed 9586.83 samples/sec Loss 5.8612 LearningRate 0.0264 Epoch: 9 Global Step: 162310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:22,418-Speed 9541.16 samples/sec Loss 5.9553 LearningRate 0.0264 Epoch: 9 Global Step: 162320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:23,591-Speed 8733.63 samples/sec Loss 6.0495 LearningRate 0.0264 Epoch: 9 Global Step: 162330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:24,668-Speed 9518.64 samples/sec Loss 5.8874 LearningRate 0.0264 Epoch: 9 Global Step: 162340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:25,743-Speed 9535.45 samples/sec Loss 5.8848 LearningRate 0.0264 Epoch: 9 Global Step: 162350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:26,817-Speed 9546.14 samples/sec Loss 5.9468 LearningRate 0.0264 Epoch: 9 Global Step: 162360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:27,886-Speed 9582.93 samples/sec Loss 5.9522 LearningRate 0.0264 Epoch: 9 Global Step: 162370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:28,990-Speed 9278.35 samples/sec Loss 5.8791 LearningRate 0.0264 Epoch: 9 Global Step: 162380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:30,059-Speed 9589.57 samples/sec Loss 5.9011 LearningRate 0.0264 Epoch: 9 Global Step: 162390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:31,200-Speed 8976.64 samples/sec Loss 
5.8676 LearningRate 0.0264 Epoch: 9 Global Step: 162400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:32,295-Speed 9361.25 samples/sec Loss 5.8696 LearningRate 0.0264 Epoch: 9 Global Step: 162410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:33,371-Speed 9522.02 samples/sec Loss 5.9419 LearningRate 0.0264 Epoch: 9 Global Step: 162420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:34,415-Speed 9814.73 samples/sec Loss 5.9922 LearningRate 0.0264 Epoch: 9 Global Step: 162430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:35,473-Speed 9683.36 samples/sec Loss 5.9360 LearningRate 0.0264 Epoch: 9 Global Step: 162440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:36,553-Speed 9484.42 samples/sec Loss 5.8773 LearningRate 0.0264 Epoch: 9 Global Step: 162450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:37,641-Speed 9420.68 samples/sec Loss 6.0161 LearningRate 0.0264 Epoch: 9 Global Step: 162460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:38,739-Speed 9336.74 samples/sec Loss 5.9564 LearningRate 0.0263 Epoch: 9 Global Step: 162470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:39,824-Speed 9434.55 samples/sec Loss 5.9338 LearningRate 0.0263 Epoch: 9 Global Step: 162480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:40,894-Speed 9580.52 samples/sec Loss 5.9539 LearningRate 0.0263 Epoch: 9 Global Step: 162490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:41,947-Speed 9734.91 samples/sec Loss 5.9718 LearningRate 0.0263 Epoch: 9 Global Step: 162500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:43,024-Speed 9513.19 samples/sec Loss 5.8982 LearningRate 0.0263 Epoch: 9 Global Step: 162510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:05:44,089-Speed 9616.53 samples/sec Loss 5.9729 LearningRate 0.0263 Epoch: 9 Global 
Step: 162520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:45,188-Speed 9326.39 samples/sec Loss 5.8871 LearningRate 0.0263 Epoch: 9 Global Step: 162530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:46,301-Speed 9207.73 samples/sec Loss 6.0192 LearningRate 0.0263 Epoch: 9 Global Step: 162540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:47,407-Speed 9259.86 samples/sec Loss 5.9141 LearningRate 0.0263 Epoch: 9 Global Step: 162550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:48,518-Speed 9222.78 samples/sec Loss 5.9258 LearningRate 0.0263 Epoch: 9 Global Step: 162560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:49,590-Speed 9560.66 samples/sec Loss 5.9707 LearningRate 0.0263 Epoch: 9 Global Step: 162570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:05:50,694-Speed 9286.03 samples/sec Loss 5.8586 LearningRate 0.0263 Epoch: 9 Global Step: 162580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:51,792-Speed 9332.69 samples/sec Loss 5.9077 LearningRate 0.0263 Epoch: 9 Global Step: 162590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:52,932-Speed 8983.76 samples/sec Loss 5.8544 LearningRate 0.0263 Epoch: 9 Global Step: 162600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:54,004-Speed 9554.15 samples/sec Loss 5.8187 LearningRate 0.0263 Epoch: 9 Global Step: 162610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:55,073-Speed 9584.58 samples/sec Loss 5.9218 LearningRate 0.0263 Epoch: 9 Global Step: 162620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:56,162-Speed 9413.85 samples/sec Loss 5.8967 LearningRate 0.0263 Epoch: 9 Global Step: 162630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:57,240-Speed 9508.33 samples/sec Loss 5.9512 LearningRate 0.0263 Epoch: 9 Global Step: 162640 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 18:05:58,364-Speed 9113.73 samples/sec Loss 5.9546 LearningRate 0.0263 Epoch: 9 Global Step: 162650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:05:59,423-Speed 9673.54 samples/sec Loss 6.0164 LearningRate 0.0263 Epoch: 9 Global Step: 162660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:00,509-Speed 9434.72 samples/sec Loss 5.9393 LearningRate 0.0263 Epoch: 9 Global Step: 162670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:01,611-Speed 9299.35 samples/sec Loss 6.0245 LearningRate 0.0263 Epoch: 9 Global Step: 162680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:02,714-Speed 9285.92 samples/sec Loss 6.0291 LearningRate 0.0263 Epoch: 9 Global Step: 162690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:03,796-Speed 9469.15 samples/sec Loss 5.8820 LearningRate 0.0263 Epoch: 9 Global Step: 162700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:04,868-Speed 9561.14 samples/sec Loss 5.8992 LearningRate 0.0263 Epoch: 9 Global Step: 162710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:05,974-Speed 9263.25 samples/sec Loss 6.0163 LearningRate 0.0263 Epoch: 9 Global Step: 162720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:07,106-Speed 9050.93 samples/sec Loss 6.0188 LearningRate 0.0263 Epoch: 9 Global Step: 162730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:08,187-Speed 9481.88 samples/sec Loss 5.9890 LearningRate 0.0263 Epoch: 9 Global Step: 162740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:09,281-Speed 9371.40 samples/sec Loss 5.8147 LearningRate 0.0263 Epoch: 9 Global Step: 162750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:10,346-Speed 9617.95 samples/sec Loss 5.7538 LearningRate 0.0263 Epoch: 9 Global Step: 162760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:11,419-Speed 
9545.72 samples/sec Loss 5.8899 LearningRate 0.0263 Epoch: 9 Global Step: 162770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:12,514-Speed 9354.42 samples/sec Loss 5.8498 LearningRate 0.0263 Epoch: 9 Global Step: 162780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:13,613-Speed 9326.99 samples/sec Loss 5.8500 LearningRate 0.0262 Epoch: 9 Global Step: 162790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:14,708-Speed 9354.58 samples/sec Loss 5.8993 LearningRate 0.0262 Epoch: 9 Global Step: 162800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:06:15,733-Speed 9994.26 samples/sec Loss 5.9130 LearningRate 0.0262 Epoch: 9 Global Step: 162810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:16,815-Speed 9472.07 samples/sec Loss 5.9504 LearningRate 0.0262 Epoch: 9 Global Step: 162820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:17,933-Speed 9163.11 samples/sec Loss 5.9597 LearningRate 0.0262 Epoch: 9 Global Step: 162830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:18,993-Speed 9671.15 samples/sec Loss 6.0280 LearningRate 0.0262 Epoch: 9 Global Step: 162840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:20,080-Speed 9424.36 samples/sec Loss 5.9737 LearningRate 0.0262 Epoch: 9 Global Step: 162850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:21,158-Speed 9507.30 samples/sec Loss 5.9341 LearningRate 0.0262 Epoch: 9 Global Step: 162860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:22,241-Speed 9461.81 samples/sec Loss 5.8930 LearningRate 0.0262 Epoch: 9 Global Step: 162870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:23,310-Speed 9590.56 samples/sec Loss 6.0423 LearningRate 0.0262 Epoch: 9 Global Step: 162880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:24,421-Speed 9218.96 samples/sec Loss 5.9698 LearningRate 
0.0262 Epoch: 9 Global Step: 162890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:25,548-Speed 9096.53 samples/sec Loss 5.8867 LearningRate 0.0262 Epoch: 9 Global Step: 162900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:26,631-Speed 9455.33 samples/sec Loss 5.8507 LearningRate 0.0262 Epoch: 9 Global Step: 162910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:06:27,694-Speed 9637.59 samples/sec Loss 6.0001 LearningRate 0.0262 Epoch: 9 Global Step: 162920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:28,735-Speed 9845.85 samples/sec Loss 5.8958 LearningRate 0.0262 Epoch: 9 Global Step: 162930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:29,811-Speed 9524.81 samples/sec Loss 5.9637 LearningRate 0.0262 Epoch: 9 Global Step: 162940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:30,875-Speed 9624.53 samples/sec Loss 5.8754 LearningRate 0.0262 Epoch: 9 Global Step: 162950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:31,969-Speed 9372.10 samples/sec Loss 5.9206 LearningRate 0.0262 Epoch: 9 Global Step: 162960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:33,063-Speed 9359.82 samples/sec Loss 5.8911 LearningRate 0.0262 Epoch: 9 Global Step: 162970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:34,197-Speed 9038.49 samples/sec Loss 5.9324 LearningRate 0.0262 Epoch: 9 Global Step: 162980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:35,301-Speed 9287.91 samples/sec Loss 5.9319 LearningRate 0.0262 Epoch: 9 Global Step: 162990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:36,388-Speed 9419.34 samples/sec Loss 5.8881 LearningRate 0.0262 Epoch: 9 Global Step: 163000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:37,477-Speed 9409.67 samples/sec Loss 6.0355 LearningRate 0.0262 Epoch: 9 Global Step: 163010 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:38,525-Speed 9780.46 samples/sec Loss 5.8898 LearningRate 0.0262 Epoch: 9 Global Step: 163020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:06:39,611-Speed 9440.18 samples/sec Loss 5.8553 LearningRate 0.0262 Epoch: 9 Global Step: 163030 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:06:40,719-Speed 9246.91 samples/sec Loss 5.8850 LearningRate 0.0262 Epoch: 9 Global Step: 163040 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:06:41,770-Speed 9743.02 samples/sec Loss 5.9459 LearningRate 0.0262 Epoch: 9 Global Step: 163050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:42,837-Speed 9601.12 samples/sec Loss 5.9146 LearningRate 0.0262 Epoch: 9 Global Step: 163060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:43,913-Speed 9526.51 samples/sec Loss 5.8856 LearningRate 0.0262 Epoch: 9 Global Step: 163070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:44,976-Speed 9636.69 samples/sec Loss 5.9233 LearningRate 0.0262 Epoch: 9 Global Step: 163080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:46,069-Speed 9372.30 samples/sec Loss 5.9132 LearningRate 0.0262 Epoch: 9 Global Step: 163090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:47,146-Speed 9511.91 samples/sec Loss 5.9218 LearningRate 0.0262 Epoch: 9 Global Step: 163100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:48,245-Speed 9320.95 samples/sec Loss 5.8748 LearningRate 0.0262 Epoch: 9 Global Step: 163110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:49,304-Speed 9683.47 samples/sec Loss 5.9837 LearningRate 0.0261 Epoch: 9 Global Step: 163120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:50,364-Speed 9670.95 samples/sec Loss 5.9352 LearningRate 0.0261 Epoch: 9 Global Step: 163130 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 18:06:51,469-Speed 9274.66 samples/sec Loss 6.0269 LearningRate 0.0261 Epoch: 9 Global Step: 163140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:52,559-Speed 9392.84 samples/sec Loss 5.9287 LearningRate 0.0261 Epoch: 9 Global Step: 163150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:53,629-Speed 9579.91 samples/sec Loss 5.8570 LearningRate 0.0261 Epoch: 9 Global Step: 163160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:54,706-Speed 9515.49 samples/sec Loss 5.9203 LearningRate 0.0261 Epoch: 9 Global Step: 163170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:55,796-Speed 9405.76 samples/sec Loss 5.8926 LearningRate 0.0261 Epoch: 9 Global Step: 163180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:56,889-Speed 9370.60 samples/sec Loss 5.9010 LearningRate 0.0261 Epoch: 9 Global Step: 163190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:57,971-Speed 9474.19 samples/sec Loss 5.9776 LearningRate 0.0261 Epoch: 9 Global Step: 163200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:06:59,070-Speed 9322.79 samples/sec Loss 5.9564 LearningRate 0.0261 Epoch: 9 Global Step: 163210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:00,175-Speed 9264.51 samples/sec Loss 6.0440 LearningRate 0.0261 Epoch: 9 Global Step: 163220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:01,306-Speed 9067.65 samples/sec Loss 5.9447 LearningRate 0.0261 Epoch: 9 Global Step: 163230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:02,398-Speed 9375.55 samples/sec Loss 5.9015 LearningRate 0.0261 Epoch: 9 Global Step: 163240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:03,482-Speed 9456.99 samples/sec Loss 5.9052 LearningRate 0.0261 Epoch: 9 Global Step: 163250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:07:04,565-Speed 
9458.09 samples/sec Loss 5.8515 LearningRate 0.0261 Epoch: 9 Global Step: 163260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:07:05,648-Speed 9460.27 samples/sec Loss 5.8582 LearningRate 0.0261 Epoch: 9 Global Step: 163270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:06,744-Speed 9351.50 samples/sec Loss 5.9623 LearningRate 0.0261 Epoch: 9 Global Step: 163280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:07,810-Speed 9618.07 samples/sec Loss 5.9207 LearningRate 0.0261 Epoch: 9 Global Step: 163290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:08,871-Speed 9652.12 samples/sec Loss 6.0042 LearningRate 0.0261 Epoch: 9 Global Step: 163300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:09,965-Speed 9364.50 samples/sec Loss 5.9665 LearningRate 0.0261 Epoch: 9 Global Step: 163310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:11,096-Speed 9058.71 samples/sec Loss 5.9465 LearningRate 0.0261 Epoch: 9 Global Step: 163320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:12,177-Speed 9478.09 samples/sec Loss 5.8812 LearningRate 0.0261 Epoch: 9 Global Step: 163330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:13,225-Speed 9785.87 samples/sec Loss 5.9646 LearningRate 0.0261 Epoch: 9 Global Step: 163340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:14,342-Speed 9168.13 samples/sec Loss 5.9719 LearningRate 0.0261 Epoch: 9 Global Step: 163350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:15,427-Speed 9446.33 samples/sec Loss 5.8737 LearningRate 0.0261 Epoch: 9 Global Step: 163360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:16,547-Speed 9144.71 samples/sec Loss 5.9143 LearningRate 0.0261 Epoch: 9 Global Step: 163370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:07:17,590-Speed 9823.46 samples/sec Loss 5.8650 
LearningRate 0.0261 Epoch: 9 Global Step: 163380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:18,738-Speed 8928.71 samples/sec Loss 5.9015 LearningRate 0.0261 Epoch: 9 Global Step: 163390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:19,846-Speed 9241.08 samples/sec Loss 5.9610 LearningRate 0.0261 Epoch: 9 Global Step: 163400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:20,923-Speed 9517.09 samples/sec Loss 5.9090 LearningRate 0.0261 Epoch: 9 Global Step: 163410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:22,003-Speed 9483.35 samples/sec Loss 5.8501 LearningRate 0.0261 Epoch: 9 Global Step: 163420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:23,109-Speed 9268.44 samples/sec Loss 5.8965 LearningRate 0.0261 Epoch: 9 Global Step: 163430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:24,197-Speed 9421.00 samples/sec Loss 5.9201 LearningRate 0.0261 Epoch: 9 Global Step: 163440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:25,248-Speed 9748.93 samples/sec Loss 5.8711 LearningRate 0.0260 Epoch: 9 Global Step: 163450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:26,330-Speed 9467.54 samples/sec Loss 5.9575 LearningRate 0.0260 Epoch: 9 Global Step: 163460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:27,427-Speed 9340.93 samples/sec Loss 5.7976 LearningRate 0.0260 Epoch: 9 Global Step: 163470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:28,511-Speed 9444.47 samples/sec Loss 5.9160 LearningRate 0.0260 Epoch: 9 Global Step: 163480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:29,606-Speed 9366.42 samples/sec Loss 5.9756 LearningRate 0.0260 Epoch: 9 Global Step: 163490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:30,652-Speed 9798.01 samples/sec Loss 5.8944 LearningRate 0.0260 Epoch: 9 Global Step: 
163500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:31,783-Speed 9062.15 samples/sec Loss 5.9275 LearningRate 0.0260 Epoch: 9 Global Step: 163510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:32,864-Speed 9476.23 samples/sec Loss 5.9547 LearningRate 0.0260 Epoch: 9 Global Step: 163520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:33,947-Speed 9457.57 samples/sec Loss 5.9631 LearningRate 0.0260 Epoch: 9 Global Step: 163530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:35,045-Speed 9333.58 samples/sec Loss 5.9464 LearningRate 0.0260 Epoch: 9 Global Step: 163540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:36,108-Speed 9643.08 samples/sec Loss 5.9746 LearningRate 0.0260 Epoch: 9 Global Step: 163550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:37,239-Speed 9056.55 samples/sec Loss 5.9509 LearningRate 0.0260 Epoch: 9 Global Step: 163560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:38,317-Speed 9510.88 samples/sec Loss 5.8642 LearningRate 0.0260 Epoch: 9 Global Step: 163570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:39,446-Speed 9073.07 samples/sec Loss 5.7726 LearningRate 0.0260 Epoch: 9 Global Step: 163580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:40,530-Speed 9452.91 samples/sec Loss 5.8908 LearningRate 0.0260 Epoch: 9 Global Step: 163590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:41,601-Speed 9562.41 samples/sec Loss 5.9214 LearningRate 0.0260 Epoch: 9 Global Step: 163600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:42,745-Speed 8956.08 samples/sec Loss 5.9346 LearningRate 0.0260 Epoch: 9 Global Step: 163610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:43,817-Speed 9566.42 samples/sec Loss 5.9601 LearningRate 0.0260 Epoch: 9 Global Step: 163620 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 18:07:44,909-Speed 9378.77 samples/sec Loss 5.8462 LearningRate 0.0260 Epoch: 9 Global Step: 163630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:45,987-Speed 9510.72 samples/sec Loss 5.8044 LearningRate 0.0260 Epoch: 9 Global Step: 163640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:47,103-Speed 9178.23 samples/sec Loss 5.8167 LearningRate 0.0260 Epoch: 9 Global Step: 163650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:48,227-Speed 9119.52 samples/sec Loss 5.9216 LearningRate 0.0260 Epoch: 9 Global Step: 163660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:07:49,339-Speed 9212.92 samples/sec Loss 6.0230 LearningRate 0.0260 Epoch: 9 Global Step: 163670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:50,448-Speed 9237.38 samples/sec Loss 5.9021 LearningRate 0.0260 Epoch: 9 Global Step: 163680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:51,539-Speed 9390.30 samples/sec Loss 5.8825 LearningRate 0.0260 Epoch: 9 Global Step: 163690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:52,636-Speed 9338.80 samples/sec Loss 5.9489 LearningRate 0.0260 Epoch: 9 Global Step: 163700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:53,717-Speed 9481.52 samples/sec Loss 5.8688 LearningRate 0.0260 Epoch: 9 Global Step: 163710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:54,801-Speed 9450.66 samples/sec Loss 5.7623 LearningRate 0.0260 Epoch: 9 Global Step: 163720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:55,896-Speed 9363.67 samples/sec Loss 5.9372 LearningRate 0.0260 Epoch: 9 Global Step: 163730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:57,032-Speed 9019.14 samples/sec Loss 5.9479 LearningRate 0.0260 Epoch: 9 Global Step: 163740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:58,103-Speed 9558.26 
samples/sec Loss 5.8438 LearningRate 0.0260 Epoch: 9 Global Step: 163750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:07:59,227-Speed 9118.43 samples/sec Loss 5.9012 LearningRate 0.0260 Epoch: 9 Global Step: 163760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:00,355-Speed 9086.06 samples/sec Loss 5.8360 LearningRate 0.0259 Epoch: 9 Global Step: 163770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:01,475-Speed 9151.41 samples/sec Loss 5.8203 LearningRate 0.0259 Epoch: 9 Global Step: 163780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:02,578-Speed 9290.14 samples/sec Loss 5.9140 LearningRate 0.0259 Epoch: 9 Global Step: 163790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:03,642-Speed 9634.92 samples/sec Loss 5.9521 LearningRate 0.0259 Epoch: 9 Global Step: 163800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:04,746-Speed 9276.33 samples/sec Loss 5.8654 LearningRate 0.0259 Epoch: 9 Global Step: 163810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:05,850-Speed 9281.27 samples/sec Loss 5.8787 LearningRate 0.0259 Epoch: 9 Global Step: 163820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:06,932-Speed 9465.20 samples/sec Loss 6.0258 LearningRate 0.0259 Epoch: 9 Global Step: 163830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:08,026-Speed 9368.97 samples/sec Loss 5.9080 LearningRate 0.0259 Epoch: 9 Global Step: 163840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:09,083-Speed 9695.94 samples/sec Loss 5.8091 LearningRate 0.0259 Epoch: 9 Global Step: 163850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:10,122-Speed 9858.37 samples/sec Loss 5.9609 LearningRate 0.0259 Epoch: 9 Global Step: 163860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:11,183-Speed 9655.61 samples/sec Loss 5.9418 LearningRate 0.0259 
Epoch: 9 Global Step: 163870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:12,238-Speed 9708.98 samples/sec Loss 5.9772 LearningRate 0.0259 Epoch: 9 Global Step: 163880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:13,302-Speed 9645.81 samples/sec Loss 5.8880 LearningRate 0.0259 Epoch: 9 Global Step: 163890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:14,393-Speed 9395.03 samples/sec Loss 5.8692 LearningRate 0.0259 Epoch: 9 Global Step: 163900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:15,482-Speed 9404.47 samples/sec Loss 5.8398 LearningRate 0.0259 Epoch: 9 Global Step: 163910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:16,598-Speed 9186.54 samples/sec Loss 5.9693 LearningRate 0.0259 Epoch: 9 Global Step: 163920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:17,699-Speed 9299.80 samples/sec Loss 5.9155 LearningRate 0.0259 Epoch: 9 Global Step: 163930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:18,830-Speed 9060.61 samples/sec Loss 5.9472 LearningRate 0.0259 Epoch: 9 Global Step: 163940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:19,922-Speed 9380.30 samples/sec Loss 5.8606 LearningRate 0.0259 Epoch: 9 Global Step: 163950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:21,042-Speed 9152.74 samples/sec Loss 5.9330 LearningRate 0.0259 Epoch: 9 Global Step: 163960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:22,100-Speed 9687.81 samples/sec Loss 5.9384 LearningRate 0.0259 Epoch: 9 Global Step: 163970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:23,180-Speed 9481.06 samples/sec Loss 5.9945 LearningRate 0.0259 Epoch: 9 Global Step: 163980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:08:24,303-Speed 9132.13 samples/sec Loss 5.9950 LearningRate 0.0259 Epoch: 9 Global Step: 163990 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 18:08:25,405-Speed 9297.39 samples/sec Loss 5.9197 LearningRate 0.0259 Epoch: 9 Global Step: 164000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:08:47,286-[lfw][164000]XNorm: 9.706120 Training: 2022-04-11 18:08:47,287-[lfw][164000]Accuracy-Flip: 0.99633+-0.00245 Training: 2022-04-11 18:08:47,288-[lfw][164000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:09:12,527-[cfp_fp][164000]XNorm: 8.224196 Training: 2022-04-11 18:09:12,528-[cfp_fp][164000]Accuracy-Flip: 0.96329+-0.00915 Training: 2022-04-11 18:09:12,528-[cfp_fp][164000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:09:34,292-[agedb_30][164000]XNorm: 9.287480 Training: 2022-04-11 18:09:34,293-[agedb_30][164000]Accuracy-Flip: 0.96683+-0.00867 Training: 2022-04-11 18:09:34,293-[agedb_30][164000]Accuracy-Highest: 0.96783 Training: 2022-04-11 18:09:35,378-Speed 146.34 samples/sec Loss 6.0574 LearningRate 0.0259 Epoch: 9 Global Step: 164010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:36,425-Speed 9783.55 samples/sec Loss 5.9910 LearningRate 0.0259 Epoch: 9 Global Step: 164020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:37,488-Speed 9640.83 samples/sec Loss 5.9975 LearningRate 0.0259 Epoch: 9 Global Step: 164030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:38,579-Speed 9389.67 samples/sec Loss 5.8989 LearningRate 0.0259 Epoch: 9 Global Step: 164040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:39,677-Speed 9329.44 samples/sec Loss 5.8852 LearningRate 0.0259 Epoch: 9 Global Step: 164050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:40,737-Speed 9670.97 samples/sec Loss 5.9226 LearningRate 0.0259 Epoch: 9 Global Step: 164060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:41,804-Speed 9600.50 samples/sec Loss 5.9054 LearningRate 0.0259 Epoch: 9 Global Step: 164070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 18:09:42,936-Speed 9052.53 samples/sec Loss 5.7961 LearningRate 0.0259 Epoch: 9 Global Step: 164080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:44,015-Speed 9499.03 samples/sec Loss 5.9436 LearningRate 0.0259 Epoch: 9 Global Step: 164090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:09:45,065-Speed 9758.46 samples/sec Loss 5.8503 LearningRate 0.0258 Epoch: 9 Global Step: 164100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:46,139-Speed 9539.47 samples/sec Loss 5.8996 LearningRate 0.0258 Epoch: 9 Global Step: 164110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:47,239-Speed 9311.92 samples/sec Loss 5.9592 LearningRate 0.0258 Epoch: 9 Global Step: 164120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:48,343-Speed 9283.64 samples/sec Loss 5.8228 LearningRate 0.0258 Epoch: 9 Global Step: 164130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:49,425-Speed 9464.45 samples/sec Loss 5.7555 LearningRate 0.0258 Epoch: 9 Global Step: 164140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:50,496-Speed 9567.99 samples/sec Loss 5.9029 LearningRate 0.0258 Epoch: 9 Global Step: 164150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:51,594-Speed 9335.63 samples/sec Loss 5.9848 LearningRate 0.0258 Epoch: 9 Global Step: 164160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:52,721-Speed 9084.09 samples/sec Loss 6.0556 LearningRate 0.0258 Epoch: 9 Global Step: 164170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:53,808-Speed 9425.76 samples/sec Loss 5.7831 LearningRate 0.0258 Epoch: 9 Global Step: 164180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:54,883-Speed 9536.16 samples/sec Loss 6.0165 LearningRate 0.0258 Epoch: 9 Global Step: 164190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:56,006-Speed 9122.28 
samples/sec Loss 5.9326 LearningRate 0.0258 Epoch: 9 Global Step: 164200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:09:57,126-Speed 9153.07 samples/sec Loss 6.0523 LearningRate 0.0258 Epoch: 9 Global Step: 164210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:58,238-Speed 9209.49 samples/sec Loss 5.8954 LearningRate 0.0258 Epoch: 9 Global Step: 164220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:09:59,342-Speed 9280.51 samples/sec Loss 5.8282 LearningRate 0.0258 Epoch: 9 Global Step: 164230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:00,426-Speed 9454.68 samples/sec Loss 5.8973 LearningRate 0.0258 Epoch: 9 Global Step: 164240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:01,513-Speed 9423.23 samples/sec Loss 5.8881 LearningRate 0.0258 Epoch: 9 Global Step: 164250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:02,673-Speed 8834.64 samples/sec Loss 5.9333 LearningRate 0.0258 Epoch: 9 Global Step: 164260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:03,804-Speed 9058.65 samples/sec Loss 5.9191 LearningRate 0.0258 Epoch: 9 Global Step: 164270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:04,909-Speed 9271.21 samples/sec Loss 5.8133 LearningRate 0.0258 Epoch: 9 Global Step: 164280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:06,069-Speed 8833.11 samples/sec Loss 5.9556 LearningRate 0.0258 Epoch: 9 Global Step: 164290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:07,154-Speed 9436.11 samples/sec Loss 5.9118 LearningRate 0.0258 Epoch: 9 Global Step: 164300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:08,255-Speed 9309.71 samples/sec Loss 5.9785 LearningRate 0.0258 Epoch: 9 Global Step: 164310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:10:09,368-Speed 9215.04 samples/sec Loss 5.8567 LearningRate 0.0258 
Epoch: 9 Global Step: 164320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:10:10,443-Speed 9527.50 samples/sec Loss 5.9648 LearningRate 0.0258 Epoch: 9 Global Step: 164330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:10:11,519-Speed 9525.28 samples/sec Loss 5.9445 LearningRate 0.0258 Epoch: 9 Global Step: 164340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:12,555-Speed 9884.25 samples/sec Loss 5.9460 LearningRate 0.0258 Epoch: 9 Global Step: 164350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:13,652-Speed 9341.51 samples/sec Loss 5.9566 LearningRate 0.0258 Epoch: 9 Global Step: 164360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:14,782-Speed 9064.25 samples/sec Loss 5.8587 LearningRate 0.0258 Epoch: 9 Global Step: 164370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:15,863-Speed 9486.46 samples/sec Loss 5.9266 LearningRate 0.0258 Epoch: 9 Global Step: 164380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:16,955-Speed 9378.49 samples/sec Loss 5.8495 LearningRate 0.0258 Epoch: 9 Global Step: 164390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:18,068-Speed 9209.74 samples/sec Loss 6.0284 LearningRate 0.0258 Epoch: 9 Global Step: 164400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:19,109-Speed 9834.63 samples/sec Loss 6.0356 LearningRate 0.0258 Epoch: 9 Global Step: 164410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:20,224-Speed 9190.52 samples/sec Loss 5.8939 LearningRate 0.0258 Epoch: 9 Global Step: 164420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:21,274-Speed 9759.26 samples/sec Loss 5.8194 LearningRate 0.0257 Epoch: 9 Global Step: 164430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:22,402-Speed 9080.20 samples/sec Loss 5.9434 LearningRate 0.0257 Epoch: 9 Global Step: 164440 Fp16 Grad 
Scale: 262144 Required: 7 hours Training: 2022-04-11 18:10:23,461-Speed 9676.48 samples/sec Loss 5.9892 LearningRate 0.0257 Epoch: 9 Global Step: 164450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:24,549-Speed 9420.24 samples/sec Loss 5.8571 LearningRate 0.0257 Epoch: 9 Global Step: 164460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:25,665-Speed 9178.09 samples/sec Loss 5.9742 LearningRate 0.0257 Epoch: 9 Global Step: 164470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:26,761-Speed 9351.02 samples/sec Loss 5.8930 LearningRate 0.0257 Epoch: 9 Global Step: 164480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:27,875-Speed 9199.58 samples/sec Loss 5.8738 LearningRate 0.0257 Epoch: 9 Global Step: 164490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:28,959-Speed 9450.17 samples/sec Loss 5.9547 LearningRate 0.0257 Epoch: 9 Global Step: 164500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:30,047-Speed 9423.27 samples/sec Loss 5.8944 LearningRate 0.0257 Epoch: 9 Global Step: 164510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:31,114-Speed 9603.41 samples/sec Loss 5.8555 LearningRate 0.0257 Epoch: 9 Global Step: 164520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:32,186-Speed 9549.52 samples/sec Loss 5.9001 LearningRate 0.0257 Epoch: 9 Global Step: 164530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:33,296-Speed 9232.17 samples/sec Loss 5.9194 LearningRate 0.0257 Epoch: 9 Global Step: 164540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:34,413-Speed 9173.30 samples/sec Loss 5.9073 LearningRate 0.0257 Epoch: 9 Global Step: 164550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:35,542-Speed 9072.47 samples/sec Loss 5.9442 LearningRate 0.0257 Epoch: 9 Global Step: 164560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 18:10:36,645-Speed 9287.57 samples/sec Loss 5.7810 LearningRate 0.0257 Epoch: 9 Global Step: 164570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:37,758-Speed 9208.60 samples/sec Loss 6.0063 LearningRate 0.0257 Epoch: 9 Global Step: 164580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:38,889-Speed 9065.51 samples/sec Loss 5.8971 LearningRate 0.0257 Epoch: 9 Global Step: 164590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:39,953-Speed 9627.78 samples/sec Loss 5.9557 LearningRate 0.0257 Epoch: 9 Global Step: 164600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:41,071-Speed 9163.53 samples/sec Loss 5.8787 LearningRate 0.0257 Epoch: 9 Global Step: 164610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:42,152-Speed 9481.53 samples/sec Loss 5.8447 LearningRate 0.0257 Epoch: 9 Global Step: 164620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:43,277-Speed 9105.34 samples/sec Loss 5.7997 LearningRate 0.0257 Epoch: 9 Global Step: 164630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:44,333-Speed 9704.20 samples/sec Loss 5.9896 LearningRate 0.0257 Epoch: 9 Global Step: 164640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:45,415-Speed 9470.18 samples/sec Loss 5.8611 LearningRate 0.0257 Epoch: 9 Global Step: 164650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:10:46,474-Speed 9675.99 samples/sec Loss 5.8943 LearningRate 0.0257 Epoch: 9 Global Step: 164660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:47,561-Speed 9426.70 samples/sec Loss 6.0483 LearningRate 0.0257 Epoch: 9 Global Step: 164670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:48,626-Speed 9621.13 samples/sec Loss 5.9170 LearningRate 0.0257 Epoch: 9 Global Step: 164680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:49,726-Speed 9317.74 
samples/sec Loss 5.8932 LearningRate 0.0257 Epoch: 9 Global Step: 164690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:50,821-Speed 9360.80 samples/sec Loss 5.8988 LearningRate 0.0257 Epoch: 9 Global Step: 164700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:51,925-Speed 9278.37 samples/sec Loss 5.9341 LearningRate 0.0257 Epoch: 9 Global Step: 164710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:52,968-Speed 9825.83 samples/sec Loss 5.8776 LearningRate 0.0257 Epoch: 9 Global Step: 164720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:54,051-Speed 9459.57 samples/sec Loss 5.8715 LearningRate 0.0257 Epoch: 9 Global Step: 164730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:55,087-Speed 9884.63 samples/sec Loss 5.8421 LearningRate 0.0257 Epoch: 9 Global Step: 164740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:56,159-Speed 9559.20 samples/sec Loss 5.9389 LearningRate 0.0257 Epoch: 9 Global Step: 164750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:57,272-Speed 9211.45 samples/sec Loss 5.9079 LearningRate 0.0256 Epoch: 9 Global Step: 164760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:10:58,358-Speed 9437.25 samples/sec Loss 5.9814 LearningRate 0.0256 Epoch: 9 Global Step: 164770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:10:59,443-Speed 9437.29 samples/sec Loss 5.9274 LearningRate 0.0256 Epoch: 9 Global Step: 164780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:00,558-Speed 9189.14 samples/sec Loss 5.8610 LearningRate 0.0256 Epoch: 9 Global Step: 164790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:01,635-Speed 9516.31 samples/sec Loss 5.9884 LearningRate 0.0256 Epoch: 9 Global Step: 164800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:02,714-Speed 9498.05 samples/sec Loss 5.8615 LearningRate 0.0256 
Epoch: 9 Global Step: 164810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:03,782-Speed 9593.14 samples/sec Loss 5.9325 LearningRate 0.0256 Epoch: 9 Global Step: 164820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:04,853-Speed 9563.13 samples/sec Loss 5.8847 LearningRate 0.0256 Epoch: 9 Global Step: 164830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:05,931-Speed 9509.74 samples/sec Loss 5.8787 LearningRate 0.0256 Epoch: 9 Global Step: 164840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:06,990-Speed 9667.94 samples/sec Loss 5.8908 LearningRate 0.0256 Epoch: 9 Global Step: 164850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:08,068-Speed 9511.84 samples/sec Loss 5.8910 LearningRate 0.0256 Epoch: 9 Global Step: 164860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:09,125-Speed 9692.21 samples/sec Loss 5.9251 LearningRate 0.0256 Epoch: 9 Global Step: 164870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:11:10,229-Speed 9284.52 samples/sec Loss 5.8429 LearningRate 0.0256 Epoch: 9 Global Step: 164880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:11,323-Speed 9360.42 samples/sec Loss 5.7410 LearningRate 0.0256 Epoch: 9 Global Step: 164890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:12,434-Speed 9221.27 samples/sec Loss 5.9411 LearningRate 0.0256 Epoch: 9 Global Step: 164900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:13,512-Speed 9506.28 samples/sec Loss 5.8348 LearningRate 0.0256 Epoch: 9 Global Step: 164910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:14,621-Speed 9236.54 samples/sec Loss 5.8630 LearningRate 0.0256 Epoch: 9 Global Step: 164920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:15,730-Speed 9240.14 samples/sec Loss 5.8116 LearningRate 0.0256 Epoch: 9 Global Step: 164930 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:16,816-Speed 9436.56 samples/sec Loss 5.9330 LearningRate 0.0256 Epoch: 9 Global Step: 164940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:17,875-Speed 9675.38 samples/sec Loss 5.9433 LearningRate 0.0256 Epoch: 9 Global Step: 164950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:18,988-Speed 9206.76 samples/sec Loss 5.8092 LearningRate 0.0256 Epoch: 9 Global Step: 164960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:20,023-Speed 9898.61 samples/sec Loss 5.8061 LearningRate 0.0256 Epoch: 9 Global Step: 164970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:21,064-Speed 9841.85 samples/sec Loss 5.9395 LearningRate 0.0256 Epoch: 9 Global Step: 164980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:11:22,190-Speed 9096.55 samples/sec Loss 5.8621 LearningRate 0.0256 Epoch: 9 Global Step: 164990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:11:23,281-Speed 9392.94 samples/sec Loss 5.9264 LearningRate 0.0256 Epoch: 9 Global Step: 165000 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:11:24,319-Speed 9870.67 samples/sec Loss 5.9146 LearningRate 0.0256 Epoch: 9 Global Step: 165010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:25,387-Speed 9599.83 samples/sec Loss 5.8605 LearningRate 0.0256 Epoch: 9 Global Step: 165020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:26,488-Speed 9307.09 samples/sec Loss 5.8688 LearningRate 0.0256 Epoch: 9 Global Step: 165030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:27,604-Speed 9178.37 samples/sec Loss 6.0102 LearningRate 0.0256 Epoch: 9 Global Step: 165040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:28,706-Speed 9298.07 samples/sec Loss 5.8560 LearningRate 0.0256 Epoch: 9 Global Step: 165050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 18:11:29,782-Speed 9521.89 samples/sec Loss 5.9698 LearningRate 0.0256 Epoch: 9 Global Step: 165060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:30,864-Speed 9475.13 samples/sec Loss 5.9270 LearningRate 0.0256 Epoch: 9 Global Step: 165070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:31,978-Speed 9197.39 samples/sec Loss 5.9127 LearningRate 0.0256 Epoch: 9 Global Step: 165080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:33,085-Speed 9248.29 samples/sec Loss 5.9316 LearningRate 0.0255 Epoch: 9 Global Step: 165090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:34,208-Speed 9128.35 samples/sec Loss 6.0013 LearningRate 0.0255 Epoch: 9 Global Step: 165100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:35,307-Speed 9321.28 samples/sec Loss 5.8655 LearningRate 0.0255 Epoch: 9 Global Step: 165110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:11:36,431-Speed 9115.62 samples/sec Loss 5.8646 LearningRate 0.0255 Epoch: 9 Global Step: 165120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:37,543-Speed 9216.11 samples/sec Loss 5.9191 LearningRate 0.0255 Epoch: 9 Global Step: 165130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:38,663-Speed 9151.60 samples/sec Loss 5.8667 LearningRate 0.0255 Epoch: 9 Global Step: 165140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:39,753-Speed 9397.03 samples/sec Loss 5.8606 LearningRate 0.0255 Epoch: 9 Global Step: 165150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:40,843-Speed 9401.84 samples/sec Loss 5.8218 LearningRate 0.0255 Epoch: 9 Global Step: 165160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:41,918-Speed 9533.68 samples/sec Loss 5.9373 LearningRate 0.0255 Epoch: 9 Global Step: 165170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:43,022-Speed 9275.08 
samples/sec Loss 5.9657 LearningRate 0.0255 Epoch: 9 Global Step: 165180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:44,091-Speed 9590.07 samples/sec Loss 5.8549 LearningRate 0.0255 Epoch: 9 Global Step: 165190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:45,183-Speed 9384.06 samples/sec Loss 5.8981 LearningRate 0.0255 Epoch: 9 Global Step: 165200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:46,242-Speed 9673.59 samples/sec Loss 5.9040 LearningRate 0.0255 Epoch: 9 Global Step: 165210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:47,348-Speed 9262.94 samples/sec Loss 5.9662 LearningRate 0.0255 Epoch: 9 Global Step: 165220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:11:48,430-Speed 9471.44 samples/sec Loss 5.8930 LearningRate 0.0255 Epoch: 9 Global Step: 165230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:49,478-Speed 9775.08 samples/sec Loss 5.8879 LearningRate 0.0255 Epoch: 9 Global Step: 165240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:50,572-Speed 9366.63 samples/sec Loss 5.9527 LearningRate 0.0255 Epoch: 9 Global Step: 165250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:51,661-Speed 9405.49 samples/sec Loss 5.8842 LearningRate 0.0255 Epoch: 9 Global Step: 165260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:52,782-Speed 9143.57 samples/sec Loss 5.8845 LearningRate 0.0255 Epoch: 9 Global Step: 165270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:53,922-Speed 8989.56 samples/sec Loss 5.9047 LearningRate 0.0255 Epoch: 9 Global Step: 165280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:55,034-Speed 9212.99 samples/sec Loss 5.8638 LearningRate 0.0255 Epoch: 9 Global Step: 165290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:56,145-Speed 9224.62 samples/sec Loss 5.9164 LearningRate 0.0255 
Epoch: 9 Global Step: 165300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:57,259-Speed 9199.49 samples/sec Loss 5.9272 LearningRate 0.0255 Epoch: 9 Global Step: 165310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:58,362-Speed 9288.29 samples/sec Loss 5.8728 LearningRate 0.0255 Epoch: 9 Global Step: 165320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:11:59,461-Speed 9320.18 samples/sec Loss 5.9023 LearningRate 0.0255 Epoch: 9 Global Step: 165330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:12:00,563-Speed 9297.64 samples/sec Loss 5.8933 LearningRate 0.0255 Epoch: 9 Global Step: 165340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:01,642-Speed 9498.28 samples/sec Loss 5.9278 LearningRate 0.0255 Epoch: 9 Global Step: 165350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:02,711-Speed 9585.84 samples/sec Loss 5.9825 LearningRate 0.0255 Epoch: 9 Global Step: 165360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:03,766-Speed 9710.79 samples/sec Loss 5.9706 LearningRate 0.0255 Epoch: 9 Global Step: 165370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:04,848-Speed 9468.85 samples/sec Loss 5.9181 LearningRate 0.0255 Epoch: 9 Global Step: 165380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:05,968-Speed 9151.51 samples/sec Loss 5.8622 LearningRate 0.0255 Epoch: 9 Global Step: 165390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:07,071-Speed 9288.43 samples/sec Loss 5.7887 LearningRate 0.0255 Epoch: 9 Global Step: 165400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:08,119-Speed 9770.33 samples/sec Loss 5.9216 LearningRate 0.0255 Epoch: 9 Global Step: 165410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:09,256-Speed 9016.37 samples/sec Loss 5.8504 LearningRate 0.0254 Epoch: 9 Global Step: 165420 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:10,350-Speed 9366.67 samples/sec Loss 6.0476 LearningRate 0.0254 Epoch: 9 Global Step: 165430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:11,458-Speed 9247.11 samples/sec Loss 5.7943 LearningRate 0.0254 Epoch: 9 Global Step: 165440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:12,556-Speed 9339.37 samples/sec Loss 5.8876 LearningRate 0.0254 Epoch: 9 Global Step: 165450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:13,655-Speed 9323.89 samples/sec Loss 5.8076 LearningRate 0.0254 Epoch: 9 Global Step: 165460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:14,733-Speed 9498.90 samples/sec Loss 5.9038 LearningRate 0.0254 Epoch: 9 Global Step: 165470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:15,805-Speed 9568.48 samples/sec Loss 5.9081 LearningRate 0.0254 Epoch: 9 Global Step: 165480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:16,891-Speed 9427.40 samples/sec Loss 5.9197 LearningRate 0.0254 Epoch: 9 Global Step: 165490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:17,988-Speed 9343.84 samples/sec Loss 5.8729 LearningRate 0.0254 Epoch: 9 Global Step: 165500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:19,083-Speed 9354.23 samples/sec Loss 5.8012 LearningRate 0.0254 Epoch: 9 Global Step: 165510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:20,212-Speed 9077.90 samples/sec Loss 5.8234 LearningRate 0.0254 Epoch: 9 Global Step: 165520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:21,286-Speed 9535.36 samples/sec Loss 5.8658 LearningRate 0.0254 Epoch: 9 Global Step: 165530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:22,343-Speed 9692.00 samples/sec Loss 5.9717 LearningRate 0.0254 Epoch: 9 Global Step: 165540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 18:12:23,446-Speed 9291.59 samples/sec Loss 5.9305 LearningRate 0.0254 Epoch: 9 Global Step: 165550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:24,586-Speed 8986.86 samples/sec Loss 5.8857 LearningRate 0.0254 Epoch: 9 Global Step: 165560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:25,645-Speed 9670.36 samples/sec Loss 5.8350 LearningRate 0.0254 Epoch: 9 Global Step: 165570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:26,762-Speed 9173.83 samples/sec Loss 5.8769 LearningRate 0.0254 Epoch: 9 Global Step: 165580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:27,875-Speed 9213.90 samples/sec Loss 5.7870 LearningRate 0.0254 Epoch: 9 Global Step: 165590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:28,943-Speed 9587.11 samples/sec Loss 5.8762 LearningRate 0.0254 Epoch: 9 Global Step: 165600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:30,032-Speed 9409.46 samples/sec Loss 5.9378 LearningRate 0.0254 Epoch: 9 Global Step: 165610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:31,102-Speed 9580.64 samples/sec Loss 5.8914 LearningRate 0.0254 Epoch: 9 Global Step: 165620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:32,203-Speed 9305.53 samples/sec Loss 5.9375 LearningRate 0.0254 Epoch: 9 Global Step: 165630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:33,300-Speed 9342.43 samples/sec Loss 5.8770 LearningRate 0.0254 Epoch: 9 Global Step: 165640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:34,363-Speed 9634.02 samples/sec Loss 5.8822 LearningRate 0.0254 Epoch: 9 Global Step: 165650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:35,467-Speed 9281.87 samples/sec Loss 5.8425 LearningRate 0.0254 Epoch: 9 Global Step: 165660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:36,548-Speed 9475.38 
samples/sec Loss 5.9054 LearningRate 0.0254 Epoch: 9 Global Step: 165670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:37,630-Speed 9475.05 samples/sec Loss 5.8368 LearningRate 0.0254 Epoch: 9 Global Step: 165680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:38,734-Speed 9284.10 samples/sec Loss 5.9286 LearningRate 0.0254 Epoch: 9 Global Step: 165690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:39,822-Speed 9412.77 samples/sec Loss 5.8556 LearningRate 0.0254 Epoch: 9 Global Step: 165700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:40,919-Speed 9340.26 samples/sec Loss 5.9127 LearningRate 0.0254 Epoch: 9 Global Step: 165710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:41,999-Speed 9490.59 samples/sec Loss 5.8677 LearningRate 0.0254 Epoch: 9 Global Step: 165720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:12:43,156-Speed 8852.30 samples/sec Loss 5.9347 LearningRate 0.0254 Epoch: 9 Global Step: 165730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:44,235-Speed 9492.15 samples/sec Loss 5.8702 LearningRate 0.0254 Epoch: 9 Global Step: 165740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:45,331-Speed 9354.28 samples/sec Loss 5.8393 LearningRate 0.0253 Epoch: 9 Global Step: 165750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:46,416-Speed 9438.93 samples/sec Loss 5.8917 LearningRate 0.0253 Epoch: 9 Global Step: 165760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:47,504-Speed 9420.96 samples/sec Loss 5.8945 LearningRate 0.0253 Epoch: 9 Global Step: 165770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:48,600-Speed 9350.70 samples/sec Loss 5.8820 LearningRate 0.0253 Epoch: 9 Global Step: 165780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:49,664-Speed 9623.92 samples/sec Loss 5.9001 LearningRate 0.0253 
Epoch: 9 Global Step: 165790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:50,732-Speed 9594.65 samples/sec Loss 5.9396 LearningRate 0.0253 Epoch: 9 Global Step: 165800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:51,817-Speed 9445.69 samples/sec Loss 6.0209 LearningRate 0.0253 Epoch: 9 Global Step: 165810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:52,912-Speed 9358.43 samples/sec Loss 5.8924 LearningRate 0.0253 Epoch: 9 Global Step: 165820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:54,038-Speed 9103.37 samples/sec Loss 5.8916 LearningRate 0.0253 Epoch: 9 Global Step: 165830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:55,125-Speed 9421.45 samples/sec Loss 5.8866 LearningRate 0.0253 Epoch: 9 Global Step: 165840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:56,192-Speed 9599.44 samples/sec Loss 5.8961 LearningRate 0.0253 Epoch: 9 Global Step: 165850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:57,278-Speed 9443.07 samples/sec Loss 5.9520 LearningRate 0.0253 Epoch: 9 Global Step: 165860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:58,338-Speed 9677.23 samples/sec Loss 5.9311 LearningRate 0.0253 Epoch: 9 Global Step: 165870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:12:59,413-Speed 9526.81 samples/sec Loss 5.8190 LearningRate 0.0253 Epoch: 9 Global Step: 165880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:00,500-Speed 9432.00 samples/sec Loss 5.8917 LearningRate 0.0253 Epoch: 9 Global Step: 165890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:01,590-Speed 9392.13 samples/sec Loss 5.8793 LearningRate 0.0253 Epoch: 9 Global Step: 165900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:02,701-Speed 9226.42 samples/sec Loss 5.8633 LearningRate 0.0253 Epoch: 9 Global Step: 165910 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:03,756-Speed 9715.57 samples/sec Loss 5.8656 LearningRate 0.0253 Epoch: 9 Global Step: 165920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:04,891-Speed 9024.08 samples/sec Loss 5.8728 LearningRate 0.0253 Epoch: 9 Global Step: 165930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:13:05,954-Speed 9637.72 samples/sec Loss 5.9037 LearningRate 0.0253 Epoch: 9 Global Step: 165940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:07,061-Speed 9261.03 samples/sec Loss 5.9488 LearningRate 0.0253 Epoch: 9 Global Step: 165950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:08,181-Speed 9143.33 samples/sec Loss 5.9126 LearningRate 0.0253 Epoch: 9 Global Step: 165960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:09,247-Speed 9621.53 samples/sec Loss 5.7684 LearningRate 0.0253 Epoch: 9 Global Step: 165970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:10,371-Speed 9120.58 samples/sec Loss 5.9243 LearningRate 0.0253 Epoch: 9 Global Step: 165980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:11,453-Speed 9463.59 samples/sec Loss 5.8786 LearningRate 0.0253 Epoch: 9 Global Step: 165990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:12,552-Speed 9322.42 samples/sec Loss 5.9531 LearningRate 0.0253 Epoch: 9 Global Step: 166000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:13:34,715-[lfw][166000]XNorm: 9.513423 Training: 2022-04-11 18:13:34,716-[lfw][166000]Accuracy-Flip: 0.99633+-0.00277 Training: 2022-04-11 18:13:34,716-[lfw][166000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:14:00,306-[cfp_fp][166000]XNorm: 8.136522 Training: 2022-04-11 18:14:00,307-[cfp_fp][166000]Accuracy-Flip: 0.95800+-0.01123 Training: 2022-04-11 18:14:00,307-[cfp_fp][166000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:14:22,199-[agedb_30][166000]XNorm: 
9.208214 Training: 2022-04-11 18:14:22,200-[agedb_30][166000]Accuracy-Flip: 0.96917+-0.00935 Training: 2022-04-11 18:14:22,201-[agedb_30][166000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:14:23,266-Speed 144.81 samples/sec Loss 5.8095 LearningRate 0.0253 Epoch: 9 Global Step: 166010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:24,370-Speed 9278.17 samples/sec Loss 5.9448 LearningRate 0.0253 Epoch: 9 Global Step: 166020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:25,462-Speed 9393.04 samples/sec Loss 5.8429 LearningRate 0.0253 Epoch: 9 Global Step: 166030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:26,552-Speed 9394.16 samples/sec Loss 5.8523 LearningRate 0.0253 Epoch: 9 Global Step: 166040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:27,627-Speed 9530.39 samples/sec Loss 5.9441 LearningRate 0.0253 Epoch: 9 Global Step: 166050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:28,741-Speed 9201.36 samples/sec Loss 5.7388 LearningRate 0.0253 Epoch: 9 Global Step: 166060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:29,835-Speed 9372.90 samples/sec Loss 5.8465 LearningRate 0.0253 Epoch: 9 Global Step: 166070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:30,951-Speed 9179.73 samples/sec Loss 5.8922 LearningRate 0.0252 Epoch: 9 Global Step: 166080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:32,072-Speed 9139.48 samples/sec Loss 5.7825 LearningRate 0.0252 Epoch: 9 Global Step: 166090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:33,183-Speed 9221.59 samples/sec Loss 5.9767 LearningRate 0.0252 Epoch: 9 Global Step: 166100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:34,229-Speed 9797.49 samples/sec Loss 5.8932 LearningRate 0.0252 Epoch: 9 Global Step: 166110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:14:35,311-Speed 9468.18 samples/sec Loss 5.8979 LearningRate 0.0252 Epoch: 9 Global Step: 166120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:36,398-Speed 9428.23 samples/sec Loss 5.7572 LearningRate 0.0252 Epoch: 9 Global Step: 166130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:37,532-Speed 9030.84 samples/sec Loss 5.8867 LearningRate 0.0252 Epoch: 9 Global Step: 166140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:38,644-Speed 9213.46 samples/sec Loss 5.7936 LearningRate 0.0252 Epoch: 9 Global Step: 166150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:39,749-Speed 9278.12 samples/sec Loss 5.8863 LearningRate 0.0252 Epoch: 9 Global Step: 166160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:40,797-Speed 9779.92 samples/sec Loss 5.9576 LearningRate 0.0252 Epoch: 9 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:41,896-Speed 9316.53 samples/sec Loss 5.9263 LearningRate 0.0252 Epoch: 9 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:42,987-Speed 9391.48 samples/sec Loss 5.9036 LearningRate 0.0252 Epoch: 9 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:44,088-Speed 9308.98 samples/sec Loss 5.8762 LearningRate 0.0252 Epoch: 9 Global Step: 166200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:45,202-Speed 9193.38 samples/sec Loss 5.9392 LearningRate 0.0252 Epoch: 9 Global Step: 166210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:46,279-Speed 9514.42 samples/sec Loss 5.8296 LearningRate 0.0252 Epoch: 9 Global Step: 166220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:47,370-Speed 9396.28 samples/sec Loss 5.9175 LearningRate 0.0252 Epoch: 9 Global Step: 166230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:48,417-Speed 9786.54 samples/sec Loss 5.8820 
LearningRate 0.0252 Epoch: 9 Global Step: 166240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:49,520-Speed 9290.22 samples/sec Loss 5.8808 LearningRate 0.0252 Epoch: 9 Global Step: 166250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:50,627-Speed 9260.05 samples/sec Loss 5.7958 LearningRate 0.0252 Epoch: 9 Global Step: 166260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:14:51,717-Speed 9398.01 samples/sec Loss 5.9093 LearningRate 0.0252 Epoch: 9 Global Step: 166270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:52,794-Speed 9514.85 samples/sec Loss 5.9142 LearningRate 0.0252 Epoch: 9 Global Step: 166280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:53,901-Speed 9261.17 samples/sec Loss 5.9629 LearningRate 0.0252 Epoch: 9 Global Step: 166290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:55,001-Speed 9313.09 samples/sec Loss 5.7782 LearningRate 0.0252 Epoch: 9 Global Step: 166300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:56,094-Speed 9375.71 samples/sec Loss 5.7942 LearningRate 0.0252 Epoch: 9 Global Step: 166310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:57,175-Speed 9479.16 samples/sec Loss 5.9499 LearningRate 0.0252 Epoch: 9 Global Step: 166320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:58,290-Speed 9185.50 samples/sec Loss 5.9464 LearningRate 0.0252 Epoch: 9 Global Step: 166330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:14:59,402-Speed 9213.97 samples/sec Loss 5.8531 LearningRate 0.0252 Epoch: 9 Global Step: 166340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:00,506-Speed 9280.48 samples/sec Loss 5.8510 LearningRate 0.0252 Epoch: 9 Global Step: 166350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:01,631-Speed 9112.54 samples/sec Loss 5.9123 LearningRate 0.0252 Epoch: 9 Global Step: 
166360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:02,764-Speed 9043.42 samples/sec Loss 5.8198 LearningRate 0.0252 Epoch: 9 Global Step: 166370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:15:03,829-Speed 9618.52 samples/sec Loss 5.7588 LearningRate 0.0252 Epoch: 9 Global Step: 166380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:04,885-Speed 9705.77 samples/sec Loss 5.9118 LearningRate 0.0252 Epoch: 9 Global Step: 166390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:05,982-Speed 9334.61 samples/sec Loss 5.8590 LearningRate 0.0252 Epoch: 9 Global Step: 166400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:07,079-Speed 9339.79 samples/sec Loss 5.9101 LearningRate 0.0252 Epoch: 9 Global Step: 166410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:08,178-Speed 9325.09 samples/sec Loss 5.8672 LearningRate 0.0251 Epoch: 9 Global Step: 166420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:09,244-Speed 9615.87 samples/sec Loss 6.0134 LearningRate 0.0251 Epoch: 9 Global Step: 166430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:10,330-Speed 9437.58 samples/sec Loss 6.0142 LearningRate 0.0251 Epoch: 9 Global Step: 166440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:11,427-Speed 9339.75 samples/sec Loss 6.0240 LearningRate 0.0251 Epoch: 9 Global Step: 166450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:12,539-Speed 9209.05 samples/sec Loss 5.8037 LearningRate 0.0251 Epoch: 9 Global Step: 166460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:13,636-Speed 9342.80 samples/sec Loss 5.8457 LearningRate 0.0251 Epoch: 9 Global Step: 166470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:14,745-Speed 9242.15 samples/sec Loss 5.9621 LearningRate 0.0251 Epoch: 9 Global Step: 166480 Fp16 Grad Scale: 262144 Required: 7 
hours Training: 2022-04-11 18:15:15,845-Speed 9315.62 samples/sec Loss 5.8552 LearningRate 0.0251 Epoch: 9 Global Step: 166490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:16,898-Speed 9730.10 samples/sec Loss 5.8914 LearningRate 0.0251 Epoch: 9 Global Step: 166500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:17,992-Speed 9363.87 samples/sec Loss 5.8500 LearningRate 0.0251 Epoch: 9 Global Step: 166510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:19,120-Speed 9082.81 samples/sec Loss 5.8672 LearningRate 0.0251 Epoch: 9 Global Step: 166520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:20,213-Speed 9380.15 samples/sec Loss 5.8990 LearningRate 0.0251 Epoch: 9 Global Step: 166530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:21,289-Speed 9515.47 samples/sec Loss 5.8733 LearningRate 0.0251 Epoch: 9 Global Step: 166540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:22,378-Speed 9414.74 samples/sec Loss 5.8285 LearningRate 0.0251 Epoch: 9 Global Step: 166550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:23,420-Speed 9830.75 samples/sec Loss 5.8353 LearningRate 0.0251 Epoch: 9 Global Step: 166560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:24,512-Speed 9379.95 samples/sec Loss 5.9064 LearningRate 0.0251 Epoch: 9 Global Step: 166570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:25,612-Speed 9309.21 samples/sec Loss 5.9381 LearningRate 0.0251 Epoch: 9 Global Step: 166580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:26,671-Speed 9682.24 samples/sec Loss 5.9156 LearningRate 0.0251 Epoch: 9 Global Step: 166590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:27,737-Speed 9610.86 samples/sec Loss 5.8524 LearningRate 0.0251 Epoch: 9 Global Step: 166600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:15:28,825-Speed 9418.85 samples/sec Loss 5.9040 LearningRate 0.0251 Epoch: 9 Global Step: 166610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:29,899-Speed 9544.52 samples/sec Loss 5.8779 LearningRate 0.0251 Epoch: 9 Global Step: 166620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:30,995-Speed 9343.66 samples/sec Loss 5.9213 LearningRate 0.0251 Epoch: 9 Global Step: 166630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:32,086-Speed 9388.94 samples/sec Loss 5.9098 LearningRate 0.0251 Epoch: 9 Global Step: 166640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:33,217-Speed 9065.44 samples/sec Loss 5.8763 LearningRate 0.0251 Epoch: 9 Global Step: 166650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:34,292-Speed 9533.51 samples/sec Loss 5.7162 LearningRate 0.0251 Epoch: 9 Global Step: 166660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:35,396-Speed 9274.88 samples/sec Loss 5.8495 LearningRate 0.0251 Epoch: 9 Global Step: 166670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:36,517-Speed 9145.24 samples/sec Loss 5.8802 LearningRate 0.0251 Epoch: 9 Global Step: 166680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:37,616-Speed 9320.22 samples/sec Loss 5.9357 LearningRate 0.0251 Epoch: 9 Global Step: 166690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:15:38,725-Speed 9237.56 samples/sec Loss 5.8838 LearningRate 0.0251 Epoch: 9 Global Step: 166700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:39,805-Speed 9488.82 samples/sec Loss 5.7465 LearningRate 0.0251 Epoch: 9 Global Step: 166710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:40,926-Speed 9138.10 samples/sec Loss 5.8707 LearningRate 0.0251 Epoch: 9 Global Step: 166720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:42,012-Speed 9433.76 samples/sec Loss 
5.9502 LearningRate 0.0251 Epoch: 9 Global Step: 166730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:43,121-Speed 9240.33 samples/sec Loss 5.8728 LearningRate 0.0251 Epoch: 9 Global Step: 166740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:44,223-Speed 9299.59 samples/sec Loss 5.8689 LearningRate 0.0250 Epoch: 9 Global Step: 166750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:45,337-Speed 9192.64 samples/sec Loss 5.9342 LearningRate 0.0250 Epoch: 9 Global Step: 166760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:46,405-Speed 9607.08 samples/sec Loss 5.9013 LearningRate 0.0250 Epoch: 9 Global Step: 166770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:47,505-Speed 9313.83 samples/sec Loss 5.8277 LearningRate 0.0250 Epoch: 9 Global Step: 166780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:48,693-Speed 8627.09 samples/sec Loss 5.8954 LearningRate 0.0250 Epoch: 9 Global Step: 166790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:49,764-Speed 9564.48 samples/sec Loss 5.8911 LearningRate 0.0250 Epoch: 9 Global Step: 166800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:50,844-Speed 9484.46 samples/sec Loss 5.8022 LearningRate 0.0250 Epoch: 9 Global Step: 166810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:51,908-Speed 9628.92 samples/sec Loss 5.8033 LearningRate 0.0250 Epoch: 9 Global Step: 166820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:53,007-Speed 9328.20 samples/sec Loss 5.7493 LearningRate 0.0250 Epoch: 9 Global Step: 166830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:54,143-Speed 9013.98 samples/sec Loss 5.9195 LearningRate 0.0250 Epoch: 9 Global Step: 166840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:55,230-Speed 9426.91 samples/sec Loss 5.8502 LearningRate 0.0250 Epoch: 9 Global 
Step: 166850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:56,305-Speed 9536.30 samples/sec Loss 5.9175 LearningRate 0.0250 Epoch: 9 Global Step: 166860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:57,377-Speed 9556.24 samples/sec Loss 5.8958 LearningRate 0.0250 Epoch: 9 Global Step: 166870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:58,495-Speed 9165.66 samples/sec Loss 5.8182 LearningRate 0.0250 Epoch: 9 Global Step: 166880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:15:59,560-Speed 9617.27 samples/sec Loss 5.7994 LearningRate 0.0250 Epoch: 9 Global Step: 166890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:01,138-Speed 6493.50 samples/sec Loss 5.9078 LearningRate 0.0250 Epoch: 9 Global Step: 166900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:16:02,148-Speed 10146.12 samples/sec Loss 5.8624 LearningRate 0.0250 Epoch: 9 Global Step: 166910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:45,735-Speed 234.94 samples/sec Loss 5.0620 LearningRate 0.0250 Epoch: 10 Global Step: 166920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:47,021-Speed 7974.88 samples/sec Loss 4.9782 LearningRate 0.0250 Epoch: 10 Global Step: 166930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:48,267-Speed 8223.60 samples/sec Loss 5.0605 LearningRate 0.0250 Epoch: 10 Global Step: 166940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:49,615-Speed 7599.32 samples/sec Loss 5.1017 LearningRate 0.0250 Epoch: 10 Global Step: 166950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:50,705-Speed 9402.98 samples/sec Loss 5.1200 LearningRate 0.0250 Epoch: 10 Global Step: 166960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:52,018-Speed 7802.50 samples/sec Loss 5.0776 LearningRate 0.0250 Epoch: 10 Global Step: 166970 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 18:16:53,376-Speed 7545.21 samples/sec Loss 5.1151 LearningRate 0.0250 Epoch: 10 Global Step: 166980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:54,472-Speed 9349.27 samples/sec Loss 5.1404 LearningRate 0.0250 Epoch: 10 Global Step: 166990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:55,837-Speed 7502.82 samples/sec Loss 5.0729 LearningRate 0.0250 Epoch: 10 Global Step: 167000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:56,929-Speed 9385.10 samples/sec Loss 5.0419 LearningRate 0.0250 Epoch: 10 Global Step: 167010 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:16:58,181-Speed 8186.63 samples/sec Loss 5.1430 LearningRate 0.0250 Epoch: 10 Global Step: 167020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:16:59,286-Speed 9268.62 samples/sec Loss 5.1004 LearningRate 0.0250 Epoch: 10 Global Step: 167030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:00,590-Speed 7859.99 samples/sec Loss 5.0390 LearningRate 0.0250 Epoch: 10 Global Step: 167040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:01,737-Speed 8927.16 samples/sec Loss 5.1679 LearningRate 0.0250 Epoch: 10 Global Step: 167050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:02,852-Speed 9191.86 samples/sec Loss 5.0737 LearningRate 0.0250 Epoch: 10 Global Step: 167060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:03,974-Speed 9130.25 samples/sec Loss 5.1314 LearningRate 0.0250 Epoch: 10 Global Step: 167070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:05,062-Speed 9426.30 samples/sec Loss 5.0584 LearningRate 0.0249 Epoch: 10 Global Step: 167080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:06,139-Speed 9520.16 samples/sec Loss 5.1344 LearningRate 0.0249 Epoch: 10 Global Step: 167090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 18:17:07,248-Speed 9239.22 samples/sec Loss 5.2323 LearningRate 0.0249 Epoch: 10 Global Step: 167100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:08,371-Speed 9119.92 samples/sec Loss 5.0975 LearningRate 0.0249 Epoch: 10 Global Step: 167110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:09,437-Speed 9608.18 samples/sec Loss 5.2038 LearningRate 0.0249 Epoch: 10 Global Step: 167120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:10,538-Speed 9310.25 samples/sec Loss 5.2337 LearningRate 0.0249 Epoch: 10 Global Step: 167130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:11,682-Speed 8951.93 samples/sec Loss 5.0483 LearningRate 0.0249 Epoch: 10 Global Step: 167140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:12,770-Speed 9419.06 samples/sec Loss 5.0497 LearningRate 0.0249 Epoch: 10 Global Step: 167150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:13,868-Speed 9331.20 samples/sec Loss 5.1521 LearningRate 0.0249 Epoch: 10 Global Step: 167160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:14,916-Speed 9774.63 samples/sec Loss 5.1220 LearningRate 0.0249 Epoch: 10 Global Step: 167170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:16,002-Speed 9433.57 samples/sec Loss 5.1695 LearningRate 0.0249 Epoch: 10 Global Step: 167180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:17,091-Speed 9412.06 samples/sec Loss 5.0713 LearningRate 0.0249 Epoch: 10 Global Step: 167190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:18,214-Speed 9125.61 samples/sec Loss 5.1136 LearningRate 0.0249 Epoch: 10 Global Step: 167200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:19,256-Speed 9825.98 samples/sec Loss 5.0867 LearningRate 0.0249 Epoch: 10 Global Step: 167210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:20,343-Speed 
9431.87 samples/sec Loss 5.0777 LearningRate 0.0249 Epoch: 10 Global Step: 167220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:21,436-Speed 9377.68 samples/sec Loss 5.1284 LearningRate 0.0249 Epoch: 10 Global Step: 167230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:22,534-Speed 9332.25 samples/sec Loss 5.0525 LearningRate 0.0249 Epoch: 10 Global Step: 167240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:23,619-Speed 9435.25 samples/sec Loss 5.2087 LearningRate 0.0249 Epoch: 10 Global Step: 167250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:24,705-Speed 9435.84 samples/sec Loss 5.1105 LearningRate 0.0249 Epoch: 10 Global Step: 167260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:26,292-Speed 6457.04 samples/sec Loss 5.2287 LearningRate 0.0249 Epoch: 10 Global Step: 167270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:27,426-Speed 9036.02 samples/sec Loss 5.1159 LearningRate 0.0249 Epoch: 10 Global Step: 167280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:28,550-Speed 9129.53 samples/sec Loss 5.2041 LearningRate 0.0249 Epoch: 10 Global Step: 167290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:29,664-Speed 9199.40 samples/sec Loss 5.1309 LearningRate 0.0249 Epoch: 10 Global Step: 167300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:30,753-Speed 9404.14 samples/sec Loss 5.1192 LearningRate 0.0249 Epoch: 10 Global Step: 167310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:31,820-Speed 9605.87 samples/sec Loss 5.2199 LearningRate 0.0249 Epoch: 10 Global Step: 167320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:32,901-Speed 9478.13 samples/sec Loss 5.1778 LearningRate 0.0249 Epoch: 10 Global Step: 167330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:33,955-Speed 9726.00 samples/sec Loss 5.1627 
LearningRate 0.0249 Epoch: 10 Global Step: 167340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:35,048-Speed 9370.12 samples/sec Loss 5.1853 LearningRate 0.0249 Epoch: 10 Global Step: 167350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:36,170-Speed 9132.38 samples/sec Loss 5.1712 LearningRate 0.0249 Epoch: 10 Global Step: 167360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:37,273-Speed 9289.39 samples/sec Loss 5.1429 LearningRate 0.0249 Epoch: 10 Global Step: 167370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:38,383-Speed 9228.49 samples/sec Loss 5.1911 LearningRate 0.0249 Epoch: 10 Global Step: 167380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:39,503-Speed 9153.81 samples/sec Loss 5.2340 LearningRate 0.0249 Epoch: 10 Global Step: 167390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:40,565-Speed 9645.80 samples/sec Loss 5.1838 LearningRate 0.0249 Epoch: 10 Global Step: 167400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:41,835-Speed 8068.68 samples/sec Loss 5.1238 LearningRate 0.0249 Epoch: 10 Global Step: 167410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:42,918-Speed 9463.89 samples/sec Loss 5.1397 LearningRate 0.0248 Epoch: 10 Global Step: 167420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:43,991-Speed 9549.35 samples/sec Loss 5.2485 LearningRate 0.0248 Epoch: 10 Global Step: 167430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:17:45,055-Speed 9624.47 samples/sec Loss 5.2123 LearningRate 0.0248 Epoch: 10 Global Step: 167440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:46,137-Speed 9473.46 samples/sec Loss 5.2199 LearningRate 0.0248 Epoch: 10 Global Step: 167450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:47,245-Speed 9247.32 samples/sec Loss 5.2207 LearningRate 0.0248 Epoch: 10 Global 
Step: 167460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:48,298-Speed 9730.19 samples/sec Loss 5.2042 LearningRate 0.0248 Epoch: 10 Global Step: 167470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:49,372-Speed 9533.65 samples/sec Loss 5.1202 LearningRate 0.0248 Epoch: 10 Global Step: 167480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:50,440-Speed 9598.15 samples/sec Loss 5.1673 LearningRate 0.0248 Epoch: 10 Global Step: 167490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:51,531-Speed 9388.60 samples/sec Loss 5.1878 LearningRate 0.0248 Epoch: 10 Global Step: 167500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:52,613-Speed 9474.95 samples/sec Loss 5.2823 LearningRate 0.0248 Epoch: 10 Global Step: 167510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:53,674-Speed 9654.32 samples/sec Loss 5.2317 LearningRate 0.0248 Epoch: 10 Global Step: 167520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:54,772-Speed 9326.64 samples/sec Loss 5.2116 LearningRate 0.0248 Epoch: 10 Global Step: 167530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:55,857-Speed 9450.66 samples/sec Loss 5.3045 LearningRate 0.0248 Epoch: 10 Global Step: 167540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:56,952-Speed 9352.90 samples/sec Loss 5.1906 LearningRate 0.0248 Epoch: 10 Global Step: 167550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:58,003-Speed 9748.30 samples/sec Loss 5.2753 LearningRate 0.0248 Epoch: 10 Global Step: 167560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:17:59,105-Speed 9299.88 samples/sec Loss 5.2408 LearningRate 0.0248 Epoch: 10 Global Step: 167570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:00,159-Speed 9719.72 samples/sec Loss 5.1254 LearningRate 0.0248 Epoch: 10 Global Step: 167580 Fp16 Grad Scale: 
131072 Required: 7 hours Training: 2022-04-11 18:18:01,263-Speed 9282.83 samples/sec Loss 5.2685 LearningRate 0.0248 Epoch: 10 Global Step: 167590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:02,328-Speed 9623.23 samples/sec Loss 5.2200 LearningRate 0.0248 Epoch: 10 Global Step: 167600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:03,473-Speed 8949.78 samples/sec Loss 5.1797 LearningRate 0.0248 Epoch: 10 Global Step: 167610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:04,534-Speed 9649.68 samples/sec Loss 5.2819 LearningRate 0.0248 Epoch: 10 Global Step: 167620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:05,594-Speed 9666.89 samples/sec Loss 5.1665 LearningRate 0.0248 Epoch: 10 Global Step: 167630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:06,638-Speed 9817.51 samples/sec Loss 5.2388 LearningRate 0.0248 Epoch: 10 Global Step: 167640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:18:07,718-Speed 9487.23 samples/sec Loss 5.2124 LearningRate 0.0248 Epoch: 10 Global Step: 167650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:08,780-Speed 9648.18 samples/sec Loss 5.2094 LearningRate 0.0248 Epoch: 10 Global Step: 167660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:09,852-Speed 9554.98 samples/sec Loss 5.3279 LearningRate 0.0248 Epoch: 10 Global Step: 167670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:10,940-Speed 9422.41 samples/sec Loss 5.2714 LearningRate 0.0248 Epoch: 10 Global Step: 167680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:12,087-Speed 8932.97 samples/sec Loss 5.2366 LearningRate 0.0248 Epoch: 10 Global Step: 167690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:13,158-Speed 9564.47 samples/sec Loss 5.3398 LearningRate 0.0248 Epoch: 10 Global Step: 167700 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 18:18:14,272-Speed 9198.51 samples/sec Loss 5.2836 LearningRate 0.0248 Epoch: 10 Global Step: 167710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:15,362-Speed 9400.02 samples/sec Loss 5.2453 LearningRate 0.0248 Epoch: 10 Global Step: 167720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:16,454-Speed 9377.15 samples/sec Loss 5.1898 LearningRate 0.0248 Epoch: 10 Global Step: 167730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:17,531-Speed 9514.85 samples/sec Loss 5.0917 LearningRate 0.0248 Epoch: 10 Global Step: 167740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:18,601-Speed 9574.66 samples/sec Loss 5.1835 LearningRate 0.0247 Epoch: 10 Global Step: 167750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:18:19,667-Speed 9622.07 samples/sec Loss 5.3050 LearningRate 0.0247 Epoch: 10 Global Step: 167760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:20,742-Speed 9532.86 samples/sec Loss 5.3639 LearningRate 0.0247 Epoch: 10 Global Step: 167770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:21,788-Speed 9793.94 samples/sec Loss 5.1965 LearningRate 0.0247 Epoch: 10 Global Step: 167780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:22,823-Speed 9896.17 samples/sec Loss 5.2122 LearningRate 0.0247 Epoch: 10 Global Step: 167790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:23,907-Speed 9456.28 samples/sec Loss 5.2064 LearningRate 0.0247 Epoch: 10 Global Step: 167800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:24,945-Speed 9865.02 samples/sec Loss 5.1710 LearningRate 0.0247 Epoch: 10 Global Step: 167810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:26,015-Speed 9576.23 samples/sec Loss 5.1772 LearningRate 0.0247 Epoch: 10 Global Step: 167820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:18:27,108-Speed 9380.79 samples/sec Loss 5.3354 LearningRate 0.0247 Epoch: 10 Global Step: 167830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:28,207-Speed 9321.91 samples/sec Loss 5.2450 LearningRate 0.0247 Epoch: 10 Global Step: 167840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:29,276-Speed 9585.84 samples/sec Loss 5.3122 LearningRate 0.0247 Epoch: 10 Global Step: 167850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:30,341-Speed 9613.37 samples/sec Loss 5.2733 LearningRate 0.0247 Epoch: 10 Global Step: 167860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:18:31,447-Speed 9266.35 samples/sec Loss 5.2587 LearningRate 0.0247 Epoch: 10 Global Step: 167870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:32,569-Speed 9130.25 samples/sec Loss 5.2847 LearningRate 0.0247 Epoch: 10 Global Step: 167880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:33,668-Speed 9322.37 samples/sec Loss 5.3011 LearningRate 0.0247 Epoch: 10 Global Step: 167890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:34,793-Speed 9106.22 samples/sec Loss 5.2757 LearningRate 0.0247 Epoch: 10 Global Step: 167900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:35,879-Speed 9439.21 samples/sec Loss 5.2776 LearningRate 0.0247 Epoch: 10 Global Step: 167910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:36,955-Speed 9525.76 samples/sec Loss 5.3514 LearningRate 0.0247 Epoch: 10 Global Step: 167920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:38,025-Speed 9578.43 samples/sec Loss 5.2502 LearningRate 0.0247 Epoch: 10 Global Step: 167930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:39,135-Speed 9231.43 samples/sec Loss 5.2821 LearningRate 0.0247 Epoch: 10 Global Step: 167940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:40,230-Speed 9353.04 
samples/sec Loss 5.2944 LearningRate 0.0247 Epoch: 10 Global Step: 167950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:41,289-Speed 9677.61 samples/sec Loss 5.1908 LearningRate 0.0247 Epoch: 10 Global Step: 167960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:18:42,366-Speed 9513.34 samples/sec Loss 5.2223 LearningRate 0.0247 Epoch: 10 Global Step: 167970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:18:43,453-Speed 9424.58 samples/sec Loss 5.2910 LearningRate 0.0247 Epoch: 10 Global Step: 167980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:18:44,543-Speed 9399.65 samples/sec Loss 5.2275 LearningRate 0.0247 Epoch: 10 Global Step: 167990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:18:45,630-Speed 9421.69 samples/sec Loss 5.2949 LearningRate 0.0247 Epoch: 10 Global Step: 168000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:19:07,804-[lfw][168000]XNorm: 9.483930 Training: 2022-04-11 18:19:07,805-[lfw][168000]Accuracy-Flip: 0.99683+-0.00252 Training: 2022-04-11 18:19:07,805-[lfw][168000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:19:33,417-[cfp_fp][168000]XNorm: 8.157596 Training: 2022-04-11 18:19:33,418-[cfp_fp][168000]Accuracy-Flip: 0.96214+-0.00848 Training: 2022-04-11 18:19:33,418-[cfp_fp][168000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:19:55,590-[agedb_30][168000]XNorm: 9.209012 Training: 2022-04-11 18:19:55,591-[agedb_30][168000]Accuracy-Flip: 0.96817+-0.01020 Training: 2022-04-11 18:19:55,591-[agedb_30][168000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:19:56,665-Speed 144.16 samples/sec Loss 5.2859 LearningRate 0.0247 Epoch: 10 Global Step: 168010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:19:57,780-Speed 9197.69 samples/sec Loss 5.3488 LearningRate 0.0247 Epoch: 10 Global Step: 168020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:19:58,887-Speed 9254.63 samples/sec Loss 
5.1983 LearningRate 0.0247 Epoch: 10 Global Step: 168030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:19:59,991-Speed 9282.23 samples/sec Loss 5.2899 LearningRate 0.0247 Epoch: 10 Global Step: 168040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:01,096-Speed 9265.39 samples/sec Loss 5.2744 LearningRate 0.0247 Epoch: 10 Global Step: 168050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:02,166-Speed 9579.13 samples/sec Loss 5.2534 LearningRate 0.0247 Epoch: 10 Global Step: 168060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:03,256-Speed 9398.66 samples/sec Loss 5.2242 LearningRate 0.0247 Epoch: 10 Global Step: 168070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:04,371-Speed 9191.61 samples/sec Loss 5.3337 LearningRate 0.0247 Epoch: 10 Global Step: 168080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:05,448-Speed 9510.99 samples/sec Loss 5.2215 LearningRate 0.0246 Epoch: 10 Global Step: 168090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:06,516-Speed 9597.87 samples/sec Loss 5.2530 LearningRate 0.0246 Epoch: 10 Global Step: 168100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:20:07,621-Speed 9269.84 samples/sec Loss 5.2041 LearningRate 0.0246 Epoch: 10 Global Step: 168110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:08,721-Speed 9316.22 samples/sec Loss 5.2245 LearningRate 0.0246 Epoch: 10 Global Step: 168120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:09,787-Speed 9611.13 samples/sec Loss 5.3240 LearningRate 0.0246 Epoch: 10 Global Step: 168130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:10,827-Speed 9849.29 samples/sec Loss 5.2707 LearningRate 0.0246 Epoch: 10 Global Step: 168140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:11,883-Speed 9701.08 samples/sec Loss 5.2481 LearningRate 0.0246 
Epoch: 10 Global Step: 168150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:13,027-Speed 8959.82 samples/sec Loss 5.2366 LearningRate 0.0246 Epoch: 10 Global Step: 168160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:14,112-Speed 9438.99 samples/sec Loss 5.2910 LearningRate 0.0246 Epoch: 10 Global Step: 168170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:15,185-Speed 9552.99 samples/sec Loss 5.3139 LearningRate 0.0246 Epoch: 10 Global Step: 168180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:16,273-Speed 9421.37 samples/sec Loss 5.2650 LearningRate 0.0246 Epoch: 10 Global Step: 168190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:17,361-Speed 9411.38 samples/sec Loss 5.2570 LearningRate 0.0246 Epoch: 10 Global Step: 168200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:18,410-Speed 9769.18 samples/sec Loss 5.2845 LearningRate 0.0246 Epoch: 10 Global Step: 168210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:20:19,495-Speed 9442.08 samples/sec Loss 5.3163 LearningRate 0.0246 Epoch: 10 Global Step: 168220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:20,544-Speed 9767.43 samples/sec Loss 5.3817 LearningRate 0.0246 Epoch: 10 Global Step: 168230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:21,609-Speed 9624.83 samples/sec Loss 5.3540 LearningRate 0.0246 Epoch: 10 Global Step: 168240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:22,671-Speed 9649.62 samples/sec Loss 5.3474 LearningRate 0.0246 Epoch: 10 Global Step: 168250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:23,769-Speed 9324.75 samples/sec Loss 5.2647 LearningRate 0.0246 Epoch: 10 Global Step: 168260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:24,862-Speed 9377.01 samples/sec Loss 5.2524 LearningRate 0.0246 Epoch: 10 Global Step: 168270 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:25,963-Speed 9310.23 samples/sec Loss 5.3232 LearningRate 0.0246 Epoch: 10 Global Step: 168280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:26,995-Speed 9936.16 samples/sec Loss 5.2763 LearningRate 0.0246 Epoch: 10 Global Step: 168290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:28,079-Speed 9447.28 samples/sec Loss 5.2829 LearningRate 0.0246 Epoch: 10 Global Step: 168300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:29,160-Speed 9479.50 samples/sec Loss 5.2648 LearningRate 0.0246 Epoch: 10 Global Step: 168310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:30,201-Speed 9844.45 samples/sec Loss 5.2528 LearningRate 0.0246 Epoch: 10 Global Step: 168320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:31,297-Speed 9346.65 samples/sec Loss 5.3067 LearningRate 0.0246 Epoch: 10 Global Step: 168330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:32,386-Speed 9404.11 samples/sec Loss 5.3759 LearningRate 0.0246 Epoch: 10 Global Step: 168340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:33,468-Speed 9473.06 samples/sec Loss 5.3329 LearningRate 0.0246 Epoch: 10 Global Step: 168350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:34,593-Speed 9106.88 samples/sec Loss 5.2947 LearningRate 0.0246 Epoch: 10 Global Step: 168360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:35,683-Speed 9393.12 samples/sec Loss 5.4253 LearningRate 0.0246 Epoch: 10 Global Step: 168370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:36,776-Speed 9374.87 samples/sec Loss 5.2042 LearningRate 0.0246 Epoch: 10 Global Step: 168380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:37,851-Speed 9533.02 samples/sec Loss 5.4474 LearningRate 0.0246 Epoch: 10 Global Step: 168390 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 18:20:38,925-Speed 9539.40 samples/sec Loss 5.2012 LearningRate 0.0246 Epoch: 10 Global Step: 168400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:40,040-Speed 9189.27 samples/sec Loss 5.2820 LearningRate 0.0246 Epoch: 10 Global Step: 168410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:41,170-Speed 9068.15 samples/sec Loss 5.2867 LearningRate 0.0245 Epoch: 10 Global Step: 168420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:42,264-Speed 9371.79 samples/sec Loss 5.4292 LearningRate 0.0245 Epoch: 10 Global Step: 168430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:43,330-Speed 9613.06 samples/sec Loss 5.3466 LearningRate 0.0245 Epoch: 10 Global Step: 168440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:44,443-Speed 9205.72 samples/sec Loss 5.3270 LearningRate 0.0245 Epoch: 10 Global Step: 168450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:45,488-Speed 9799.32 samples/sec Loss 5.3528 LearningRate 0.0245 Epoch: 10 Global Step: 168460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:46,609-Speed 9147.02 samples/sec Loss 5.3870 LearningRate 0.0245 Epoch: 10 Global Step: 168470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:20:47,690-Speed 9474.32 samples/sec Loss 5.4249 LearningRate 0.0245 Epoch: 10 Global Step: 168480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:48,757-Speed 9600.20 samples/sec Loss 5.3230 LearningRate 0.0245 Epoch: 10 Global Step: 168490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:49,844-Speed 9423.35 samples/sec Loss 5.4125 LearningRate 0.0245 Epoch: 10 Global Step: 168500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:50,950-Speed 9274.42 samples/sec Loss 5.3698 LearningRate 0.0245 Epoch: 10 Global Step: 168510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:20:52,064-Speed 9194.11 samples/sec Loss 5.4017 LearningRate 0.0245 Epoch: 10 Global Step: 168520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:53,181-Speed 9169.11 samples/sec Loss 5.3738 LearningRate 0.0245 Epoch: 10 Global Step: 168530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:54,270-Speed 9416.25 samples/sec Loss 5.3930 LearningRate 0.0245 Epoch: 10 Global Step: 168540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:20:55,342-Speed 9556.72 samples/sec Loss 5.4226 LearningRate 0.0245 Epoch: 10 Global Step: 168550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:56,442-Speed 9308.43 samples/sec Loss 5.3782 LearningRate 0.0245 Epoch: 10 Global Step: 168560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:57,539-Speed 9345.95 samples/sec Loss 5.3596 LearningRate 0.0245 Epoch: 10 Global Step: 168570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:58,635-Speed 9353.50 samples/sec Loss 5.2218 LearningRate 0.0245 Epoch: 10 Global Step: 168580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:20:59,732-Speed 9332.19 samples/sec Loss 5.2361 LearningRate 0.0245 Epoch: 10 Global Step: 168590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:00,873-Speed 8986.14 samples/sec Loss 5.3454 LearningRate 0.0245 Epoch: 10 Global Step: 168600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:01,938-Speed 9620.16 samples/sec Loss 5.3233 LearningRate 0.0245 Epoch: 10 Global Step: 168610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:03,060-Speed 9127.38 samples/sec Loss 5.4084 LearningRate 0.0245 Epoch: 10 Global Step: 168620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:04,177-Speed 9175.41 samples/sec Loss 5.3442 LearningRate 0.0245 Epoch: 10 Global Step: 168630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:05,235-Speed 9689.28 samples/sec 
Loss 5.3875 LearningRate 0.0245 Epoch: 10 Global Step: 168640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:06,306-Speed 9561.17 samples/sec Loss 5.3704 LearningRate 0.0245 Epoch: 10 Global Step: 168650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:07,397-Speed 9393.90 samples/sec Loss 5.3091 LearningRate 0.0245 Epoch: 10 Global Step: 168660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:08,481-Speed 9449.16 samples/sec Loss 5.3237 LearningRate 0.0245 Epoch: 10 Global Step: 168670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:09,584-Speed 9292.60 samples/sec Loss 5.3110 LearningRate 0.0245 Epoch: 10 Global Step: 168680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:10,664-Speed 9487.42 samples/sec Loss 5.3930 LearningRate 0.0245 Epoch: 10 Global Step: 168690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:11,748-Speed 9451.79 samples/sec Loss 5.3628 LearningRate 0.0245 Epoch: 10 Global Step: 168700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:12,869-Speed 9142.25 samples/sec Loss 5.4420 LearningRate 0.0245 Epoch: 10 Global Step: 168710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:13,963-Speed 9360.20 samples/sec Loss 5.3186 LearningRate 0.0245 Epoch: 10 Global Step: 168720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:15,046-Speed 9465.89 samples/sec Loss 5.3080 LearningRate 0.0245 Epoch: 10 Global Step: 168730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:16,191-Speed 8942.85 samples/sec Loss 5.3076 LearningRate 0.0245 Epoch: 10 Global Step: 168740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:17,249-Speed 9692.41 samples/sec Loss 5.3993 LearningRate 0.0245 Epoch: 10 Global Step: 168750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:18,314-Speed 9614.24 samples/sec Loss 5.4824 LearningRate 0.0244 
Epoch: 10 Global Step: 168760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:19,399-Speed 9442.14 samples/sec Loss 5.4362 LearningRate 0.0244 Epoch: 10 Global Step: 168770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:20,485-Speed 9440.83 samples/sec Loss 5.3503 LearningRate 0.0244 Epoch: 10 Global Step: 168780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:21,546-Speed 9654.75 samples/sec Loss 5.3975 LearningRate 0.0244 Epoch: 10 Global Step: 168790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:22,589-Speed 9827.48 samples/sec Loss 5.4326 LearningRate 0.0244 Epoch: 10 Global Step: 168800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:23,662-Speed 9549.18 samples/sec Loss 5.4167 LearningRate 0.0244 Epoch: 10 Global Step: 168810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:24,750-Speed 9419.01 samples/sec Loss 5.3536 LearningRate 0.0244 Epoch: 10 Global Step: 168820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:25,816-Speed 9605.45 samples/sec Loss 5.3717 LearningRate 0.0244 Epoch: 10 Global Step: 168830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:26,900-Speed 9458.98 samples/sec Loss 5.4050 LearningRate 0.0244 Epoch: 10 Global Step: 168840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:27,973-Speed 9547.93 samples/sec Loss 5.3482 LearningRate 0.0244 Epoch: 10 Global Step: 168850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:21:29,035-Speed 9647.88 samples/sec Loss 5.3023 LearningRate 0.0244 Epoch: 10 Global Step: 168860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:21:30,125-Speed 9402.60 samples/sec Loss 5.3436 LearningRate 0.0244 Epoch: 10 Global Step: 168870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:21:31,186-Speed 9652.34 samples/sec Loss 5.3733 LearningRate 0.0244 Epoch: 10 Global Step: 168880 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:32,226-Speed 9854.32 samples/sec Loss 5.4107 LearningRate 0.0244 Epoch: 10 Global Step: 168890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:33,320-Speed 9363.11 samples/sec Loss 5.3256 LearningRate 0.0244 Epoch: 10 Global Step: 168900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:34,430-Speed 9234.53 samples/sec Loss 5.3054 LearningRate 0.0244 Epoch: 10 Global Step: 168910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:35,509-Speed 9495.10 samples/sec Loss 5.4668 LearningRate 0.0244 Epoch: 10 Global Step: 168920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:36,584-Speed 9531.03 samples/sec Loss 5.4098 LearningRate 0.0244 Epoch: 10 Global Step: 168930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:37,641-Speed 9686.12 samples/sec Loss 5.5060 LearningRate 0.0244 Epoch: 10 Global Step: 168940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:38,737-Speed 9357.04 samples/sec Loss 5.3821 LearningRate 0.0244 Epoch: 10 Global Step: 168950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:39,843-Speed 9275.66 samples/sec Loss 5.3673 LearningRate 0.0244 Epoch: 10 Global Step: 168960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:40,913-Speed 9568.51 samples/sec Loss 5.2791 LearningRate 0.0244 Epoch: 10 Global Step: 168970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:41,997-Speed 9454.80 samples/sec Loss 5.3772 LearningRate 0.0244 Epoch: 10 Global Step: 168980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:21:43,137-Speed 8991.43 samples/sec Loss 5.4713 LearningRate 0.0244 Epoch: 10 Global Step: 168990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:44,211-Speed 9540.46 samples/sec Loss 5.3793 LearningRate 0.0244 Epoch: 10 Global Step: 169000 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 18:21:45,286-Speed 9524.96 samples/sec Loss 5.3593 LearningRate 0.0244 Epoch: 10 Global Step: 169010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:46,425-Speed 8999.33 samples/sec Loss 5.3428 LearningRate 0.0244 Epoch: 10 Global Step: 169020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:47,540-Speed 9187.01 samples/sec Loss 5.4039 LearningRate 0.0244 Epoch: 10 Global Step: 169030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:48,601-Speed 9658.48 samples/sec Loss 5.2987 LearningRate 0.0244 Epoch: 10 Global Step: 169040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:49,679-Speed 9507.34 samples/sec Loss 5.3563 LearningRate 0.0244 Epoch: 10 Global Step: 169050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:50,774-Speed 9358.86 samples/sec Loss 5.4067 LearningRate 0.0244 Epoch: 10 Global Step: 169060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:51,842-Speed 9595.54 samples/sec Loss 5.4830 LearningRate 0.0244 Epoch: 10 Global Step: 169070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:21:52,934-Speed 9383.05 samples/sec Loss 5.3854 LearningRate 0.0244 Epoch: 10 Global Step: 169080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:53,982-Speed 9782.04 samples/sec Loss 5.3903 LearningRate 0.0244 Epoch: 10 Global Step: 169090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:55,099-Speed 9175.27 samples/sec Loss 5.4448 LearningRate 0.0243 Epoch: 10 Global Step: 169100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:56,175-Speed 9520.21 samples/sec Loss 5.4781 LearningRate 0.0243 Epoch: 10 Global Step: 169110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:57,245-Speed 9584.02 samples/sec Loss 5.4930 LearningRate 0.0243 Epoch: 10 Global Step: 169120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 18:21:58,333-Speed 9419.05 samples/sec Loss 5.4283 LearningRate 0.0243 Epoch: 10 Global Step: 169130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:21:59,455-Speed 9130.85 samples/sec Loss 5.4095 LearningRate 0.0243 Epoch: 10 Global Step: 169140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:00,503-Speed 9779.88 samples/sec Loss 5.4865 LearningRate 0.0243 Epoch: 10 Global Step: 169150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:01,572-Speed 9582.09 samples/sec Loss 5.4153 LearningRate 0.0243 Epoch: 10 Global Step: 169160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:02,668-Speed 9354.71 samples/sec Loss 5.3437 LearningRate 0.0243 Epoch: 10 Global Step: 169170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:03,756-Speed 9409.82 samples/sec Loss 5.4118 LearningRate 0.0243 Epoch: 10 Global Step: 169180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:04,821-Speed 9626.87 samples/sec Loss 5.5152 LearningRate 0.0243 Epoch: 10 Global Step: 169190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:05,935-Speed 9193.62 samples/sec Loss 5.4058 LearningRate 0.0243 Epoch: 10 Global Step: 169200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:07,051-Speed 9184.75 samples/sec Loss 5.4124 LearningRate 0.0243 Epoch: 10 Global Step: 169210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:08,101-Speed 9756.13 samples/sec Loss 5.4187 LearningRate 0.0243 Epoch: 10 Global Step: 169220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:09,182-Speed 9473.24 samples/sec Loss 5.3629 LearningRate 0.0243 Epoch: 10 Global Step: 169230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:10,289-Speed 9264.69 samples/sec Loss 5.5154 LearningRate 0.0243 Epoch: 10 Global Step: 169240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:11,356-Speed 
9595.11 samples/sec Loss 5.4875 LearningRate 0.0243 Epoch: 10 Global Step: 169250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:12,490-Speed 9037.56 samples/sec Loss 5.3914 LearningRate 0.0243 Epoch: 10 Global Step: 169260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:13,604-Speed 9200.08 samples/sec Loss 5.3951 LearningRate 0.0243 Epoch: 10 Global Step: 169270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:14,699-Speed 9350.27 samples/sec Loss 5.4108 LearningRate 0.0243 Epoch: 10 Global Step: 169280 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:22:15,780-Speed 9477.11 samples/sec Loss 5.4209 LearningRate 0.0243 Epoch: 10 Global Step: 169290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:16,855-Speed 9536.35 samples/sec Loss 5.3728 LearningRate 0.0243 Epoch: 10 Global Step: 169300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:17,936-Speed 9475.30 samples/sec Loss 5.3770 LearningRate 0.0243 Epoch: 10 Global Step: 169310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:19,023-Speed 9433.69 samples/sec Loss 5.3828 LearningRate 0.0243 Epoch: 10 Global Step: 169320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:20,085-Speed 9640.64 samples/sec Loss 5.4742 LearningRate 0.0243 Epoch: 10 Global Step: 169330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:21,173-Speed 9426.29 samples/sec Loss 5.4912 LearningRate 0.0243 Epoch: 10 Global Step: 169340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:22,236-Speed 9635.06 samples/sec Loss 5.5194 LearningRate 0.0243 Epoch: 10 Global Step: 169350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:23,311-Speed 9533.17 samples/sec Loss 5.4849 LearningRate 0.0243 Epoch: 10 Global Step: 169360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:24,381-Speed 9577.17 samples/sec Loss 5.4957 
LearningRate 0.0243 Epoch: 10 Global Step: 169370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:25,456-Speed 9530.18 samples/sec Loss 5.4946 LearningRate 0.0243 Epoch: 10 Global Step: 169380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:22:26,560-Speed 9273.31 samples/sec Loss 5.4763 LearningRate 0.0243 Epoch: 10 Global Step: 169390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:27,685-Speed 9112.99 samples/sec Loss 5.4273 LearningRate 0.0243 Epoch: 10 Global Step: 169400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:28,794-Speed 9240.05 samples/sec Loss 5.3479 LearningRate 0.0243 Epoch: 10 Global Step: 169410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:29,900-Speed 9259.03 samples/sec Loss 5.4222 LearningRate 0.0243 Epoch: 10 Global Step: 169420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:30,996-Speed 9354.10 samples/sec Loss 5.4416 LearningRate 0.0243 Epoch: 10 Global Step: 169430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:32,104-Speed 9240.67 samples/sec Loss 5.4436 LearningRate 0.0242 Epoch: 10 Global Step: 169440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:33,251-Speed 8939.36 samples/sec Loss 5.3906 LearningRate 0.0242 Epoch: 10 Global Step: 169450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:34,311-Speed 9659.30 samples/sec Loss 5.4408 LearningRate 0.0242 Epoch: 10 Global Step: 169460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:35,424-Speed 9207.45 samples/sec Loss 5.4166 LearningRate 0.0242 Epoch: 10 Global Step: 169470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:36,502-Speed 9507.99 samples/sec Loss 5.4078 LearningRate 0.0242 Epoch: 10 Global Step: 169480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:37,607-Speed 9273.54 samples/sec Loss 5.4280 LearningRate 0.0242 Epoch: 10 
Global Step: 169490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:22:38,664-Speed 9689.53 samples/sec Loss 5.5519 LearningRate 0.0242 Epoch: 10 Global Step: 169500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:22:39,739-Speed 9536.80 samples/sec Loss 5.3574 LearningRate 0.0242 Epoch: 10 Global Step: 169510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:40,779-Speed 9851.94 samples/sec Loss 5.4226 LearningRate 0.0242 Epoch: 10 Global Step: 169520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:41,922-Speed 8965.98 samples/sec Loss 5.3986 LearningRate 0.0242 Epoch: 10 Global Step: 169530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:43,015-Speed 9372.80 samples/sec Loss 5.4868 LearningRate 0.0242 Epoch: 10 Global Step: 169540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:44,138-Speed 9127.80 samples/sec Loss 5.4303 LearningRate 0.0242 Epoch: 10 Global Step: 169550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:45,237-Speed 9319.00 samples/sec Loss 5.3804 LearningRate 0.0242 Epoch: 10 Global Step: 169560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:46,304-Speed 9600.89 samples/sec Loss 5.3266 LearningRate 0.0242 Epoch: 10 Global Step: 169570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:47,426-Speed 9138.62 samples/sec Loss 5.4250 LearningRate 0.0242 Epoch: 10 Global Step: 169580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:48,498-Speed 9554.44 samples/sec Loss 5.4349 LearningRate 0.0242 Epoch: 10 Global Step: 169590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:49,574-Speed 9519.68 samples/sec Loss 5.4848 LearningRate 0.0242 Epoch: 10 Global Step: 169600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:50,681-Speed 9257.92 samples/sec Loss 5.4949 LearningRate 0.0242 Epoch: 10 Global Step: 169610 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:51,748-Speed 9606.89 samples/sec Loss 5.5159 LearningRate 0.0242 Epoch: 10 Global Step: 169620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:52,838-Speed 9400.23 samples/sec Loss 5.4174 LearningRate 0.0242 Epoch: 10 Global Step: 169630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:53,954-Speed 9176.03 samples/sec Loss 5.3982 LearningRate 0.0242 Epoch: 10 Global Step: 169640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:55,045-Speed 9390.41 samples/sec Loss 5.4251 LearningRate 0.0242 Epoch: 10 Global Step: 169650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:56,136-Speed 9393.20 samples/sec Loss 5.4170 LearningRate 0.0242 Epoch: 10 Global Step: 169660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:57,211-Speed 9535.70 samples/sec Loss 5.4258 LearningRate 0.0242 Epoch: 10 Global Step: 169670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:58,308-Speed 9343.07 samples/sec Loss 5.4101 LearningRate 0.0242 Epoch: 10 Global Step: 169680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:22:59,390-Speed 9467.55 samples/sec Loss 5.5128 LearningRate 0.0242 Epoch: 10 Global Step: 169690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:00,483-Speed 9372.99 samples/sec Loss 5.4370 LearningRate 0.0242 Epoch: 10 Global Step: 169700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:01,590-Speed 9259.83 samples/sec Loss 5.4809 LearningRate 0.0242 Epoch: 10 Global Step: 169710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:23:02,660-Speed 9576.21 samples/sec Loss 5.4753 LearningRate 0.0242 Epoch: 10 Global Step: 169720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:03,753-Speed 9374.17 samples/sec Loss 5.4760 LearningRate 0.0242 Epoch: 10 Global Step: 169730 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 18:23:04,811-Speed 9680.59 samples/sec Loss 5.4206 LearningRate 0.0242 Epoch: 10 Global Step: 169740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:05,861-Speed 9755.93 samples/sec Loss 5.4535 LearningRate 0.0242 Epoch: 10 Global Step: 169750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:06,935-Speed 9543.00 samples/sec Loss 5.4249 LearningRate 0.0242 Epoch: 10 Global Step: 169760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:08,020-Speed 9437.07 samples/sec Loss 5.3977 LearningRate 0.0242 Epoch: 10 Global Step: 169770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:09,151-Speed 9063.90 samples/sec Loss 5.4157 LearningRate 0.0241 Epoch: 10 Global Step: 169780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:10,223-Speed 9561.48 samples/sec Loss 5.4111 LearningRate 0.0241 Epoch: 10 Global Step: 169790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:11,264-Speed 9836.30 samples/sec Loss 5.4512 LearningRate 0.0241 Epoch: 10 Global Step: 169800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:12,369-Speed 9275.13 samples/sec Loss 5.5286 LearningRate 0.0241 Epoch: 10 Global Step: 169810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:13,529-Speed 8833.54 samples/sec Loss 5.4829 LearningRate 0.0241 Epoch: 10 Global Step: 169820 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:23:14,635-Speed 9266.43 samples/sec Loss 5.4663 LearningRate 0.0241 Epoch: 10 Global Step: 169830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:23:15,754-Speed 9162.57 samples/sec Loss 5.5038 LearningRate 0.0241 Epoch: 10 Global Step: 169840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:16,858-Speed 9279.39 samples/sec Loss 5.3786 LearningRate 0.0241 Epoch: 10 Global Step: 169850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:23:18,011-Speed 8890.60 samples/sec Loss 5.4285 LearningRate 0.0241 Epoch: 10 Global Step: 169860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:19,105-Speed 9365.71 samples/sec Loss 5.5124 LearningRate 0.0241 Epoch: 10 Global Step: 169870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:23:20,211-Speed 9261.14 samples/sec Loss 5.4090 LearningRate 0.0241 Epoch: 10 Global Step: 169880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:23:21,304-Speed 9375.45 samples/sec Loss 5.5194 LearningRate 0.0241 Epoch: 10 Global Step: 169890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:23:22,367-Speed 9636.44 samples/sec Loss 5.6298 LearningRate 0.0241 Epoch: 10 Global Step: 169900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:23:23,441-Speed 9541.20 samples/sec Loss 5.5117 LearningRate 0.0241 Epoch: 10 Global Step: 169910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:23:24,607-Speed 8783.74 samples/sec Loss 5.4981 LearningRate 0.0241 Epoch: 10 Global Step: 169920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:23:25,675-Speed 9601.13 samples/sec Loss 5.4676 LearningRate 0.0241 Epoch: 10 Global Step: 169930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:23:26,715-Speed 9855.05 samples/sec Loss 5.5257 LearningRate 0.0241 Epoch: 10 Global Step: 169940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:23:27,789-Speed 9543.35 samples/sec Loss 5.5225 LearningRate 0.0241 Epoch: 10 Global Step: 169950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:23:28,881-Speed 9379.32 samples/sec Loss 5.4860 LearningRate 0.0241 Epoch: 10 Global Step: 169960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:23:29,945-Speed 9633.17 samples/sec Loss 5.4784 LearningRate 0.0241 Epoch: 10 Global Step: 169970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:23:31,030-Speed 9441.28 samples/sec 
Loss 5.4852 LearningRate 0.0241 Epoch: 10 Global Step: 169980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:23:32,118-Speed 9421.77 samples/sec Loss 5.4222 LearningRate 0.0241 Epoch: 10 Global Step: 169990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:23:33,247-Speed 9069.67 samples/sec Loss 5.4706 LearningRate 0.0241 Epoch: 10 Global Step: 170000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:23:55,361-[lfw][170000]XNorm: 9.365612 Training: 2022-04-11 18:23:55,362-[lfw][170000]Accuracy-Flip: 0.99683+-0.00241 Training: 2022-04-11 18:23:55,362-[lfw][170000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:24:20,886-[cfp_fp][170000]XNorm: 7.998890 Training: 2022-04-11 18:24:20,887-[cfp_fp][170000]Accuracy-Flip: 0.96343+-0.00876 Training: 2022-04-11 18:24:20,887-[cfp_fp][170000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:24:42,948-[agedb_30][170000]XNorm: 9.086145 Training: 2022-04-11 18:24:42,948-[agedb_30][170000]Accuracy-Flip: 0.96600+-0.01073 Training: 2022-04-11 18:24:42,949-[agedb_30][170000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:24:44,007-Speed 144.72 samples/sec Loss 5.4968 LearningRate 0.0241 Epoch: 10 Global Step: 170010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:24:45,050-Speed 9822.35 samples/sec Loss 5.4034 LearningRate 0.0241 Epoch: 10 Global Step: 170020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:24:46,159-Speed 9234.98 samples/sec Loss 5.5051 LearningRate 0.0241 Epoch: 10 Global Step: 170030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 18:24:47,243-Speed 9448.28 samples/sec Loss 5.5866 LearningRate 0.0241 Epoch: 10 Global Step: 170040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:48,328-Speed 9448.99 samples/sec Loss 5.5471 LearningRate 0.0241 Epoch: 10 Global Step: 170050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:49,417-Speed 9402.68 samples/sec Loss 5.5502 LearningRate 
0.0241 Epoch: 10 Global Step: 170060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:50,497-Speed 9493.88 samples/sec Loss 5.5402 LearningRate 0.0241 Epoch: 10 Global Step: 170070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:51,601-Speed 9277.84 samples/sec Loss 5.5368 LearningRate 0.0241 Epoch: 10 Global Step: 170080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:52,686-Speed 9443.47 samples/sec Loss 5.5246 LearningRate 0.0241 Epoch: 10 Global Step: 170090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:53,849-Speed 8808.09 samples/sec Loss 5.5093 LearningRate 0.0241 Epoch: 10 Global Step: 170100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:54,951-Speed 9297.74 samples/sec Loss 5.4802 LearningRate 0.0241 Epoch: 10 Global Step: 170110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:56,020-Speed 9599.99 samples/sec Loss 5.5595 LearningRate 0.0240 Epoch: 10 Global Step: 170120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:57,115-Speed 9350.97 samples/sec Loss 5.3908 LearningRate 0.0240 Epoch: 10 Global Step: 170130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:24:58,253-Speed 9003.60 samples/sec Loss 5.3814 LearningRate 0.0240 Epoch: 10 Global Step: 170140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:24:59,299-Speed 9802.33 samples/sec Loss 5.5369 LearningRate 0.0240 Epoch: 10 Global Step: 170150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:00,378-Speed 9493.46 samples/sec Loss 5.4974 LearningRate 0.0240 Epoch: 10 Global Step: 170160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:01,482-Speed 9279.41 samples/sec Loss 5.4403 LearningRate 0.0240 Epoch: 10 Global Step: 170170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:02,598-Speed 9191.55 samples/sec Loss 5.4680 LearningRate 0.0240 Epoch: 10 Global Step: 170180 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:03,680-Speed 9467.40 samples/sec Loss 5.5410 LearningRate 0.0240 Epoch: 10 Global Step: 170190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:04,763-Speed 9456.46 samples/sec Loss 5.4962 LearningRate 0.0240 Epoch: 10 Global Step: 170200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:05,815-Speed 9744.23 samples/sec Loss 5.4674 LearningRate 0.0240 Epoch: 10 Global Step: 170210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:06,917-Speed 9293.03 samples/sec Loss 5.5122 LearningRate 0.0240 Epoch: 10 Global Step: 170220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:07,976-Speed 9680.01 samples/sec Loss 5.4450 LearningRate 0.0240 Epoch: 10 Global Step: 170230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:09,088-Speed 9209.02 samples/sec Loss 5.3993 LearningRate 0.0240 Epoch: 10 Global Step: 170240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:10,200-Speed 9214.01 samples/sec Loss 5.4117 LearningRate 0.0240 Epoch: 10 Global Step: 170250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:11,288-Speed 9419.12 samples/sec Loss 5.4890 LearningRate 0.0240 Epoch: 10 Global Step: 170260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:12,348-Speed 9666.97 samples/sec Loss 5.3565 LearningRate 0.0240 Epoch: 10 Global Step: 170270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:13,490-Speed 8966.45 samples/sec Loss 5.5264 LearningRate 0.0240 Epoch: 10 Global Step: 170280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:14,599-Speed 9241.83 samples/sec Loss 5.4491 LearningRate 0.0240 Epoch: 10 Global Step: 170290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:15,699-Speed 9320.92 samples/sec Loss 5.4747 LearningRate 0.0240 Epoch: 10 Global Step: 170300 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 18:25:16,760-Speed 9653.37 samples/sec Loss 5.5500 LearningRate 0.0240 Epoch: 10 Global Step: 170310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:17,851-Speed 9394.87 samples/sec Loss 5.4447 LearningRate 0.0240 Epoch: 10 Global Step: 170320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:18,926-Speed 9532.12 samples/sec Loss 5.5045 LearningRate 0.0240 Epoch: 10 Global Step: 170330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:19,993-Speed 9601.57 samples/sec Loss 5.4673 LearningRate 0.0240 Epoch: 10 Global Step: 170340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:21,109-Speed 9175.54 samples/sec Loss 5.5769 LearningRate 0.0240 Epoch: 10 Global Step: 170350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:22,197-Speed 9423.24 samples/sec Loss 5.5412 LearningRate 0.0240 Epoch: 10 Global Step: 170360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:23,323-Speed 9098.94 samples/sec Loss 5.5018 LearningRate 0.0240 Epoch: 10 Global Step: 170370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:24,412-Speed 9405.23 samples/sec Loss 5.5100 LearningRate 0.0240 Epoch: 10 Global Step: 170380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:25,474-Speed 9647.10 samples/sec Loss 5.5816 LearningRate 0.0240 Epoch: 10 Global Step: 170390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:26,576-Speed 9298.08 samples/sec Loss 5.5155 LearningRate 0.0240 Epoch: 10 Global Step: 170400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:27,701-Speed 9107.87 samples/sec Loss 5.4877 LearningRate 0.0240 Epoch: 10 Global Step: 170410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:28,781-Speed 9487.47 samples/sec Loss 5.5453 LearningRate 0.0240 Epoch: 10 Global Step: 170420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 18:25:29,892-Speed 9224.38 samples/sec Loss 5.4191 LearningRate 0.0240 Epoch: 10 Global Step: 170430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:31,007-Speed 9184.33 samples/sec Loss 5.5528 LearningRate 0.0240 Epoch: 10 Global Step: 170440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:32,074-Speed 9601.12 samples/sec Loss 5.4337 LearningRate 0.0240 Epoch: 10 Global Step: 170450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:33,232-Speed 8857.88 samples/sec Loss 5.4608 LearningRate 0.0239 Epoch: 10 Global Step: 170460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:34,360-Speed 9079.40 samples/sec Loss 5.3941 LearningRate 0.0239 Epoch: 10 Global Step: 170470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:35,418-Speed 9679.72 samples/sec Loss 5.5694 LearningRate 0.0239 Epoch: 10 Global Step: 170480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:25:36,533-Speed 9192.21 samples/sec Loss 5.5346 LearningRate 0.0239 Epoch: 10 Global Step: 170490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:37,633-Speed 9319.21 samples/sec Loss 5.5142 LearningRate 0.0239 Epoch: 10 Global Step: 170500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:38,757-Speed 9110.51 samples/sec Loss 5.5642 LearningRate 0.0239 Epoch: 10 Global Step: 170510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:39,871-Speed 9202.34 samples/sec Loss 5.5447 LearningRate 0.0239 Epoch: 10 Global Step: 170520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:40,955-Speed 9453.91 samples/sec Loss 5.4551 LearningRate 0.0239 Epoch: 10 Global Step: 170530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:42,056-Speed 9304.82 samples/sec Loss 5.5259 LearningRate 0.0239 Epoch: 10 Global Step: 170540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:43,137-Speed 9475.25 
samples/sec Loss 5.4545 LearningRate 0.0239 Epoch: 10 Global Step: 170550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:44,223-Speed 9438.36 samples/sec Loss 5.5004 LearningRate 0.0239 Epoch: 10 Global Step: 170560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:45,319-Speed 9350.80 samples/sec Loss 5.4674 LearningRate 0.0239 Epoch: 10 Global Step: 170570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:46,380-Speed 9652.45 samples/sec Loss 5.4419 LearningRate 0.0239 Epoch: 10 Global Step: 170580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:47,456-Speed 9525.27 samples/sec Loss 5.5694 LearningRate 0.0239 Epoch: 10 Global Step: 170590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:25:48,559-Speed 9287.44 samples/sec Loss 5.5242 LearningRate 0.0239 Epoch: 10 Global Step: 170600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:49,653-Speed 9362.16 samples/sec Loss 5.5136 LearningRate 0.0239 Epoch: 10 Global Step: 170610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:50,737-Speed 9459.94 samples/sec Loss 5.6318 LearningRate 0.0239 Epoch: 10 Global Step: 170620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:51,818-Speed 9473.32 samples/sec Loss 5.5814 LearningRate 0.0239 Epoch: 10 Global Step: 170630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:52,946-Speed 9090.10 samples/sec Loss 5.5015 LearningRate 0.0239 Epoch: 10 Global Step: 170640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:54,085-Speed 8993.78 samples/sec Loss 5.6197 LearningRate 0.0239 Epoch: 10 Global Step: 170650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:55,190-Speed 9275.74 samples/sec Loss 5.5353 LearningRate 0.0239 Epoch: 10 Global Step: 170660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:56,284-Speed 9364.07 samples/sec Loss 5.6055 
LearningRate 0.0239 Epoch: 10 Global Step: 170670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:57,386-Speed 9296.84 samples/sec Loss 5.6092 LearningRate 0.0239 Epoch: 10 Global Step: 170680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:58,482-Speed 9351.29 samples/sec Loss 5.6942 LearningRate 0.0239 Epoch: 10 Global Step: 170690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:25:59,601-Speed 9156.45 samples/sec Loss 5.5025 LearningRate 0.0239 Epoch: 10 Global Step: 170700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:26:00,681-Speed 9486.94 samples/sec Loss 5.4927 LearningRate 0.0239 Epoch: 10 Global Step: 170710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:01,830-Speed 8926.96 samples/sec Loss 5.5628 LearningRate 0.0239 Epoch: 10 Global Step: 170720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:02,921-Speed 9390.70 samples/sec Loss 5.4806 LearningRate 0.0239 Epoch: 10 Global Step: 170730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:03,975-Speed 9720.08 samples/sec Loss 5.5553 LearningRate 0.0239 Epoch: 10 Global Step: 170740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:05,139-Speed 8805.20 samples/sec Loss 5.4719 LearningRate 0.0239 Epoch: 10 Global Step: 170750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:06,237-Speed 9327.01 samples/sec Loss 5.4800 LearningRate 0.0239 Epoch: 10 Global Step: 170760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:07,348-Speed 9224.20 samples/sec Loss 5.4833 LearningRate 0.0239 Epoch: 10 Global Step: 170770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:08,460-Speed 9221.33 samples/sec Loss 5.5335 LearningRate 0.0239 Epoch: 10 Global Step: 170780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:09,582-Speed 9128.30 samples/sec Loss 5.5435 LearningRate 0.0239 Epoch: 10 
Global Step: 170790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:10,683-Speed 9302.14 samples/sec Loss 5.5220 LearningRate 0.0238 Epoch: 10 Global Step: 170800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:11,769-Speed 9438.87 samples/sec Loss 5.4829 LearningRate 0.0238 Epoch: 10 Global Step: 170810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:26:12,854-Speed 9443.10 samples/sec Loss 5.5398 LearningRate 0.0238 Epoch: 10 Global Step: 170820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:13,925-Speed 9564.81 samples/sec Loss 5.4062 LearningRate 0.0238 Epoch: 10 Global Step: 170830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:15,008-Speed 9457.89 samples/sec Loss 5.4543 LearningRate 0.0238 Epoch: 10 Global Step: 170840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:16,050-Speed 9833.47 samples/sec Loss 5.4955 LearningRate 0.0238 Epoch: 10 Global Step: 170850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:17,130-Speed 9489.90 samples/sec Loss 5.4974 LearningRate 0.0238 Epoch: 10 Global Step: 170860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:18,233-Speed 9287.61 samples/sec Loss 5.4537 LearningRate 0.0238 Epoch: 10 Global Step: 170870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:19,346-Speed 9203.60 samples/sec Loss 5.4648 LearningRate 0.0238 Epoch: 10 Global Step: 170880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:20,444-Speed 9338.48 samples/sec Loss 5.4849 LearningRate 0.0238 Epoch: 10 Global Step: 170890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:21,529-Speed 9438.49 samples/sec Loss 5.5564 LearningRate 0.0238 Epoch: 10 Global Step: 170900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:22,654-Speed 9112.66 samples/sec Loss 5.5614 LearningRate 0.0238 Epoch: 10 Global Step: 170910 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:23,741-Speed 9425.43 samples/sec Loss 5.5351 LearningRate 0.0238 Epoch: 10 Global Step: 170920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:26:24,804-Speed 9643.99 samples/sec Loss 5.5650 LearningRate 0.0238 Epoch: 10 Global Step: 170930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:25,860-Speed 9694.32 samples/sec Loss 5.5257 LearningRate 0.0238 Epoch: 10 Global Step: 170940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:26,927-Speed 9602.57 samples/sec Loss 5.5267 LearningRate 0.0238 Epoch: 10 Global Step: 170950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:28,023-Speed 9351.25 samples/sec Loss 5.4834 LearningRate 0.0238 Epoch: 10 Global Step: 170960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:29,106-Speed 9458.46 samples/sec Loss 5.5675 LearningRate 0.0238 Epoch: 10 Global Step: 170970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:30,186-Speed 9492.81 samples/sec Loss 5.5020 LearningRate 0.0238 Epoch: 10 Global Step: 170980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:31,256-Speed 9585.87 samples/sec Loss 5.5465 LearningRate 0.0238 Epoch: 10 Global Step: 170990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:32,396-Speed 8983.43 samples/sec Loss 5.5343 LearningRate 0.0238 Epoch: 10 Global Step: 171000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:33,487-Speed 9388.58 samples/sec Loss 5.5768 LearningRate 0.0238 Epoch: 10 Global Step: 171010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:34,551-Speed 9629.77 samples/sec Loss 5.5232 LearningRate 0.0238 Epoch: 10 Global Step: 171020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:26:35,669-Speed 9168.18 samples/sec Loss 5.5397 LearningRate 0.0238 Epoch: 10 Global Step: 171030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 18:26:36,771-Speed 9292.12 samples/sec Loss 5.4636 LearningRate 0.0238 Epoch: 10 Global Step: 171040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:37,872-Speed 9311.95 samples/sec Loss 5.5524 LearningRate 0.0238 Epoch: 10 Global Step: 171050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:38,937-Speed 9620.09 samples/sec Loss 5.5558 LearningRate 0.0238 Epoch: 10 Global Step: 171060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:40,022-Speed 9440.18 samples/sec Loss 5.6030 LearningRate 0.0238 Epoch: 10 Global Step: 171070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:41,150-Speed 9085.16 samples/sec Loss 5.5794 LearningRate 0.0238 Epoch: 10 Global Step: 171080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:42,233-Speed 9466.86 samples/sec Loss 5.4805 LearningRate 0.0238 Epoch: 10 Global Step: 171090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:43,313-Speed 9489.74 samples/sec Loss 5.5381 LearningRate 0.0238 Epoch: 10 Global Step: 171100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:44,478-Speed 8790.06 samples/sec Loss 5.5403 LearningRate 0.0238 Epoch: 10 Global Step: 171110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:45,568-Speed 9403.97 samples/sec Loss 5.6051 LearningRate 0.0238 Epoch: 10 Global Step: 171120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:46,656-Speed 9409.92 samples/sec Loss 5.5571 LearningRate 0.0238 Epoch: 10 Global Step: 171130 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:26:47,715-Speed 9682.35 samples/sec Loss 5.4747 LearningRate 0.0237 Epoch: 10 Global Step: 171140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:48,829-Speed 9193.05 samples/sec Loss 5.4703 LearningRate 0.0237 Epoch: 10 Global Step: 171150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:49,906-Speed 
9521.05 samples/sec Loss 5.4527 LearningRate 0.0237 Epoch: 10 Global Step: 171160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:51,011-Speed 9271.61 samples/sec Loss 5.5124 LearningRate 0.0237 Epoch: 10 Global Step: 171170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:52,139-Speed 9080.13 samples/sec Loss 5.6144 LearningRate 0.0237 Epoch: 10 Global Step: 171180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:53,221-Speed 9472.73 samples/sec Loss 5.5612 LearningRate 0.0237 Epoch: 10 Global Step: 171190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:54,320-Speed 9320.00 samples/sec Loss 5.4660 LearningRate 0.0237 Epoch: 10 Global Step: 171200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:55,379-Speed 9676.58 samples/sec Loss 5.5171 LearningRate 0.0237 Epoch: 10 Global Step: 171210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:56,504-Speed 9106.97 samples/sec Loss 5.6502 LearningRate 0.0237 Epoch: 10 Global Step: 171220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:57,589-Speed 9441.13 samples/sec Loss 5.5783 LearningRate 0.0237 Epoch: 10 Global Step: 171230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:26:58,682-Speed 9373.42 samples/sec Loss 5.6178 LearningRate 0.0237 Epoch: 10 Global Step: 171240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:26:59,788-Speed 9262.33 samples/sec Loss 5.4744 LearningRate 0.0237 Epoch: 10 Global Step: 171250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:00,895-Speed 9264.90 samples/sec Loss 5.5239 LearningRate 0.0237 Epoch: 10 Global Step: 171260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:01,992-Speed 9341.27 samples/sec Loss 5.5629 LearningRate 0.0237 Epoch: 10 Global Step: 171270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:03,127-Speed 9021.99 samples/sec Loss 5.6025 
LearningRate 0.0237 Epoch: 10 Global Step: 171280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:04,223-Speed 9346.69 samples/sec Loss 5.6275 LearningRate 0.0237 Epoch: 10 Global Step: 171290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:05,346-Speed 9123.45 samples/sec Loss 5.4352 LearningRate 0.0237 Epoch: 10 Global Step: 171300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:06,459-Speed 9203.02 samples/sec Loss 5.4872 LearningRate 0.0237 Epoch: 10 Global Step: 171310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:07,585-Speed 9105.88 samples/sec Loss 5.5624 LearningRate 0.0237 Epoch: 10 Global Step: 171320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:08,672-Speed 9424.25 samples/sec Loss 5.5347 LearningRate 0.0237 Epoch: 10 Global Step: 171330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:09,739-Speed 9598.88 samples/sec Loss 5.6009 LearningRate 0.0237 Epoch: 10 Global Step: 171340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:10,799-Speed 9666.17 samples/sec Loss 5.4960 LearningRate 0.0237 Epoch: 10 Global Step: 171350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:11,897-Speed 9336.86 samples/sec Loss 5.5358 LearningRate 0.0237 Epoch: 10 Global Step: 171360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:12,968-Speed 9564.48 samples/sec Loss 5.4872 LearningRate 0.0237 Epoch: 10 Global Step: 171370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:14,039-Speed 9562.88 samples/sec Loss 5.5519 LearningRate 0.0237 Epoch: 10 Global Step: 171380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:15,145-Speed 9265.43 samples/sec Loss 5.4932 LearningRate 0.0237 Epoch: 10 Global Step: 171390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:16,204-Speed 9672.82 samples/sec Loss 5.4654 LearningRate 0.0237 Epoch: 10 
Global Step: 171400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:17,344-Speed 8989.02 samples/sec Loss 5.6305 LearningRate 0.0237 Epoch: 10 Global Step: 171410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:18,426-Speed 9468.63 samples/sec Loss 5.5840 LearningRate 0.0237 Epoch: 10 Global Step: 171420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:19,515-Speed 9419.67 samples/sec Loss 5.5152 LearningRate 0.0237 Epoch: 10 Global Step: 171430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:20,651-Speed 9019.27 samples/sec Loss 5.5365 LearningRate 0.0237 Epoch: 10 Global Step: 171440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:21,773-Speed 9134.72 samples/sec Loss 5.5302 LearningRate 0.0237 Epoch: 10 Global Step: 171450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:27:22,858-Speed 9443.72 samples/sec Loss 5.6631 LearningRate 0.0237 Epoch: 10 Global Step: 171460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:23,924-Speed 9614.88 samples/sec Loss 5.5453 LearningRate 0.0237 Epoch: 10 Global Step: 171470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:24,998-Speed 9540.76 samples/sec Loss 5.5755 LearningRate 0.0236 Epoch: 10 Global Step: 171480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:26,104-Speed 9257.47 samples/sec Loss 5.5644 LearningRate 0.0236 Epoch: 10 Global Step: 171490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:27,210-Speed 9270.96 samples/sec Loss 5.5750 LearningRate 0.0236 Epoch: 10 Global Step: 171500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:28,275-Speed 9612.29 samples/sec Loss 5.5567 LearningRate 0.0236 Epoch: 10 Global Step: 171510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:29,404-Speed 9081.26 samples/sec Loss 5.5473 LearningRate 0.0236 Epoch: 10 Global Step: 171520 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:30,471-Speed 9600.71 samples/sec Loss 5.4623 LearningRate 0.0236 Epoch: 10 Global Step: 171530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:31,590-Speed 9154.59 samples/sec Loss 5.4775 LearningRate 0.0236 Epoch: 10 Global Step: 171540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:32,712-Speed 9132.77 samples/sec Loss 5.5077 LearningRate 0.0236 Epoch: 10 Global Step: 171550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:33,818-Speed 9263.26 samples/sec Loss 5.5119 LearningRate 0.0236 Epoch: 10 Global Step: 171560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:34,891-Speed 9546.19 samples/sec Loss 5.4713 LearningRate 0.0236 Epoch: 10 Global Step: 171570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:35,954-Speed 9644.69 samples/sec Loss 5.4834 LearningRate 0.0236 Epoch: 10 Global Step: 171580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:37,073-Speed 9150.80 samples/sec Loss 5.5795 LearningRate 0.0236 Epoch: 10 Global Step: 171590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:38,245-Speed 8747.79 samples/sec Loss 5.5465 LearningRate 0.0236 Epoch: 10 Global Step: 171600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:39,350-Speed 9269.70 samples/sec Loss 5.5165 LearningRate 0.0236 Epoch: 10 Global Step: 171610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:40,428-Speed 9508.84 samples/sec Loss 5.4851 LearningRate 0.0236 Epoch: 10 Global Step: 171620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:41,528-Speed 9319.69 samples/sec Loss 5.4647 LearningRate 0.0236 Epoch: 10 Global Step: 171630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:42,623-Speed 9356.10 samples/sec Loss 5.5020 LearningRate 0.0236 Epoch: 10 Global Step: 171640 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 18:27:43,721-Speed 9328.15 samples/sec Loss 5.5178 LearningRate 0.0236 Epoch: 10 Global Step: 171650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:44,843-Speed 9129.96 samples/sec Loss 5.4866 LearningRate 0.0236 Epoch: 10 Global Step: 171660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:27:45,909-Speed 9611.13 samples/sec Loss 5.4920 LearningRate 0.0236 Epoch: 10 Global Step: 171670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:47,007-Speed 9334.11 samples/sec Loss 5.6501 LearningRate 0.0236 Epoch: 10 Global Step: 171680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:48,100-Speed 9372.69 samples/sec Loss 5.6029 LearningRate 0.0236 Epoch: 10 Global Step: 171690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:49,203-Speed 9291.18 samples/sec Loss 5.5947 LearningRate 0.0236 Epoch: 10 Global Step: 171700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:50,317-Speed 9202.53 samples/sec Loss 5.4646 LearningRate 0.0236 Epoch: 10 Global Step: 171710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:51,405-Speed 9412.84 samples/sec Loss 5.5771 LearningRate 0.0236 Epoch: 10 Global Step: 171720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:52,504-Speed 9320.97 samples/sec Loss 5.4662 LearningRate 0.0236 Epoch: 10 Global Step: 171730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:53,601-Speed 9345.85 samples/sec Loss 5.5848 LearningRate 0.0236 Epoch: 10 Global Step: 171740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:54,655-Speed 9718.85 samples/sec Loss 5.5786 LearningRate 0.0236 Epoch: 10 Global Step: 171750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:55,736-Speed 9479.51 samples/sec Loss 5.5367 LearningRate 0.0236 Epoch: 10 Global Step: 171760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:27:56,816-Speed 9487.14 samples/sec Loss 5.5408 LearningRate 0.0236 Epoch: 10 Global Step: 171770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:27:57,927-Speed 9225.62 samples/sec Loss 5.5885 LearningRate 0.0236 Epoch: 10 Global Step: 171780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:27:58,999-Speed 9553.59 samples/sec Loss 5.5421 LearningRate 0.0236 Epoch: 10 Global Step: 171790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:00,052-Speed 9727.00 samples/sec Loss 5.5887 LearningRate 0.0236 Epoch: 10 Global Step: 171800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:01,145-Speed 9385.14 samples/sec Loss 5.5864 LearningRate 0.0236 Epoch: 10 Global Step: 171810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:02,190-Speed 9806.39 samples/sec Loss 5.5235 LearningRate 0.0236 Epoch: 10 Global Step: 171820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:03,316-Speed 9095.31 samples/sec Loss 5.5889 LearningRate 0.0235 Epoch: 10 Global Step: 171830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:04,411-Speed 9355.09 samples/sec Loss 5.5935 LearningRate 0.0235 Epoch: 10 Global Step: 171840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:05,535-Speed 9114.28 samples/sec Loss 5.6051 LearningRate 0.0235 Epoch: 10 Global Step: 171850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:06,628-Speed 9376.34 samples/sec Loss 5.4631 LearningRate 0.0235 Epoch: 10 Global Step: 171860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:07,693-Speed 9622.81 samples/sec Loss 5.5725 LearningRate 0.0235 Epoch: 10 Global Step: 171870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:08,763-Speed 9581.33 samples/sec Loss 5.6211 LearningRate 0.0235 Epoch: 10 Global Step: 171880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:09,848-Speed 9442.56 samples/sec 
Loss 5.6272 LearningRate 0.0235 Epoch: 10 Global Step: 171890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:10,921-Speed 9545.96 samples/sec Loss 5.6065 LearningRate 0.0235 Epoch: 10 Global Step: 171900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:11,991-Speed 9583.77 samples/sec Loss 5.4647 LearningRate 0.0235 Epoch: 10 Global Step: 171910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 18:28:13,159-Speed 8770.48 samples/sec Loss 5.6828 LearningRate 0.0235 Epoch: 10 Global Step: 171920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:14,211-Speed 9734.65 samples/sec Loss 5.6659 LearningRate 0.0235 Epoch: 10 Global Step: 171930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:15,240-Speed 9963.98 samples/sec Loss 5.6464 LearningRate 0.0235 Epoch: 10 Global Step: 171940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:16,284-Speed 9813.64 samples/sec Loss 5.5492 LearningRate 0.0235 Epoch: 10 Global Step: 171950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:17,331-Speed 9786.93 samples/sec Loss 5.4665 LearningRate 0.0235 Epoch: 10 Global Step: 171960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:18,396-Speed 9620.33 samples/sec Loss 5.6234 LearningRate 0.0235 Epoch: 10 Global Step: 171970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:19,455-Speed 9670.38 samples/sec Loss 5.5691 LearningRate 0.0235 Epoch: 10 Global Step: 171980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:20,535-Speed 9498.46 samples/sec Loss 5.5662 LearningRate 0.0235 Epoch: 10 Global Step: 171990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:21,631-Speed 9347.35 samples/sec Loss 5.7322 LearningRate 0.0235 Epoch: 10 Global Step: 172000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:28:43,747-[lfw][172000]XNorm: 9.457694 Training: 2022-04-11 
18:28:43,748-[lfw][172000]Accuracy-Flip: 0.99683+-0.00252 Training: 2022-04-11 18:28:43,748-[lfw][172000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:29:09,279-[cfp_fp][172000]XNorm: 8.043288 Training: 2022-04-11 18:29:09,280-[cfp_fp][172000]Accuracy-Flip: 0.96257+-0.00983 Training: 2022-04-11 18:29:09,280-[cfp_fp][172000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:29:31,196-[agedb_30][172000]XNorm: 9.145506 Training: 2022-04-11 18:29:31,196-[agedb_30][172000]Accuracy-Flip: 0.96583+-0.00857 Training: 2022-04-11 18:29:31,197-[agedb_30][172000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:29:32,274-Speed 144.95 samples/sec Loss 5.5827 LearningRate 0.0235 Epoch: 10 Global Step: 172010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:33,332-Speed 9686.39 samples/sec Loss 5.6046 LearningRate 0.0235 Epoch: 10 Global Step: 172020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:34,468-Speed 9015.90 samples/sec Loss 5.6371 LearningRate 0.0235 Epoch: 10 Global Step: 172030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:35,549-Speed 9476.75 samples/sec Loss 5.6314 LearningRate 0.0235 Epoch: 10 Global Step: 172040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:36,618-Speed 9588.36 samples/sec Loss 5.6463 LearningRate 0.0235 Epoch: 10 Global Step: 172050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:37,723-Speed 9274.64 samples/sec Loss 5.5457 LearningRate 0.0235 Epoch: 10 Global Step: 172060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:38,848-Speed 9107.41 samples/sec Loss 5.5555 LearningRate 0.0235 Epoch: 10 Global Step: 172070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:39,925-Speed 9516.42 samples/sec Loss 5.6453 LearningRate 0.0235 Epoch: 10 Global Step: 172080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:40,997-Speed 9554.79 samples/sec Loss 5.5321 LearningRate 0.0235 Epoch: 10 
Global Step: 172090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:42,050-Speed 9734.93 samples/sec Loss 5.5404 LearningRate 0.0235 Epoch: 10 Global Step: 172100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:43,122-Speed 9560.50 samples/sec Loss 5.5767 LearningRate 0.0235 Epoch: 10 Global Step: 172110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:44,191-Speed 9587.19 samples/sec Loss 5.5760 LearningRate 0.0235 Epoch: 10 Global Step: 172120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:45,261-Speed 9574.35 samples/sec Loss 5.5661 LearningRate 0.0235 Epoch: 10 Global Step: 172130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:46,336-Speed 9526.42 samples/sec Loss 5.6389 LearningRate 0.0235 Epoch: 10 Global Step: 172140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:47,430-Speed 9370.37 samples/sec Loss 5.6453 LearningRate 0.0235 Epoch: 10 Global Step: 172150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:48,500-Speed 9574.25 samples/sec Loss 5.4544 LearningRate 0.0235 Epoch: 10 Global Step: 172160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:49,588-Speed 9412.57 samples/sec Loss 5.4998 LearningRate 0.0234 Epoch: 10 Global Step: 172170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:50,653-Speed 9624.72 samples/sec Loss 5.6535 LearningRate 0.0234 Epoch: 10 Global Step: 172180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:51,727-Speed 9547.13 samples/sec Loss 5.5529 LearningRate 0.0234 Epoch: 10 Global Step: 172190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:52,845-Speed 9158.74 samples/sec Loss 5.5317 LearningRate 0.0234 Epoch: 10 Global Step: 172200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:53,941-Speed 9351.80 samples/sec Loss 5.5066 LearningRate 0.0234 Epoch: 10 Global Step: 172210 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:55,008-Speed 9595.67 samples/sec Loss 5.6454 LearningRate 0.0234 Epoch: 10 Global Step: 172220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:56,064-Speed 9703.56 samples/sec Loss 5.6063 LearningRate 0.0234 Epoch: 10 Global Step: 172230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:57,130-Speed 9610.63 samples/sec Loss 5.5804 LearningRate 0.0234 Epoch: 10 Global Step: 172240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:58,215-Speed 9448.26 samples/sec Loss 5.5378 LearningRate 0.0234 Epoch: 10 Global Step: 172250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:29:59,334-Speed 9156.06 samples/sec Loss 5.5594 LearningRate 0.0234 Epoch: 10 Global Step: 172260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:00,438-Speed 9280.07 samples/sec Loss 5.5884 LearningRate 0.0234 Epoch: 10 Global Step: 172270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:01,528-Speed 9400.71 samples/sec Loss 5.4831 LearningRate 0.0234 Epoch: 10 Global Step: 172280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:02,636-Speed 9252.25 samples/sec Loss 5.6298 LearningRate 0.0234 Epoch: 10 Global Step: 172290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:03,736-Speed 9308.93 samples/sec Loss 5.5594 LearningRate 0.0234 Epoch: 10 Global Step: 172300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:04,800-Speed 9633.95 samples/sec Loss 5.5671 LearningRate 0.0234 Epoch: 10 Global Step: 172310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:05,888-Speed 9414.05 samples/sec Loss 5.5861 LearningRate 0.0234 Epoch: 10 Global Step: 172320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:30:06,954-Speed 9614.80 samples/sec Loss 5.5587 LearningRate 0.0234 Epoch: 10 Global Step: 172330 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 18:30:08,043-Speed 9413.80 samples/sec Loss 5.4507 LearningRate 0.0234 Epoch: 10 Global Step: 172340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:09,113-Speed 9573.21 samples/sec Loss 5.5378 LearningRate 0.0234 Epoch: 10 Global Step: 172350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:10,193-Speed 9487.64 samples/sec Loss 5.5647 LearningRate 0.0234 Epoch: 10 Global Step: 172360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:11,271-Speed 9507.76 samples/sec Loss 5.5757 LearningRate 0.0234 Epoch: 10 Global Step: 172370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:12,382-Speed 9216.37 samples/sec Loss 5.5760 LearningRate 0.0234 Epoch: 10 Global Step: 172380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:13,535-Speed 8885.81 samples/sec Loss 5.5871 LearningRate 0.0234 Epoch: 10 Global Step: 172390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:14,639-Speed 9284.89 samples/sec Loss 5.5465 LearningRate 0.0234 Epoch: 10 Global Step: 172400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:15,741-Speed 9298.57 samples/sec Loss 5.5674 LearningRate 0.0234 Epoch: 10 Global Step: 172410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:16,826-Speed 9443.17 samples/sec Loss 5.5120 LearningRate 0.0234 Epoch: 10 Global Step: 172420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:17,906-Speed 9486.90 samples/sec Loss 5.6068 LearningRate 0.0234 Epoch: 10 Global Step: 172430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:30:18,995-Speed 9404.19 samples/sec Loss 5.4996 LearningRate 0.0234 Epoch: 10 Global Step: 172440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:20,066-Speed 9570.80 samples/sec Loss 5.5614 LearningRate 0.0234 Epoch: 10 Global Step: 172450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
18:30:21,162-Speed 9347.74 samples/sec Loss 5.5789 LearningRate 0.0234 Epoch: 10 Global Step: 172460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:22,271-Speed 9241.49 samples/sec Loss 5.5680 LearningRate 0.0234 Epoch: 10 Global Step: 172470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:23,397-Speed 9096.51 samples/sec Loss 5.5016 LearningRate 0.0234 Epoch: 10 Global Step: 172480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:24,456-Speed 9676.49 samples/sec Loss 5.5561 LearningRate 0.0234 Epoch: 10 Global Step: 172490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:25,549-Speed 9376.34 samples/sec Loss 5.5426 LearningRate 0.0234 Epoch: 10 Global Step: 172500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:26,614-Speed 9617.28 samples/sec Loss 5.5731 LearningRate 0.0234 Epoch: 10 Global Step: 172510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:27,709-Speed 9358.28 samples/sec Loss 5.5701 LearningRate 0.0233 Epoch: 10 Global Step: 172520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:28,802-Speed 9377.93 samples/sec Loss 5.5519 LearningRate 0.0233 Epoch: 10 Global Step: 172530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:29,871-Speed 9583.01 samples/sec Loss 5.6179 LearningRate 0.0233 Epoch: 10 Global Step: 172540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:30,930-Speed 9672.70 samples/sec Loss 5.5403 LearningRate 0.0233 Epoch: 10 Global Step: 172550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:32,020-Speed 9404.85 samples/sec Loss 5.5642 LearningRate 0.0233 Epoch: 10 Global Step: 172560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:33,080-Speed 9664.84 samples/sec Loss 5.4467 LearningRate 0.0233 Epoch: 10 Global Step: 172570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:34,149-Speed 9583.50 
samples/sec Loss 5.5802 LearningRate 0.0233 Epoch: 10 Global Step: 172580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:35,260-Speed 9222.42 samples/sec Loss 5.5901 LearningRate 0.0233 Epoch: 10 Global Step: 172590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:36,351-Speed 9388.64 samples/sec Loss 5.5232 LearningRate 0.0233 Epoch: 10 Global Step: 172600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:37,430-Speed 9499.58 samples/sec Loss 5.5933 LearningRate 0.0233 Epoch: 10 Global Step: 172610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:38,531-Speed 9306.44 samples/sec Loss 5.5855 LearningRate 0.0233 Epoch: 10 Global Step: 172620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:39,630-Speed 9321.33 samples/sec Loss 5.6434 LearningRate 0.0233 Epoch: 10 Global Step: 172630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:40,716-Speed 9437.41 samples/sec Loss 5.5760 LearningRate 0.0233 Epoch: 10 Global Step: 172640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 18:30:41,777-Speed 9654.62 samples/sec Loss 5.5378 LearningRate 0.0233 Epoch: 10 Global Step: 172650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:42,857-Speed 9489.05 samples/sec Loss 5.5836 LearningRate 0.0233 Epoch: 10 Global Step: 172660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:43,922-Speed 9620.02 samples/sec Loss 5.6738 LearningRate 0.0233 Epoch: 10 Global Step: 172670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 18:30:45,000-Speed 9504.05 samples/sec Loss 5.5561 LearningRate 0.0233 Epoch: 10 Global Step: 172680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:46,114-Speed 9197.71 samples/sec Loss 5.5414 LearningRate 0.0233 Epoch: 10 Global Step: 172690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:47,195-Speed 9478.23 samples/sec Loss 5.5893 
LearningRate 0.0233 Epoch: 10 Global Step: 172700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:48,279-Speed 9451.60 samples/sec Loss 5.6265 LearningRate 0.0233 Epoch: 10 Global Step: 172710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:49,363-Speed 9456.04 samples/sec Loss 5.6916 LearningRate 0.0233 Epoch: 10 Global Step: 172720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:50,453-Speed 9392.68 samples/sec Loss 5.5850 LearningRate 0.0233 Epoch: 10 Global Step: 172730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:51,542-Speed 9413.23 samples/sec Loss 5.6390 LearningRate 0.0233 Epoch: 10 Global Step: 172740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:52,609-Speed 9603.45 samples/sec Loss 5.5921 LearningRate 0.0233 Epoch: 10 Global Step: 172750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:53,693-Speed 9459.77 samples/sec Loss 5.5584 LearningRate 0.0233 Epoch: 10 Global Step: 172760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:54,759-Speed 9608.95 samples/sec Loss 5.5700 LearningRate 0.0233 Epoch: 10 Global Step: 172770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:55,838-Speed 9494.82 samples/sec Loss 5.4935 LearningRate 0.0233 Epoch: 10 Global Step: 172780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:56,913-Speed 9525.34 samples/sec Loss 5.6042 LearningRate 0.0233 Epoch: 10 Global Step: 172790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:58,009-Speed 9356.49 samples/sec Loss 5.6547 LearningRate 0.0233 Epoch: 10 Global Step: 172800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:30:59,106-Speed 9334.31 samples/sec Loss 5.6087 LearningRate 0.0233 Epoch: 10 Global Step: 172810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:00,209-Speed 9286.19 samples/sec Loss 5.5477 LearningRate 0.0233 Epoch: 10 
Global Step: 172820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:01,282-Speed 9560.91 samples/sec Loss 5.6586 LearningRate 0.0233 Epoch: 10 Global Step: 172830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:02,387-Speed 9272.03 samples/sec Loss 5.6409 LearningRate 0.0233 Epoch: 10 Global Step: 172840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:03,469-Speed 9471.08 samples/sec Loss 5.5002 LearningRate 0.0233 Epoch: 10 Global Step: 172850 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:31:04,520-Speed 9749.53 samples/sec Loss 5.6158 LearningRate 0.0232 Epoch: 10 Global Step: 172860 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:31:05,588-Speed 9594.44 samples/sec Loss 5.6422 LearningRate 0.0232 Epoch: 10 Global Step: 172870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:06,666-Speed 9502.73 samples/sec Loss 5.5283 LearningRate 0.0232 Epoch: 10 Global Step: 172880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:07,786-Speed 9154.84 samples/sec Loss 5.5753 LearningRate 0.0232 Epoch: 10 Global Step: 172890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:08,880-Speed 9360.64 samples/sec Loss 5.6212 LearningRate 0.0232 Epoch: 10 Global Step: 172900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:09,991-Speed 9225.89 samples/sec Loss 5.7212 LearningRate 0.0232 Epoch: 10 Global Step: 172910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:11,081-Speed 9392.34 samples/sec Loss 5.5544 LearningRate 0.0232 Epoch: 10 Global Step: 172920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:12,177-Speed 9353.06 samples/sec Loss 5.4916 LearningRate 0.0232 Epoch: 10 Global Step: 172930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:13,279-Speed 9298.24 samples/sec Loss 5.6053 LearningRate 0.0232 Epoch: 10 Global Step: 172940 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:14,328-Speed 9764.13 samples/sec Loss 5.6503 LearningRate 0.0232 Epoch: 10 Global Step: 172950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:15,437-Speed 9234.04 samples/sec Loss 5.5632 LearningRate 0.0232 Epoch: 10 Global Step: 172960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:16,570-Speed 9042.70 samples/sec Loss 5.6808 LearningRate 0.0232 Epoch: 10 Global Step: 172970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:17,649-Speed 9501.98 samples/sec Loss 5.5070 LearningRate 0.0232 Epoch: 10 Global Step: 172980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:18,730-Speed 9478.37 samples/sec Loss 5.5502 LearningRate 0.0232 Epoch: 10 Global Step: 172990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:19,813-Speed 9463.88 samples/sec Loss 5.5930 LearningRate 0.0232 Epoch: 10 Global Step: 173000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:20,882-Speed 9582.10 samples/sec Loss 5.6079 LearningRate 0.0232 Epoch: 10 Global Step: 173010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:21,983-Speed 9304.59 samples/sec Loss 5.4618 LearningRate 0.0232 Epoch: 10 Global Step: 173020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:23,079-Speed 9354.50 samples/sec Loss 5.5735 LearningRate 0.0232 Epoch: 10 Global Step: 173030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:24,152-Speed 9545.09 samples/sec Loss 5.5949 LearningRate 0.0232 Epoch: 10 Global Step: 173040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:25,248-Speed 9352.39 samples/sec Loss 5.4886 LearningRate 0.0232 Epoch: 10 Global Step: 173050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:26,364-Speed 9179.44 samples/sec Loss 5.5332 LearningRate 0.0232 Epoch: 10 Global Step: 173060 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:31:27,477-Speed 9205.33 samples/sec Loss 5.5173 LearningRate 0.0232 Epoch: 10 Global Step: 173070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:31:28,548-Speed 9560.71 samples/sec Loss 5.5007 LearningRate 0.0232 Epoch: 10 Global Step: 173080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:29,612-Speed 9636.96 samples/sec Loss 5.7483 LearningRate 0.0232 Epoch: 10 Global Step: 173090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:30,696-Speed 9445.62 samples/sec Loss 5.6028 LearningRate 0.0232 Epoch: 10 Global Step: 173100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:31,788-Speed 9387.28 samples/sec Loss 5.6370 LearningRate 0.0232 Epoch: 10 Global Step: 173110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:32,888-Speed 9315.71 samples/sec Loss 5.7136 LearningRate 0.0232 Epoch: 10 Global Step: 173120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:34,004-Speed 9177.89 samples/sec Loss 5.5803 LearningRate 0.0232 Epoch: 10 Global Step: 173130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:35,072-Speed 9599.67 samples/sec Loss 5.6178 LearningRate 0.0232 Epoch: 10 Global Step: 173140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:36,150-Speed 9499.23 samples/sec Loss 5.5502 LearningRate 0.0232 Epoch: 10 Global Step: 173150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:37,245-Speed 9362.96 samples/sec Loss 5.6483 LearningRate 0.0232 Epoch: 10 Global Step: 173160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:38,352-Speed 9256.74 samples/sec Loss 5.5507 LearningRate 0.0232 Epoch: 10 Global Step: 173170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:39,422-Speed 9573.57 samples/sec Loss 5.5569 LearningRate 0.0232 Epoch: 10 Global Step: 173180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:31:40,536-Speed 9200.59 samples/sec Loss 5.6275 LearningRate 0.0232 Epoch: 10 Global Step: 173190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:41,674-Speed 9002.09 samples/sec Loss 5.5848 LearningRate 0.0232 Epoch: 10 Global Step: 173200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:42,754-Speed 9491.64 samples/sec Loss 5.6528 LearningRate 0.0231 Epoch: 10 Global Step: 173210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:43,868-Speed 9197.53 samples/sec Loss 5.5468 LearningRate 0.0231 Epoch: 10 Global Step: 173220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:44,916-Speed 9775.95 samples/sec Loss 5.6013 LearningRate 0.0231 Epoch: 10 Global Step: 173230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:46,003-Speed 9426.09 samples/sec Loss 5.6013 LearningRate 0.0231 Epoch: 10 Global Step: 173240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:47,089-Speed 9431.99 samples/sec Loss 5.6857 LearningRate 0.0231 Epoch: 10 Global Step: 173250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:48,152-Speed 9639.24 samples/sec Loss 5.5188 LearningRate 0.0231 Epoch: 10 Global Step: 173260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:49,212-Speed 9667.06 samples/sec Loss 5.5418 LearningRate 0.0231 Epoch: 10 Global Step: 173270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:50,283-Speed 9571.17 samples/sec Loss 5.6274 LearningRate 0.0231 Epoch: 10 Global Step: 173280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:51,347-Speed 9625.48 samples/sec Loss 5.5808 LearningRate 0.0231 Epoch: 10 Global Step: 173290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:52,469-Speed 9137.87 samples/sec Loss 5.5421 LearningRate 0.0231 Epoch: 10 Global Step: 173300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:53,567-Speed 9324.52 samples/sec 
Loss 5.6001 LearningRate 0.0231 Epoch: 10 Global Step: 173310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:54,646-Speed 9495.26 samples/sec Loss 5.6347 LearningRate 0.0231 Epoch: 10 Global Step: 173320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:31:55,698-Speed 9738.89 samples/sec Loss 5.6382 LearningRate 0.0231 Epoch: 10 Global Step: 173330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:56,773-Speed 9535.92 samples/sec Loss 5.5516 LearningRate 0.0231 Epoch: 10 Global Step: 173340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:57,893-Speed 9146.61 samples/sec Loss 5.5835 LearningRate 0.0231 Epoch: 10 Global Step: 173350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:31:59,008-Speed 9183.65 samples/sec Loss 5.6896 LearningRate 0.0231 Epoch: 10 Global Step: 173360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:00,169-Speed 8829.63 samples/sec Loss 5.6523 LearningRate 0.0231 Epoch: 10 Global Step: 173370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:01,298-Speed 9080.12 samples/sec Loss 5.5013 LearningRate 0.0231 Epoch: 10 Global Step: 173380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:02,396-Speed 9337.05 samples/sec Loss 5.6405 LearningRate 0.0231 Epoch: 10 Global Step: 173390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:03,480-Speed 9451.29 samples/sec Loss 5.6574 LearningRate 0.0231 Epoch: 10 Global Step: 173400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:04,571-Speed 9387.24 samples/sec Loss 5.5748 LearningRate 0.0231 Epoch: 10 Global Step: 173410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:05,666-Speed 9362.87 samples/sec Loss 5.5732 LearningRate 0.0231 Epoch: 10 Global Step: 173420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:06,767-Speed 9300.52 samples/sec Loss 5.5888 LearningRate 0.0231 
Epoch: 10 Global Step: 173430 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:32:07,843-Speed 9522.53 samples/sec Loss 5.6739 LearningRate 0.0231 Epoch: 10 Global Step: 173440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:08,919-Speed 9522.31 samples/sec Loss 5.6224 LearningRate 0.0231 Epoch: 10 Global Step: 173450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:09,990-Speed 9566.43 samples/sec Loss 5.5204 LearningRate 0.0231 Epoch: 10 Global Step: 173460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:11,069-Speed 9502.46 samples/sec Loss 5.6693 LearningRate 0.0231 Epoch: 10 Global Step: 173470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:12,171-Speed 9291.84 samples/sec Loss 5.6338 LearningRate 0.0231 Epoch: 10 Global Step: 173480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:13,274-Speed 9292.22 samples/sec Loss 5.6032 LearningRate 0.0231 Epoch: 10 Global Step: 173490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:14,325-Speed 9749.81 samples/sec Loss 5.5986 LearningRate 0.0231 Epoch: 10 Global Step: 173500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:15,436-Speed 9222.32 samples/sec Loss 5.4989 LearningRate 0.0231 Epoch: 10 Global Step: 173510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:16,503-Speed 9603.88 samples/sec Loss 5.6024 LearningRate 0.0231 Epoch: 10 Global Step: 173520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:17,577-Speed 9534.57 samples/sec Loss 5.6039 LearningRate 0.0231 Epoch: 10 Global Step: 173530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:18,757-Speed 8683.27 samples/sec Loss 5.6394 LearningRate 0.0231 Epoch: 10 Global Step: 173540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:19,811-Speed 9726.22 samples/sec Loss 5.5180 LearningRate 0.0231 Epoch: 10 Global Step: 173550 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:20,917-Speed 9264.72 samples/sec Loss 5.5800 LearningRate 0.0230 Epoch: 10 Global Step: 173560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:22,018-Speed 9309.15 samples/sec Loss 5.5840 LearningRate 0.0230 Epoch: 10 Global Step: 173570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:23,117-Speed 9321.90 samples/sec Loss 5.6129 LearningRate 0.0230 Epoch: 10 Global Step: 173580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:24,230-Speed 9201.27 samples/sec Loss 5.7171 LearningRate 0.0230 Epoch: 10 Global Step: 173590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:25,309-Speed 9502.53 samples/sec Loss 5.5539 LearningRate 0.0230 Epoch: 10 Global Step: 173600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:26,374-Speed 9617.91 samples/sec Loss 5.5415 LearningRate 0.0230 Epoch: 10 Global Step: 173610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:27,463-Speed 9406.38 samples/sec Loss 5.6385 LearningRate 0.0230 Epoch: 10 Global Step: 173620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:28,514-Speed 9749.40 samples/sec Loss 5.6611 LearningRate 0.0230 Epoch: 10 Global Step: 173630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:29,552-Speed 9868.60 samples/sec Loss 5.6599 LearningRate 0.0230 Epoch: 10 Global Step: 173640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:32:30,639-Speed 9424.11 samples/sec Loss 5.5635 LearningRate 0.0230 Epoch: 10 Global Step: 173650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:31,760-Speed 9148.20 samples/sec Loss 5.5240 LearningRate 0.0230 Epoch: 10 Global Step: 173660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:32,820-Speed 9660.29 samples/sec Loss 5.4595 LearningRate 0.0230 Epoch: 10 Global Step: 173670 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 18:32:33,895-Speed 9530.47 samples/sec Loss 5.6009 LearningRate 0.0230 Epoch: 10 Global Step: 173680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:34,993-Speed 9334.75 samples/sec Loss 5.6463 LearningRate 0.0230 Epoch: 10 Global Step: 173690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:36,085-Speed 9385.18 samples/sec Loss 5.6475 LearningRate 0.0230 Epoch: 10 Global Step: 173700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:37,164-Speed 9492.56 samples/sec Loss 5.6653 LearningRate 0.0230 Epoch: 10 Global Step: 173710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:38,213-Speed 9773.30 samples/sec Loss 5.6323 LearningRate 0.0230 Epoch: 10 Global Step: 173720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:39,272-Speed 9675.06 samples/sec Loss 5.6031 LearningRate 0.0230 Epoch: 10 Global Step: 173730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:40,331-Speed 9680.53 samples/sec Loss 5.7045 LearningRate 0.0230 Epoch: 10 Global Step: 173740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:41,434-Speed 9288.39 samples/sec Loss 5.5311 LearningRate 0.0230 Epoch: 10 Global Step: 173750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:32:42,533-Speed 9318.78 samples/sec Loss 5.6388 LearningRate 0.0230 Epoch: 10 Global Step: 173760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:43,593-Speed 9663.61 samples/sec Loss 5.6243 LearningRate 0.0230 Epoch: 10 Global Step: 173770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:44,698-Speed 9278.85 samples/sec Loss 5.6181 LearningRate 0.0230 Epoch: 10 Global Step: 173780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:45,729-Speed 9930.59 samples/sec Loss 5.6748 LearningRate 0.0230 Epoch: 10 Global Step: 173790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 18:32:46,836-Speed 9254.23 samples/sec Loss 5.6485 LearningRate 0.0230 Epoch: 10 Global Step: 173800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:47,915-Speed 9498.10 samples/sec Loss 5.7208 LearningRate 0.0230 Epoch: 10 Global Step: 173810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:48,989-Speed 9543.13 samples/sec Loss 5.6074 LearningRate 0.0230 Epoch: 10 Global Step: 173820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:50,044-Speed 9711.54 samples/sec Loss 5.6008 LearningRate 0.0230 Epoch: 10 Global Step: 173830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:51,120-Speed 9525.11 samples/sec Loss 5.6252 LearningRate 0.0230 Epoch: 10 Global Step: 173840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:52,172-Speed 9736.83 samples/sec Loss 5.5671 LearningRate 0.0230 Epoch: 10 Global Step: 173850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:53,260-Speed 9413.68 samples/sec Loss 5.6425 LearningRate 0.0230 Epoch: 10 Global Step: 173860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:32:54,357-Speed 9346.43 samples/sec Loss 5.6208 LearningRate 0.0230 Epoch: 10 Global Step: 173870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:55,426-Speed 9581.75 samples/sec Loss 5.6055 LearningRate 0.0230 Epoch: 10 Global Step: 173880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:56,517-Speed 9392.39 samples/sec Loss 5.5638 LearningRate 0.0230 Epoch: 10 Global Step: 173890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:57,618-Speed 9306.95 samples/sec Loss 5.5687 LearningRate 0.0229 Epoch: 10 Global Step: 173900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:58,720-Speed 9298.85 samples/sec Loss 5.6595 LearningRate 0.0229 Epoch: 10 Global Step: 173910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:32:59,780-Speed 9664.44 
samples/sec Loss 5.6358 LearningRate 0.0229 Epoch: 10 Global Step: 173920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:00,857-Speed 9519.14 samples/sec Loss 5.5878 LearningRate 0.0229 Epoch: 10 Global Step: 173930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:01,922-Speed 9618.18 samples/sec Loss 5.6249 LearningRate 0.0229 Epoch: 10 Global Step: 173940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:02,998-Speed 9523.13 samples/sec Loss 5.6627 LearningRate 0.0229 Epoch: 10 Global Step: 173950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:04,043-Speed 9810.60 samples/sec Loss 5.7093 LearningRate 0.0229 Epoch: 10 Global Step: 173960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:05,114-Speed 9561.44 samples/sec Loss 5.5373 LearningRate 0.0229 Epoch: 10 Global Step: 173970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:33:06,192-Speed 9507.31 samples/sec Loss 5.6511 LearningRate 0.0229 Epoch: 10 Global Step: 173980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:07,287-Speed 9359.57 samples/sec Loss 5.5062 LearningRate 0.0229 Epoch: 10 Global Step: 173990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:08,389-Speed 9295.13 samples/sec Loss 5.5919 LearningRate 0.0229 Epoch: 10 Global Step: 174000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:33:30,363-[lfw][174000]XNorm: 9.322621 Training: 2022-04-11 18:33:30,364-[lfw][174000]Accuracy-Flip: 0.99683+-0.00283 Training: 2022-04-11 18:33:30,365-[lfw][174000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:33:55,602-[cfp_fp][174000]XNorm: 7.952245 Training: 2022-04-11 18:33:55,603-[cfp_fp][174000]Accuracy-Flip: 0.96443+-0.00970 Training: 2022-04-11 18:33:55,604-[cfp_fp][174000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:34:17,465-[agedb_30][174000]XNorm: 9.038745 Training: 2022-04-11 
18:34:17,466-[agedb_30][174000]Accuracy-Flip: 0.96750+-0.01047 Training: 2022-04-11 18:34:17,466-[agedb_30][174000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:34:18,565-Speed 145.92 samples/sec Loss 5.5199 LearningRate 0.0229 Epoch: 10 Global Step: 174010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:19,619-Speed 9720.20 samples/sec Loss 5.6029 LearningRate 0.0229 Epoch: 10 Global Step: 174020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:20,668-Speed 9766.90 samples/sec Loss 5.6372 LearningRate 0.0229 Epoch: 10 Global Step: 174030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:21,728-Speed 9668.50 samples/sec Loss 5.7038 LearningRate 0.0229 Epoch: 10 Global Step: 174040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:22,804-Speed 9519.81 samples/sec Loss 5.6057 LearningRate 0.0229 Epoch: 10 Global Step: 174050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:23,915-Speed 9220.95 samples/sec Loss 5.5419 LearningRate 0.0229 Epoch: 10 Global Step: 174060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:25,011-Speed 9354.56 samples/sec Loss 5.5655 LearningRate 0.0229 Epoch: 10 Global Step: 174070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:26,095-Speed 9455.18 samples/sec Loss 5.5056 LearningRate 0.0229 Epoch: 10 Global Step: 174080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:27,192-Speed 9332.22 samples/sec Loss 5.5824 LearningRate 0.0229 Epoch: 10 Global Step: 174090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:28,282-Speed 9402.10 samples/sec Loss 5.6776 LearningRate 0.0229 Epoch: 10 Global Step: 174100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:29,383-Speed 9308.36 samples/sec Loss 5.6224 LearningRate 0.0229 Epoch: 10 Global Step: 174110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:30,437-Speed 9720.91 
samples/sec Loss 5.7111 LearningRate 0.0229 Epoch: 10 Global Step: 174120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:31,535-Speed 9329.29 samples/sec Loss 5.5851 LearningRate 0.0229 Epoch: 10 Global Step: 174130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:32,641-Speed 9264.90 samples/sec Loss 5.6479 LearningRate 0.0229 Epoch: 10 Global Step: 174140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:33,761-Speed 9149.32 samples/sec Loss 5.6929 LearningRate 0.0229 Epoch: 10 Global Step: 174150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:34,842-Speed 9474.52 samples/sec Loss 5.6792 LearningRate 0.0229 Epoch: 10 Global Step: 174160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:35,938-Speed 9346.63 samples/sec Loss 5.6205 LearningRate 0.0229 Epoch: 10 Global Step: 174170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:36,998-Speed 9673.99 samples/sec Loss 5.6458 LearningRate 0.0229 Epoch: 10 Global Step: 174180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:34:38,088-Speed 9404.29 samples/sec Loss 5.5614 LearningRate 0.0229 Epoch: 10 Global Step: 174190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:34:39,147-Speed 9673.41 samples/sec Loss 5.6048 LearningRate 0.0229 Epoch: 10 Global Step: 174200 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:34:40,204-Speed 9694.14 samples/sec Loss 5.6430 LearningRate 0.0229 Epoch: 10 Global Step: 174210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:41,281-Speed 9514.28 samples/sec Loss 5.5842 LearningRate 0.0229 Epoch: 10 Global Step: 174220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:42,395-Speed 9196.53 samples/sec Loss 5.6079 LearningRate 0.0229 Epoch: 10 Global Step: 174230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:43,467-Speed 9557.10 samples/sec Loss 5.6532 
LearningRate 0.0229 Epoch: 10 Global Step: 174240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:44,530-Speed 9642.02 samples/sec Loss 5.7281 LearningRate 0.0228 Epoch: 10 Global Step: 174250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:45,588-Speed 9686.22 samples/sec Loss 5.6002 LearningRate 0.0228 Epoch: 10 Global Step: 174260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:46,670-Speed 9465.95 samples/sec Loss 5.6559 LearningRate 0.0228 Epoch: 10 Global Step: 174270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:47,771-Speed 9304.61 samples/sec Loss 5.5384 LearningRate 0.0228 Epoch: 10 Global Step: 174280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:48,854-Speed 9463.91 samples/sec Loss 5.6080 LearningRate 0.0228 Epoch: 10 Global Step: 174290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:34:49,920-Speed 9608.79 samples/sec Loss 5.6096 LearningRate 0.0228 Epoch: 10 Global Step: 174300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:51,000-Speed 9492.63 samples/sec Loss 5.6364 LearningRate 0.0228 Epoch: 10 Global Step: 174310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:52,133-Speed 9038.84 samples/sec Loss 5.5721 LearningRate 0.0228 Epoch: 10 Global Step: 174320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:53,221-Speed 9416.26 samples/sec Loss 5.6569 LearningRate 0.0228 Epoch: 10 Global Step: 174330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:54,328-Speed 9261.09 samples/sec Loss 5.5632 LearningRate 0.0228 Epoch: 10 Global Step: 174340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:55,454-Speed 9094.02 samples/sec Loss 5.6527 LearningRate 0.0228 Epoch: 10 Global Step: 174350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:56,526-Speed 9564.05 samples/sec Loss 5.6006 LearningRate 0.0228 Epoch: 10 Global 
Step: 174360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:57,636-Speed 9231.81 samples/sec Loss 5.5520 LearningRate 0.0228 Epoch: 10 Global Step: 174370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:58,734-Speed 9324.72 samples/sec Loss 5.5320 LearningRate 0.0228 Epoch: 10 Global Step: 174380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:34:59,856-Speed 9137.23 samples/sec Loss 5.5280 LearningRate 0.0228 Epoch: 10 Global Step: 174390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:00,952-Speed 9346.45 samples/sec Loss 5.5696 LearningRate 0.0228 Epoch: 10 Global Step: 174400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:02,072-Speed 9146.97 samples/sec Loss 5.6434 LearningRate 0.0228 Epoch: 10 Global Step: 174410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:03,138-Speed 9614.65 samples/sec Loss 5.6533 LearningRate 0.0228 Epoch: 10 Global Step: 174420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:04,246-Speed 9246.67 samples/sec Loss 5.6886 LearningRate 0.0228 Epoch: 10 Global Step: 174430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:05,359-Speed 9208.63 samples/sec Loss 5.6965 LearningRate 0.0228 Epoch: 10 Global Step: 174440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:06,484-Speed 9105.09 samples/sec Loss 5.6072 LearningRate 0.0228 Epoch: 10 Global Step: 174450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:07,564-Speed 9486.32 samples/sec Loss 5.5338 LearningRate 0.0228 Epoch: 10 Global Step: 174460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:08,604-Speed 9858.04 samples/sec Loss 5.6277 LearningRate 0.0228 Epoch: 10 Global Step: 174470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:09,670-Speed 9615.80 samples/sec Loss 5.6957 LearningRate 0.0228 Epoch: 10 Global Step: 174480 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 18:35:10,787-Speed 9171.87 samples/sec Loss 5.6925 LearningRate 0.0228 Epoch: 10 Global Step: 174490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:11,902-Speed 9190.88 samples/sec Loss 5.6239 LearningRate 0.0228 Epoch: 10 Global Step: 174500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:35:12,955-Speed 9726.70 samples/sec Loss 5.6452 LearningRate 0.0228 Epoch: 10 Global Step: 174510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:14,068-Speed 9210.22 samples/sec Loss 5.6603 LearningRate 0.0228 Epoch: 10 Global Step: 174520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:15,179-Speed 9228.34 samples/sec Loss 5.6667 LearningRate 0.0228 Epoch: 10 Global Step: 174530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:16,264-Speed 9442.63 samples/sec Loss 5.6968 LearningRate 0.0228 Epoch: 10 Global Step: 174540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:17,360-Speed 9345.97 samples/sec Loss 5.5446 LearningRate 0.0228 Epoch: 10 Global Step: 174550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:18,454-Speed 9366.63 samples/sec Loss 5.7333 LearningRate 0.0228 Epoch: 10 Global Step: 174560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:19,533-Speed 9489.09 samples/sec Loss 5.6429 LearningRate 0.0228 Epoch: 10 Global Step: 174570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:20,630-Speed 9339.89 samples/sec Loss 5.5744 LearningRate 0.0228 Epoch: 10 Global Step: 174580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:21,690-Speed 9666.93 samples/sec Loss 5.5233 LearningRate 0.0228 Epoch: 10 Global Step: 174590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:22,776-Speed 9440.20 samples/sec Loss 5.6351 LearningRate 0.0227 Epoch: 10 Global Step: 174600 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 18:35:23,877-Speed 9306.15 samples/sec Loss 5.5867 LearningRate 0.0227 Epoch: 10 Global Step: 174610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:24,965-Speed 9416.71 samples/sec Loss 5.6746 LearningRate 0.0227 Epoch: 10 Global Step: 174620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:26,034-Speed 9585.85 samples/sec Loss 5.5461 LearningRate 0.0227 Epoch: 10 Global Step: 174630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:27,141-Speed 9257.15 samples/sec Loss 5.6742 LearningRate 0.0227 Epoch: 10 Global Step: 174640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:28,223-Speed 9474.28 samples/sec Loss 5.5964 LearningRate 0.0227 Epoch: 10 Global Step: 174650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:29,334-Speed 9218.92 samples/sec Loss 5.6052 LearningRate 0.0227 Epoch: 10 Global Step: 174660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:30,434-Speed 9316.60 samples/sec Loss 5.6589 LearningRate 0.0227 Epoch: 10 Global Step: 174670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:31,529-Speed 9357.14 samples/sec Loss 5.5429 LearningRate 0.0227 Epoch: 10 Global Step: 174680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:32,647-Speed 9162.35 samples/sec Loss 5.5763 LearningRate 0.0227 Epoch: 10 Global Step: 174690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:33,798-Speed 8896.35 samples/sec Loss 5.5335 LearningRate 0.0227 Epoch: 10 Global Step: 174700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:34,882-Speed 9457.78 samples/sec Loss 5.5552 LearningRate 0.0227 Epoch: 10 Global Step: 174710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:35,949-Speed 9596.51 samples/sec Loss 5.6328 LearningRate 0.0227 Epoch: 10 Global Step: 174720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:37,008-Speed 
9682.82 samples/sec Loss 5.6462 LearningRate 0.0227 Epoch: 10 Global Step: 174730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:38,098-Speed 9398.12 samples/sec Loss 5.7009 LearningRate 0.0227 Epoch: 10 Global Step: 174740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:39,213-Speed 9192.78 samples/sec Loss 5.7491 LearningRate 0.0227 Epoch: 10 Global Step: 174750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:40,279-Speed 9613.23 samples/sec Loss 5.5741 LearningRate 0.0227 Epoch: 10 Global Step: 174760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:41,355-Speed 9518.00 samples/sec Loss 5.6262 LearningRate 0.0227 Epoch: 10 Global Step: 174770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:42,468-Speed 9209.86 samples/sec Loss 5.7000 LearningRate 0.0227 Epoch: 10 Global Step: 174780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:43,574-Speed 9284.53 samples/sec Loss 5.5319 LearningRate 0.0227 Epoch: 10 Global Step: 174790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:44,651-Speed 9515.26 samples/sec Loss 5.4583 LearningRate 0.0227 Epoch: 10 Global Step: 174800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:45,773-Speed 9130.67 samples/sec Loss 5.5525 LearningRate 0.0227 Epoch: 10 Global Step: 174810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:46,838-Speed 9618.85 samples/sec Loss 5.5648 LearningRate 0.0227 Epoch: 10 Global Step: 174820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:47,894-Speed 9707.14 samples/sec Loss 5.5498 LearningRate 0.0227 Epoch: 10 Global Step: 174830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:48,971-Speed 9515.71 samples/sec Loss 5.6339 LearningRate 0.0227 Epoch: 10 Global Step: 174840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:50,071-Speed 9313.19 samples/sec Loss 5.5583 
LearningRate 0.0227 Epoch: 10 Global Step: 174850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:51,186-Speed 9184.94 samples/sec Loss 5.7076 LearningRate 0.0227 Epoch: 10 Global Step: 174860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:52,305-Speed 9154.43 samples/sec Loss 5.5726 LearningRate 0.0227 Epoch: 10 Global Step: 174870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:53,405-Speed 9315.10 samples/sec Loss 5.6078 LearningRate 0.0227 Epoch: 10 Global Step: 174880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:35:54,445-Speed 9851.35 samples/sec Loss 5.6041 LearningRate 0.0227 Epoch: 10 Global Step: 174890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:55,526-Speed 9481.66 samples/sec Loss 5.6018 LearningRate 0.0227 Epoch: 10 Global Step: 174900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:56,576-Speed 9759.18 samples/sec Loss 5.5558 LearningRate 0.0227 Epoch: 10 Global Step: 174910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:57,666-Speed 9403.15 samples/sec Loss 5.5093 LearningRate 0.0227 Epoch: 10 Global Step: 174920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:58,770-Speed 9281.29 samples/sec Loss 5.5244 LearningRate 0.0227 Epoch: 10 Global Step: 174930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:35:59,844-Speed 9539.24 samples/sec Loss 5.5964 LearningRate 0.0227 Epoch: 10 Global Step: 174940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:00,913-Speed 9583.85 samples/sec Loss 5.6865 LearningRate 0.0226 Epoch: 10 Global Step: 174950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:02,012-Speed 9319.97 samples/sec Loss 5.5841 LearningRate 0.0226 Epoch: 10 Global Step: 174960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:03,091-Speed 9494.75 samples/sec Loss 5.5565 LearningRate 0.0226 Epoch: 10 Global 
Step: 174970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:04,200-Speed 9245.85 samples/sec Loss 5.5983 LearningRate 0.0226 Epoch: 10 Global Step: 174980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:05,319-Speed 9150.43 samples/sec Loss 5.5939 LearningRate 0.0226 Epoch: 10 Global Step: 174990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:06,375-Speed 9706.00 samples/sec Loss 5.5800 LearningRate 0.0226 Epoch: 10 Global Step: 175000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:07,494-Speed 9161.54 samples/sec Loss 5.5921 LearningRate 0.0226 Epoch: 10 Global Step: 175010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:08,556-Speed 9643.11 samples/sec Loss 5.7622 LearningRate 0.0226 Epoch: 10 Global Step: 175020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:09,629-Speed 9548.24 samples/sec Loss 5.5135 LearningRate 0.0226 Epoch: 10 Global Step: 175030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:10,688-Speed 9680.13 samples/sec Loss 5.5762 LearningRate 0.0226 Epoch: 10 Global Step: 175040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:11,778-Speed 9397.61 samples/sec Loss 5.6196 LearningRate 0.0226 Epoch: 10 Global Step: 175050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:12,867-Speed 9411.02 samples/sec Loss 5.6024 LearningRate 0.0226 Epoch: 10 Global Step: 175060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:13,963-Speed 9354.67 samples/sec Loss 5.5582 LearningRate 0.0226 Epoch: 10 Global Step: 175070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:15,073-Speed 9225.54 samples/sec Loss 5.7766 LearningRate 0.0226 Epoch: 10 Global Step: 175080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:16,197-Speed 9115.46 samples/sec Loss 5.6555 LearningRate 0.0226 Epoch: 10 Global Step: 175090 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 18:36:17,288-Speed 9390.09 samples/sec Loss 5.6176 LearningRate 0.0226 Epoch: 10 Global Step: 175100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:18,359-Speed 9568.12 samples/sec Loss 5.6587 LearningRate 0.0226 Epoch: 10 Global Step: 175110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:19,436-Speed 9517.01 samples/sec Loss 5.6315 LearningRate 0.0226 Epoch: 10 Global Step: 175120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:20,579-Speed 8962.81 samples/sec Loss 5.5144 LearningRate 0.0226 Epoch: 10 Global Step: 175130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:21,658-Speed 9490.84 samples/sec Loss 5.6435 LearningRate 0.0226 Epoch: 10 Global Step: 175140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:22,727-Speed 9592.16 samples/sec Loss 5.6486 LearningRate 0.0226 Epoch: 10 Global Step: 175150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:23,836-Speed 9237.37 samples/sec Loss 5.6563 LearningRate 0.0226 Epoch: 10 Global Step: 175160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:24,924-Speed 9418.91 samples/sec Loss 5.5878 LearningRate 0.0226 Epoch: 10 Global Step: 175170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:26,039-Speed 9189.37 samples/sec Loss 5.6828 LearningRate 0.0226 Epoch: 10 Global Step: 175180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:27,151-Speed 9209.89 samples/sec Loss 5.6952 LearningRate 0.0226 Epoch: 10 Global Step: 175190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:28,225-Speed 9544.15 samples/sec Loss 5.5344 LearningRate 0.0226 Epoch: 10 Global Step: 175200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:29,294-Speed 9579.48 samples/sec Loss 5.6268 LearningRate 0.0226 Epoch: 10 Global Step: 175210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 18:36:30,380-Speed 9436.24 samples/sec Loss 5.6528 LearningRate 0.0226 Epoch: 10 Global Step: 175220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:31,485-Speed 9275.40 samples/sec Loss 5.5608 LearningRate 0.0226 Epoch: 10 Global Step: 175230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:32,559-Speed 9539.20 samples/sec Loss 5.6392 LearningRate 0.0226 Epoch: 10 Global Step: 175240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:33,646-Speed 9425.48 samples/sec Loss 5.5277 LearningRate 0.0226 Epoch: 10 Global Step: 175250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:34,732-Speed 9437.97 samples/sec Loss 5.6857 LearningRate 0.0226 Epoch: 10 Global Step: 175260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:35,841-Speed 9238.25 samples/sec Loss 5.6201 LearningRate 0.0226 Epoch: 10 Global Step: 175270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:36,925-Speed 9449.68 samples/sec Loss 5.5464 LearningRate 0.0226 Epoch: 10 Global Step: 175280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:37,991-Speed 9611.93 samples/sec Loss 5.6056 LearningRate 0.0226 Epoch: 10 Global Step: 175290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:39,049-Speed 9683.58 samples/sec Loss 5.7413 LearningRate 0.0225 Epoch: 10 Global Step: 175300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:40,118-Speed 9584.06 samples/sec Loss 5.5819 LearningRate 0.0225 Epoch: 10 Global Step: 175310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:41,189-Speed 9568.90 samples/sec Loss 5.5567 LearningRate 0.0225 Epoch: 10 Global Step: 175320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:42,268-Speed 9499.41 samples/sec Loss 5.5510 LearningRate 0.0225 Epoch: 10 Global Step: 175330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:43,327-Speed 
9683.23 samples/sec Loss 5.6016 LearningRate 0.0225 Epoch: 10 Global Step: 175340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:44,401-Speed 9531.87 samples/sec Loss 5.6367 LearningRate 0.0225 Epoch: 10 Global Step: 175350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:36:45,453-Speed 9743.57 samples/sec Loss 5.6344 LearningRate 0.0225 Epoch: 10 Global Step: 175360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:46,544-Speed 9393.76 samples/sec Loss 5.5321 LearningRate 0.0225 Epoch: 10 Global Step: 175370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:47,594-Speed 9754.31 samples/sec Loss 5.5255 LearningRate 0.0225 Epoch: 10 Global Step: 175380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:48,696-Speed 9297.58 samples/sec Loss 5.5818 LearningRate 0.0225 Epoch: 10 Global Step: 175390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:49,810-Speed 9198.91 samples/sec Loss 5.6840 LearningRate 0.0225 Epoch: 10 Global Step: 175400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:50,936-Speed 9100.55 samples/sec Loss 5.6639 LearningRate 0.0225 Epoch: 10 Global Step: 175410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:52,003-Speed 9598.34 samples/sec Loss 5.6409 LearningRate 0.0225 Epoch: 10 Global Step: 175420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:53,101-Speed 9328.75 samples/sec Loss 5.7135 LearningRate 0.0225 Epoch: 10 Global Step: 175430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:54,200-Speed 9322.31 samples/sec Loss 5.5544 LearningRate 0.0225 Epoch: 10 Global Step: 175440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:55,254-Speed 9721.38 samples/sec Loss 5.6983 LearningRate 0.0225 Epoch: 10 Global Step: 175450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:36:56,360-Speed 9268.37 samples/sec Loss 5.5833 
LearningRate 0.0225 Epoch: 10 Global Step: 175460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:36:57,463-Speed 9291.01 samples/sec Loss 5.6317 LearningRate 0.0225 Epoch: 10 Global Step: 175470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:58,577-Speed 9200.03 samples/sec Loss 5.7184 LearningRate 0.0225 Epoch: 10 Global Step: 175480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:36:59,661-Speed 9452.44 samples/sec Loss 5.5428 LearningRate 0.0225 Epoch: 10 Global Step: 175490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:00,726-Speed 9615.97 samples/sec Loss 5.8387 LearningRate 0.0225 Epoch: 10 Global Step: 175500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:01,826-Speed 9315.07 samples/sec Loss 5.6147 LearningRate 0.0225 Epoch: 10 Global Step: 175510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:02,916-Speed 9405.38 samples/sec Loss 5.6912 LearningRate 0.0225 Epoch: 10 Global Step: 175520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:04,002-Speed 9435.18 samples/sec Loss 5.5784 LearningRate 0.0225 Epoch: 10 Global Step: 175530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:05,086-Speed 9447.24 samples/sec Loss 5.6207 LearningRate 0.0225 Epoch: 10 Global Step: 175540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:06,142-Speed 9698.49 samples/sec Loss 5.4579 LearningRate 0.0225 Epoch: 10 Global Step: 175550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:07,235-Speed 9380.69 samples/sec Loss 5.6327 LearningRate 0.0225 Epoch: 10 Global Step: 175560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:37:08,392-Speed 8855.31 samples/sec Loss 5.6691 LearningRate 0.0225 Epoch: 10 Global Step: 175570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:09,458-Speed 9614.90 samples/sec Loss 5.7077 LearningRate 0.0225 Epoch: 10 Global 
Step: 175580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:10,513-Speed 9704.31 samples/sec Loss 5.6884 LearningRate 0.0225 Epoch: 10 Global Step: 175590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:11,604-Speed 9398.29 samples/sec Loss 5.6767 LearningRate 0.0225 Epoch: 10 Global Step: 175600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:12,731-Speed 9087.00 samples/sec Loss 5.6179 LearningRate 0.0225 Epoch: 10 Global Step: 175610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:13,808-Speed 9513.78 samples/sec Loss 5.5571 LearningRate 0.0225 Epoch: 10 Global Step: 175620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:14,861-Speed 9734.53 samples/sec Loss 5.7246 LearningRate 0.0225 Epoch: 10 Global Step: 175630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:15,961-Speed 9312.81 samples/sec Loss 5.6290 LearningRate 0.0225 Epoch: 10 Global Step: 175640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:17,069-Speed 9244.63 samples/sec Loss 5.7200 LearningRate 0.0225 Epoch: 10 Global Step: 175650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:18,186-Speed 9172.58 samples/sec Loss 5.5802 LearningRate 0.0224 Epoch: 10 Global Step: 175660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:19,297-Speed 9225.19 samples/sec Loss 5.6374 LearningRate 0.0224 Epoch: 10 Global Step: 175670 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:37:20,373-Speed 9524.50 samples/sec Loss 5.6585 LearningRate 0.0224 Epoch: 10 Global Step: 175680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:21,459-Speed 9435.95 samples/sec Loss 5.6588 LearningRate 0.0224 Epoch: 10 Global Step: 175690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:22,507-Speed 9774.22 samples/sec Loss 5.5977 LearningRate 0.0224 Epoch: 10 Global Step: 175700 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 18:37:23,621-Speed 9202.08 samples/sec Loss 5.5534 LearningRate 0.0224 Epoch: 10 Global Step: 175710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:24,691-Speed 9576.28 samples/sec Loss 5.7039 LearningRate 0.0224 Epoch: 10 Global Step: 175720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:25,777-Speed 9431.91 samples/sec Loss 5.6603 LearningRate 0.0224 Epoch: 10 Global Step: 175730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:26,832-Speed 9715.14 samples/sec Loss 5.5905 LearningRate 0.0224 Epoch: 10 Global Step: 175740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:27,889-Speed 9692.70 samples/sec Loss 5.6108 LearningRate 0.0224 Epoch: 10 Global Step: 175750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:28,954-Speed 9622.06 samples/sec Loss 5.5740 LearningRate 0.0224 Epoch: 10 Global Step: 175760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:30,085-Speed 9059.17 samples/sec Loss 5.5795 LearningRate 0.0224 Epoch: 10 Global Step: 175770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:31,198-Speed 9199.83 samples/sec Loss 5.8018 LearningRate 0.0224 Epoch: 10 Global Step: 175780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:37:32,273-Speed 9533.39 samples/sec Loss 5.6908 LearningRate 0.0224 Epoch: 10 Global Step: 175790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:33,406-Speed 9040.88 samples/sec Loss 5.6513 LearningRate 0.0224 Epoch: 10 Global Step: 175800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:34,514-Speed 9248.06 samples/sec Loss 5.5340 LearningRate 0.0224 Epoch: 10 Global Step: 175810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:35,602-Speed 9419.55 samples/sec Loss 5.6526 LearningRate 0.0224 Epoch: 10 Global Step: 175820 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:37:36,675-Speed 9551.67 samples/sec Loss 5.6359 LearningRate 0.0224 Epoch: 10 Global Step: 175830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:37,747-Speed 9556.63 samples/sec Loss 5.5887 LearningRate 0.0224 Epoch: 10 Global Step: 175840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:38,839-Speed 9391.36 samples/sec Loss 5.6507 LearningRate 0.0224 Epoch: 10 Global Step: 175850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:39,923-Speed 9455.77 samples/sec Loss 5.6767 LearningRate 0.0224 Epoch: 10 Global Step: 175860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:41,008-Speed 9440.35 samples/sec Loss 5.6785 LearningRate 0.0224 Epoch: 10 Global Step: 175870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:42,116-Speed 9246.56 samples/sec Loss 5.6615 LearningRate 0.0224 Epoch: 10 Global Step: 175880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:43,196-Speed 9489.25 samples/sec Loss 5.6381 LearningRate 0.0224 Epoch: 10 Global Step: 175890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:44,313-Speed 9172.32 samples/sec Loss 5.6604 LearningRate 0.0224 Epoch: 10 Global Step: 175900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:45,379-Speed 9612.30 samples/sec Loss 5.5885 LearningRate 0.0224 Epoch: 10 Global Step: 175910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:46,442-Speed 9643.79 samples/sec Loss 5.5998 LearningRate 0.0224 Epoch: 10 Global Step: 175920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:47,537-Speed 9352.18 samples/sec Loss 5.6579 LearningRate 0.0224 Epoch: 10 Global Step: 175930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:48,634-Speed 9341.91 samples/sec Loss 5.4957 LearningRate 0.0224 Epoch: 10 Global Step: 175940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:37:49,756-Speed 9128.49 samples/sec Loss 5.5974 LearningRate 0.0224 Epoch: 10 Global Step: 175950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:50,847-Speed 9389.52 samples/sec Loss 5.6044 LearningRate 0.0224 Epoch: 10 Global Step: 175960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:51,936-Speed 9409.04 samples/sec Loss 5.5227 LearningRate 0.0224 Epoch: 10 Global Step: 175970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:53,075-Speed 9000.09 samples/sec Loss 5.5337 LearningRate 0.0224 Epoch: 10 Global Step: 175980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:37:54,171-Speed 9346.19 samples/sec Loss 5.6387 LearningRate 0.0224 Epoch: 10 Global Step: 175990 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:37:55,248-Speed 9514.58 samples/sec Loss 5.7409 LearningRate 0.0224 Epoch: 10 Global Step: 176000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:38:17,021-[lfw][176000]XNorm: 9.283242 Training: 2022-04-11 18:38:17,022-[lfw][176000]Accuracy-Flip: 0.99550+-0.00279 Training: 2022-04-11 18:38:17,023-[lfw][176000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:38:42,188-[cfp_fp][176000]XNorm: 8.007009 Training: 2022-04-11 18:38:42,189-[cfp_fp][176000]Accuracy-Flip: 0.96100+-0.01002 Training: 2022-04-11 18:38:42,189-[cfp_fp][176000]Accuracy-Highest: 0.96500 Training: 2022-04-11 18:39:03,905-[agedb_30][176000]XNorm: 8.989899 Training: 2022-04-11 18:39:03,906-[agedb_30][176000]Accuracy-Flip: 0.96550+-0.00931 Training: 2022-04-11 18:39:03,906-[agedb_30][176000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:39:04,959-Speed 146.89 samples/sec Loss 5.5873 LearningRate 0.0223 Epoch: 10 Global Step: 176010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:06,004-Speed 9795.05 samples/sec Loss 5.6207 LearningRate 0.0223 Epoch: 10 Global Step: 176020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:07,064-Speed 
9666.30 samples/sec Loss 5.6416 LearningRate 0.0223 Epoch: 10 Global Step: 176030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:08,184-Speed 9151.08 samples/sec Loss 5.5674 LearningRate 0.0223 Epoch: 10 Global Step: 176040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:09,266-Speed 9471.01 samples/sec Loss 5.7244 LearningRate 0.0223 Epoch: 10 Global Step: 176050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:10,354-Speed 9412.49 samples/sec Loss 5.5498 LearningRate 0.0223 Epoch: 10 Global Step: 176060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:11,436-Speed 9468.07 samples/sec Loss 5.5895 LearningRate 0.0223 Epoch: 10 Global Step: 176070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:12,504-Speed 9598.30 samples/sec Loss 5.6402 LearningRate 0.0223 Epoch: 10 Global Step: 176080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:13,619-Speed 9185.01 samples/sec Loss 5.6699 LearningRate 0.0223 Epoch: 10 Global Step: 176090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:14,681-Speed 9649.68 samples/sec Loss 5.6574 LearningRate 0.0223 Epoch: 10 Global Step: 176100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:15,765-Speed 9450.17 samples/sec Loss 5.7409 LearningRate 0.0223 Epoch: 10 Global Step: 176110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:16,833-Speed 9599.32 samples/sec Loss 5.5235 LearningRate 0.0223 Epoch: 10 Global Step: 176120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:17,973-Speed 8991.26 samples/sec Loss 5.6368 LearningRate 0.0223 Epoch: 10 Global Step: 176130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:19,058-Speed 9439.56 samples/sec Loss 5.5672 LearningRate 0.0223 Epoch: 10 Global Step: 176140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:20,173-Speed 9190.42 samples/sec Loss 5.6815 
LearningRate 0.0223 Epoch: 10 Global Step: 176150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:21,247-Speed 9536.17 samples/sec Loss 5.6182 LearningRate 0.0223 Epoch: 10 Global Step: 176160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:22,363-Speed 9177.78 samples/sec Loss 5.4764 LearningRate 0.0223 Epoch: 10 Global Step: 176170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:23,451-Speed 9426.30 samples/sec Loss 5.6420 LearningRate 0.0223 Epoch: 10 Global Step: 176180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:24,587-Speed 9015.86 samples/sec Loss 5.7506 LearningRate 0.0223 Epoch: 10 Global Step: 176190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:25,693-Speed 9269.39 samples/sec Loss 5.5643 LearningRate 0.0223 Epoch: 10 Global Step: 176200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:26,789-Speed 9348.24 samples/sec Loss 5.7092 LearningRate 0.0223 Epoch: 10 Global Step: 176210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:27,858-Speed 9589.40 samples/sec Loss 5.6900 LearningRate 0.0223 Epoch: 10 Global Step: 176220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:28,959-Speed 9302.91 samples/sec Loss 5.7405 LearningRate 0.0223 Epoch: 10 Global Step: 176230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:30,053-Speed 9361.98 samples/sec Loss 5.6190 LearningRate 0.0223 Epoch: 10 Global Step: 176240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:31,089-Speed 9890.17 samples/sec Loss 5.6986 LearningRate 0.0223 Epoch: 10 Global Step: 176250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:39:32,188-Speed 9327.10 samples/sec Loss 5.5847 LearningRate 0.0223 Epoch: 10 Global Step: 176260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:33,332-Speed 8950.29 samples/sec Loss 5.5840 LearningRate 0.0223 Epoch: 10 Global 
Step: 176270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:34,412-Speed 9485.03 samples/sec Loss 5.5823 LearningRate 0.0223 Epoch: 10 Global Step: 176280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:35,529-Speed 9180.04 samples/sec Loss 5.6069 LearningRate 0.0223 Epoch: 10 Global Step: 176290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:36,622-Speed 9376.32 samples/sec Loss 5.7007 LearningRate 0.0223 Epoch: 10 Global Step: 176300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:37,726-Speed 9282.19 samples/sec Loss 5.6093 LearningRate 0.0223 Epoch: 10 Global Step: 176310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:38,871-Speed 8948.72 samples/sec Loss 5.6192 LearningRate 0.0223 Epoch: 10 Global Step: 176320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:40,036-Speed 8791.31 samples/sec Loss 5.6347 LearningRate 0.0223 Epoch: 10 Global Step: 176330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:41,136-Speed 9320.53 samples/sec Loss 5.6088 LearningRate 0.0223 Epoch: 10 Global Step: 176340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:42,245-Speed 9236.05 samples/sec Loss 5.5839 LearningRate 0.0223 Epoch: 10 Global Step: 176350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:43,329-Speed 9456.02 samples/sec Loss 5.5815 LearningRate 0.0222 Epoch: 10 Global Step: 176360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:39:44,433-Speed 9277.45 samples/sec Loss 5.6291 LearningRate 0.0222 Epoch: 10 Global Step: 176370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:45,532-Speed 9322.75 samples/sec Loss 5.5633 LearningRate 0.0222 Epoch: 10 Global Step: 176380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:46,639-Speed 9260.30 samples/sec Loss 5.7331 LearningRate 0.0222 Epoch: 10 Global Step: 176390 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 18:39:47,750-Speed 9221.52 samples/sec Loss 5.6109 LearningRate 0.0222 Epoch: 10 Global Step: 176400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:48,840-Speed 9406.29 samples/sec Loss 5.6082 LearningRate 0.0222 Epoch: 10 Global Step: 176410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:49,914-Speed 9532.41 samples/sec Loss 5.6885 LearningRate 0.0222 Epoch: 10 Global Step: 176420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:51,073-Speed 8844.48 samples/sec Loss 5.7015 LearningRate 0.0222 Epoch: 10 Global Step: 176430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:52,186-Speed 9202.82 samples/sec Loss 5.6878 LearningRate 0.0222 Epoch: 10 Global Step: 176440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:53,292-Speed 9268.33 samples/sec Loss 5.6956 LearningRate 0.0222 Epoch: 10 Global Step: 176450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:54,407-Speed 9190.47 samples/sec Loss 5.6092 LearningRate 0.0222 Epoch: 10 Global Step: 176460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:55,522-Speed 9185.47 samples/sec Loss 5.5872 LearningRate 0.0222 Epoch: 10 Global Step: 176470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:56,596-Speed 9541.10 samples/sec Loss 5.6440 LearningRate 0.0222 Epoch: 10 Global Step: 176480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:57,690-Speed 9362.25 samples/sec Loss 5.7211 LearningRate 0.0222 Epoch: 10 Global Step: 176490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:58,762-Speed 9556.37 samples/sec Loss 5.6030 LearningRate 0.0222 Epoch: 10 Global Step: 176500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:39:59,889-Speed 9093.80 samples/sec Loss 5.5993 LearningRate 0.0222 Epoch: 10 Global Step: 176510 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:40:01,018-Speed 9072.40 samples/sec Loss 5.6482 LearningRate 0.0222 Epoch: 10 Global Step: 176520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:02,121-Speed 9296.03 samples/sec Loss 5.5816 LearningRate 0.0222 Epoch: 10 Global Step: 176530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:03,252-Speed 9054.46 samples/sec Loss 5.6626 LearningRate 0.0222 Epoch: 10 Global Step: 176540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:04,319-Speed 9602.92 samples/sec Loss 5.6804 LearningRate 0.0222 Epoch: 10 Global Step: 176550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:05,366-Speed 9796.21 samples/sec Loss 5.6762 LearningRate 0.0222 Epoch: 10 Global Step: 176560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:06,496-Speed 9061.29 samples/sec Loss 5.5991 LearningRate 0.0222 Epoch: 10 Global Step: 176570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:40:07,581-Speed 9447.49 samples/sec Loss 5.6493 LearningRate 0.0222 Epoch: 10 Global Step: 176580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:08,656-Speed 9530.12 samples/sec Loss 5.5905 LearningRate 0.0222 Epoch: 10 Global Step: 176590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:09,727-Speed 9571.12 samples/sec Loss 5.5832 LearningRate 0.0222 Epoch: 10 Global Step: 176600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:10,825-Speed 9325.84 samples/sec Loss 5.6079 LearningRate 0.0222 Epoch: 10 Global Step: 176610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:11,912-Speed 9422.03 samples/sec Loss 5.6760 LearningRate 0.0222 Epoch: 10 Global Step: 176620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:12,969-Speed 9700.05 samples/sec Loss 5.5921 LearningRate 0.0222 Epoch: 10 Global Step: 176630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:40:14,122-Speed 8882.65 samples/sec Loss 5.6668 LearningRate 0.0222 Epoch: 10 Global Step: 176640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:15,217-Speed 9358.96 samples/sec Loss 5.5178 LearningRate 0.0222 Epoch: 10 Global Step: 176650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:16,307-Speed 9401.31 samples/sec Loss 5.6215 LearningRate 0.0222 Epoch: 10 Global Step: 176660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:17,384-Speed 9517.65 samples/sec Loss 5.5758 LearningRate 0.0222 Epoch: 10 Global Step: 176670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:18,515-Speed 9058.83 samples/sec Loss 5.5817 LearningRate 0.0222 Epoch: 10 Global Step: 176680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:19,602-Speed 9424.02 samples/sec Loss 5.5608 LearningRate 0.0222 Epoch: 10 Global Step: 176690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:20,671-Speed 9582.82 samples/sec Loss 5.6230 LearningRate 0.0222 Epoch: 10 Global Step: 176700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:21,754-Speed 9461.12 samples/sec Loss 5.6215 LearningRate 0.0222 Epoch: 10 Global Step: 176710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:22,846-Speed 9381.66 samples/sec Loss 5.6143 LearningRate 0.0221 Epoch: 10 Global Step: 176720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:23,980-Speed 9040.93 samples/sec Loss 5.7216 LearningRate 0.0221 Epoch: 10 Global Step: 176730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:25,044-Speed 9635.13 samples/sec Loss 5.6030 LearningRate 0.0221 Epoch: 10 Global Step: 176740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:26,110-Speed 9609.58 samples/sec Loss 5.6095 LearningRate 0.0221 Epoch: 10 Global Step: 176750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:27,201-Speed 9387.42 
samples/sec Loss 5.6838 LearningRate 0.0221 Epoch: 10 Global Step: 176760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:28,266-Speed 9624.25 samples/sec Loss 5.6826 LearningRate 0.0221 Epoch: 10 Global Step: 176770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:29,399-Speed 9039.86 samples/sec Loss 5.7198 LearningRate 0.0221 Epoch: 10 Global Step: 176780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:30,455-Speed 9709.72 samples/sec Loss 5.7016 LearningRate 0.0221 Epoch: 10 Global Step: 176790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:31,543-Speed 9412.89 samples/sec Loss 5.6474 LearningRate 0.0221 Epoch: 10 Global Step: 176800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:32,601-Speed 9685.01 samples/sec Loss 5.5905 LearningRate 0.0221 Epoch: 10 Global Step: 176810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:33,678-Speed 9513.93 samples/sec Loss 5.6407 LearningRate 0.0221 Epoch: 10 Global Step: 176820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:34,763-Speed 9436.85 samples/sec Loss 5.6841 LearningRate 0.0221 Epoch: 10 Global Step: 176830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:35,849-Speed 9440.98 samples/sec Loss 5.6408 LearningRate 0.0221 Epoch: 10 Global Step: 176840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:36,927-Speed 9506.65 samples/sec Loss 5.6145 LearningRate 0.0221 Epoch: 10 Global Step: 176850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:38,020-Speed 9372.39 samples/sec Loss 5.6073 LearningRate 0.0221 Epoch: 10 Global Step: 176860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:39,093-Speed 9549.71 samples/sec Loss 5.7179 LearningRate 0.0221 Epoch: 10 Global Step: 176870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:40,153-Speed 9659.47 samples/sec Loss 5.6832 LearningRate 
0.0221 Epoch: 10 Global Step: 176880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:41,215-Speed 9647.55 samples/sec Loss 5.5906 LearningRate 0.0221 Epoch: 10 Global Step: 176890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:42,308-Speed 9377.39 samples/sec Loss 5.6645 LearningRate 0.0221 Epoch: 10 Global Step: 176900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:43,406-Speed 9328.13 samples/sec Loss 5.7371 LearningRate 0.0221 Epoch: 10 Global Step: 176910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:40:44,502-Speed 9351.92 samples/sec Loss 5.5605 LearningRate 0.0221 Epoch: 10 Global Step: 176920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:45,607-Speed 9267.52 samples/sec Loss 5.6809 LearningRate 0.0221 Epoch: 10 Global Step: 176930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:46,694-Speed 9429.12 samples/sec Loss 5.5599 LearningRate 0.0221 Epoch: 10 Global Step: 176940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:47,778-Speed 9455.47 samples/sec Loss 5.7195 LearningRate 0.0221 Epoch: 10 Global Step: 176950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:48,872-Speed 9368.36 samples/sec Loss 5.6693 LearningRate 0.0221 Epoch: 10 Global Step: 176960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:49,986-Speed 9192.64 samples/sec Loss 5.5787 LearningRate 0.0221 Epoch: 10 Global Step: 176970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:51,093-Speed 9258.63 samples/sec Loss 5.6221 LearningRate 0.0221 Epoch: 10 Global Step: 176980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:52,171-Speed 9507.88 samples/sec Loss 5.6414 LearningRate 0.0221 Epoch: 10 Global Step: 176990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:53,321-Speed 8913.76 samples/sec Loss 5.6257 LearningRate 0.0221 Epoch: 10 Global Step: 
177000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:54,422-Speed 9308.62 samples/sec Loss 5.6782 LearningRate 0.0221 Epoch: 10 Global Step: 177010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:55,516-Speed 9369.82 samples/sec Loss 5.6960 LearningRate 0.0221 Epoch: 10 Global Step: 177020 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:40:56,595-Speed 9493.77 samples/sec Loss 5.6874 LearningRate 0.0221 Epoch: 10 Global Step: 177030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:57,674-Speed 9492.32 samples/sec Loss 5.5657 LearningRate 0.0221 Epoch: 10 Global Step: 177040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:58,754-Speed 9483.93 samples/sec Loss 5.6215 LearningRate 0.0221 Epoch: 10 Global Step: 177050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:40:59,875-Speed 9137.66 samples/sec Loss 5.5971 LearningRate 0.0221 Epoch: 10 Global Step: 177060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:00,941-Speed 9619.80 samples/sec Loss 5.6483 LearningRate 0.0220 Epoch: 10 Global Step: 177070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:02,070-Speed 9069.17 samples/sec Loss 5.6512 LearningRate 0.0220 Epoch: 10 Global Step: 177080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:03,117-Speed 9790.60 samples/sec Loss 5.6215 LearningRate 0.0220 Epoch: 10 Global Step: 177090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:04,237-Speed 9144.84 samples/sec Loss 5.7160 LearningRate 0.0220 Epoch: 10 Global Step: 177100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:05,329-Speed 9386.64 samples/sec Loss 5.5712 LearningRate 0.0220 Epoch: 10 Global Step: 177110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:06,442-Speed 9205.08 samples/sec Loss 5.7283 LearningRate 0.0220 Epoch: 10 Global Step: 177120 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 18:41:07,562-Speed 9149.66 samples/sec Loss 5.6099 LearningRate 0.0220 Epoch: 10 Global Step: 177130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:08,644-Speed 9466.40 samples/sec Loss 5.7241 LearningRate 0.0220 Epoch: 10 Global Step: 177140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:09,731-Speed 9428.11 samples/sec Loss 5.5610 LearningRate 0.0220 Epoch: 10 Global Step: 177150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:10,800-Speed 9588.48 samples/sec Loss 5.5997 LearningRate 0.0220 Epoch: 10 Global Step: 177160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:11,882-Speed 9470.07 samples/sec Loss 5.6210 LearningRate 0.0220 Epoch: 10 Global Step: 177170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:12,995-Speed 9197.98 samples/sec Loss 5.6832 LearningRate 0.0220 Epoch: 10 Global Step: 177180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:14,081-Speed 9441.38 samples/sec Loss 5.6487 LearningRate 0.0220 Epoch: 10 Global Step: 177190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:41:15,140-Speed 9673.35 samples/sec Loss 5.5713 LearningRate 0.0220 Epoch: 10 Global Step: 177200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:16,246-Speed 9263.03 samples/sec Loss 5.7188 LearningRate 0.0220 Epoch: 10 Global Step: 177210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:17,337-Speed 9402.95 samples/sec Loss 5.6455 LearningRate 0.0220 Epoch: 10 Global Step: 177220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:18,443-Speed 9265.07 samples/sec Loss 5.5972 LearningRate 0.0220 Epoch: 10 Global Step: 177230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:19,513-Speed 9571.78 samples/sec Loss 5.5954 LearningRate 0.0220 Epoch: 10 Global Step: 177240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:41:20,598-Speed 9445.53 samples/sec Loss 5.5056 LearningRate 0.0220 Epoch: 10 Global Step: 177250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:21,670-Speed 9560.30 samples/sec Loss 5.5789 LearningRate 0.0220 Epoch: 10 Global Step: 177260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:22,762-Speed 9379.36 samples/sec Loss 5.6260 LearningRate 0.0220 Epoch: 10 Global Step: 177270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:23,850-Speed 9420.44 samples/sec Loss 5.5602 LearningRate 0.0220 Epoch: 10 Global Step: 177280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:24,926-Speed 9528.97 samples/sec Loss 5.5937 LearningRate 0.0220 Epoch: 10 Global Step: 177290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:26,003-Speed 9514.45 samples/sec Loss 5.6996 LearningRate 0.0220 Epoch: 10 Global Step: 177300 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:41:27,095-Speed 9376.46 samples/sec Loss 5.6338 LearningRate 0.0220 Epoch: 10 Global Step: 177310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:28,226-Speed 9059.30 samples/sec Loss 5.6042 LearningRate 0.0220 Epoch: 10 Global Step: 177320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:29,292-Speed 9613.73 samples/sec Loss 5.7469 LearningRate 0.0220 Epoch: 10 Global Step: 177330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:30,332-Speed 9849.81 samples/sec Loss 5.6728 LearningRate 0.0220 Epoch: 10 Global Step: 177340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:31,435-Speed 9291.01 samples/sec Loss 5.7280 LearningRate 0.0220 Epoch: 10 Global Step: 177350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:32,524-Speed 9403.32 samples/sec Loss 5.6625 LearningRate 0.0220 Epoch: 10 Global Step: 177360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:33,594-Speed 9583.29 
samples/sec Loss 5.7124 LearningRate 0.0220 Epoch: 10 Global Step: 177370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:34,671-Speed 9508.11 samples/sec Loss 5.6668 LearningRate 0.0220 Epoch: 10 Global Step: 177380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:35,723-Speed 9746.13 samples/sec Loss 5.6383 LearningRate 0.0220 Epoch: 10 Global Step: 177390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:36,813-Speed 9401.05 samples/sec Loss 5.6701 LearningRate 0.0220 Epoch: 10 Global Step: 177400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:37,943-Speed 9065.08 samples/sec Loss 5.6433 LearningRate 0.0220 Epoch: 10 Global Step: 177410 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:41:39,022-Speed 9500.38 samples/sec Loss 5.7070 LearningRate 0.0220 Epoch: 10 Global Step: 177420 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:41:40,099-Speed 9512.02 samples/sec Loss 5.6636 LearningRate 0.0219 Epoch: 10 Global Step: 177430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:41,180-Speed 9482.72 samples/sec Loss 5.5542 LearningRate 0.0219 Epoch: 10 Global Step: 177440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:42,301-Speed 9136.43 samples/sec Loss 5.6727 LearningRate 0.0219 Epoch: 10 Global Step: 177450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:43,406-Speed 9274.72 samples/sec Loss 5.6419 LearningRate 0.0219 Epoch: 10 Global Step: 177460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:44,513-Speed 9257.31 samples/sec Loss 5.7026 LearningRate 0.0219 Epoch: 10 Global Step: 177470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:45,584-Speed 9561.17 samples/sec Loss 5.5812 LearningRate 0.0219 Epoch: 10 Global Step: 177480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:46,673-Speed 9416.46 samples/sec Loss 5.6783 
LearningRate 0.0219 Epoch: 10 Global Step: 177490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:47,778-Speed 9272.47 samples/sec Loss 5.6623 LearningRate 0.0219 Epoch: 10 Global Step: 177500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:48,881-Speed 9281.87 samples/sec Loss 5.5880 LearningRate 0.0219 Epoch: 10 Global Step: 177510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:49,954-Speed 9551.94 samples/sec Loss 5.7698 LearningRate 0.0219 Epoch: 10 Global Step: 177520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:51,027-Speed 9545.82 samples/sec Loss 5.6682 LearningRate 0.0219 Epoch: 10 Global Step: 177530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:52,128-Speed 9303.48 samples/sec Loss 5.5857 LearningRate 0.0219 Epoch: 10 Global Step: 177540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:53,202-Speed 9551.27 samples/sec Loss 5.6878 LearningRate 0.0219 Epoch: 10 Global Step: 177550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:54,294-Speed 9379.84 samples/sec Loss 5.6398 LearningRate 0.0219 Epoch: 10 Global Step: 177560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:55,402-Speed 9250.16 samples/sec Loss 5.6622 LearningRate 0.0219 Epoch: 10 Global Step: 177570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:56,472-Speed 9578.62 samples/sec Loss 5.6084 LearningRate 0.0219 Epoch: 10 Global Step: 177580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:57,538-Speed 9609.49 samples/sec Loss 5.6146 LearningRate 0.0219 Epoch: 10 Global Step: 177590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:58,614-Speed 9528.94 samples/sec Loss 5.6426 LearningRate 0.0219 Epoch: 10 Global Step: 177600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:41:59,673-Speed 9670.43 samples/sec Loss 5.7032 LearningRate 0.0219 Epoch: 10 
Global Step: 177610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:00,775-Speed 9300.96 samples/sec Loss 5.6798 LearningRate 0.0219 Epoch: 10 Global Step: 177620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:01,882-Speed 9259.07 samples/sec Loss 5.6443 LearningRate 0.0219 Epoch: 10 Global Step: 177630 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:42:02,937-Speed 9711.08 samples/sec Loss 5.6303 LearningRate 0.0219 Epoch: 10 Global Step: 177640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:42:04,022-Speed 9443.09 samples/sec Loss 5.6054 LearningRate 0.0219 Epoch: 10 Global Step: 177650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:05,104-Speed 9465.56 samples/sec Loss 5.6080 LearningRate 0.0219 Epoch: 10 Global Step: 177660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:06,217-Speed 9210.48 samples/sec Loss 5.6838 LearningRate 0.0219 Epoch: 10 Global Step: 177670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:07,295-Speed 9500.76 samples/sec Loss 5.6428 LearningRate 0.0219 Epoch: 10 Global Step: 177680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:08,477-Speed 8672.84 samples/sec Loss 5.7342 LearningRate 0.0219 Epoch: 10 Global Step: 177690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:09,590-Speed 9204.93 samples/sec Loss 5.6431 LearningRate 0.0219 Epoch: 10 Global Step: 177700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:10,694-Speed 9278.58 samples/sec Loss 5.6919 LearningRate 0.0219 Epoch: 10 Global Step: 177710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:11,793-Speed 9325.70 samples/sec Loss 5.6387 LearningRate 0.0219 Epoch: 10 Global Step: 177720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:12,894-Speed 9304.88 samples/sec Loss 5.5788 LearningRate 0.0219 Epoch: 10 Global Step: 177730 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 18:42:14,020-Speed 9106.48 samples/sec Loss 5.5696 LearningRate 0.0219 Epoch: 10 Global Step: 177740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:15,124-Speed 9279.07 samples/sec Loss 5.5716 LearningRate 0.0219 Epoch: 10 Global Step: 177750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:42:16,201-Speed 9509.23 samples/sec Loss 5.6893 LearningRate 0.0219 Epoch: 10 Global Step: 177760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:17,322-Speed 9158.73 samples/sec Loss 5.6187 LearningRate 0.0219 Epoch: 10 Global Step: 177770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:18,442-Speed 9146.28 samples/sec Loss 5.5252 LearningRate 0.0218 Epoch: 10 Global Step: 177780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:19,529-Speed 9429.52 samples/sec Loss 5.7383 LearningRate 0.0218 Epoch: 10 Global Step: 177790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:20,609-Speed 9486.58 samples/sec Loss 5.6875 LearningRate 0.0218 Epoch: 10 Global Step: 177800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:21,710-Speed 9306.73 samples/sec Loss 5.6540 LearningRate 0.0218 Epoch: 10 Global Step: 177810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:22,800-Speed 9396.20 samples/sec Loss 5.7208 LearningRate 0.0218 Epoch: 10 Global Step: 177820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:23,865-Speed 9625.46 samples/sec Loss 5.6828 LearningRate 0.0218 Epoch: 10 Global Step: 177830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:24,935-Speed 9573.53 samples/sec Loss 5.5604 LearningRate 0.0218 Epoch: 10 Global Step: 177840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:26,027-Speed 9380.00 samples/sec Loss 5.5758 LearningRate 0.0218 Epoch: 10 Global Step: 177850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 18:42:27,143-Speed 9186.86 samples/sec Loss 5.6067 LearningRate 0.0218 Epoch: 10 Global Step: 177860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:28,231-Speed 9419.06 samples/sec Loss 5.6172 LearningRate 0.0218 Epoch: 10 Global Step: 177870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:29,286-Speed 9711.13 samples/sec Loss 5.7662 LearningRate 0.0218 Epoch: 10 Global Step: 177880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:30,371-Speed 9443.93 samples/sec Loss 5.7206 LearningRate 0.0218 Epoch: 10 Global Step: 177890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:31,460-Speed 9407.35 samples/sec Loss 5.5881 LearningRate 0.0218 Epoch: 10 Global Step: 177900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:32,619-Speed 8845.02 samples/sec Loss 5.6345 LearningRate 0.0218 Epoch: 10 Global Step: 177910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:33,707-Speed 9416.71 samples/sec Loss 5.6762 LearningRate 0.0218 Epoch: 10 Global Step: 177920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:34,806-Speed 9323.09 samples/sec Loss 5.7180 LearningRate 0.0218 Epoch: 10 Global Step: 177930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:35,901-Speed 9359.88 samples/sec Loss 5.6263 LearningRate 0.0218 Epoch: 10 Global Step: 177940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:37,018-Speed 9167.00 samples/sec Loss 5.5218 LearningRate 0.0218 Epoch: 10 Global Step: 177950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:38,130-Speed 9216.97 samples/sec Loss 5.5799 LearningRate 0.0218 Epoch: 10 Global Step: 177960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:42:39,194-Speed 9627.69 samples/sec Loss 5.5668 LearningRate 0.0218 Epoch: 10 Global Step: 177970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:40,286-Speed 
9385.51 samples/sec Loss 5.5736 LearningRate 0.0218 Epoch: 10 Global Step: 177980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:41,348-Speed 9643.57 samples/sec Loss 5.5965 LearningRate 0.0218 Epoch: 10 Global Step: 177990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:42:42,496-Speed 8922.72 samples/sec Loss 5.6176 LearningRate 0.0218 Epoch: 10 Global Step: 178000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:43:04,678-[lfw][178000]XNorm: 9.328771 Training: 2022-04-11 18:43:04,679-[lfw][178000]Accuracy-Flip: 0.99617+-0.00269 Training: 2022-04-11 18:43:04,679-[lfw][178000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:43:30,233-[cfp_fp][178000]XNorm: 7.926289 Training: 2022-04-11 18:43:30,234-[cfp_fp][178000]Accuracy-Flip: 0.96586+-0.00765 Training: 2022-04-11 18:43:30,234-[cfp_fp][178000]Accuracy-Highest: 0.96586 Training: 2022-04-11 18:43:52,234-[agedb_30][178000]XNorm: 8.942136 Training: 2022-04-11 18:43:52,235-[agedb_30][178000]Accuracy-Flip: 0.96867+-0.01137 Training: 2022-04-11 18:43:52,236-[agedb_30][178000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:43:53,332-Speed 144.56 samples/sec Loss 5.6423 LearningRate 0.0218 Epoch: 10 Global Step: 178010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:43:54,433-Speed 9310.02 samples/sec Loss 5.7036 LearningRate 0.0218 Epoch: 10 Global Step: 178020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:43:55,545-Speed 9210.04 samples/sec Loss 5.7103 LearningRate 0.0218 Epoch: 10 Global Step: 178030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:43:56,632-Speed 9425.47 samples/sec Loss 5.6158 LearningRate 0.0218 Epoch: 10 Global Step: 178040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:43:57,685-Speed 9730.95 samples/sec Loss 5.6494 LearningRate 0.0218 Epoch: 10 Global Step: 178050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:43:58,749-Speed 9630.00 samples/sec 
Loss 5.6988 LearningRate 0.0218 Epoch: 10 Global Step: 178060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:43:59,837-Speed 9419.18 samples/sec Loss 5.5224 LearningRate 0.0218 Epoch: 10 Global Step: 178070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:00,920-Speed 9465.05 samples/sec Loss 5.7026 LearningRate 0.0218 Epoch: 10 Global Step: 178080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:02,007-Speed 9425.77 samples/sec Loss 5.6855 LearningRate 0.0218 Epoch: 10 Global Step: 178090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:03,078-Speed 9564.16 samples/sec Loss 5.6272 LearningRate 0.0218 Epoch: 10 Global Step: 178100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:04,171-Speed 9374.88 samples/sec Loss 5.6299 LearningRate 0.0218 Epoch: 10 Global Step: 178110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:05,238-Speed 9613.66 samples/sec Loss 5.5649 LearningRate 0.0218 Epoch: 10 Global Step: 178120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:06,309-Speed 9566.50 samples/sec Loss 5.5788 LearningRate 0.0218 Epoch: 10 Global Step: 178130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:07,380-Speed 9563.26 samples/sec Loss 5.5745 LearningRate 0.0217 Epoch: 10 Global Step: 178140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:08,470-Speed 9400.13 samples/sec Loss 5.7367 LearningRate 0.0217 Epoch: 10 Global Step: 178150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:09,563-Speed 9374.06 samples/sec Loss 5.6530 LearningRate 0.0217 Epoch: 10 Global Step: 178160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:10,650-Speed 9421.68 samples/sec Loss 5.5318 LearningRate 0.0217 Epoch: 10 Global Step: 178170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:11,728-Speed 9506.15 samples/sec Loss 5.5698 LearningRate 0.0217 
Epoch: 10 Global Step: 178180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:12,783-Speed 9717.69 samples/sec Loss 5.6518 LearningRate 0.0217 Epoch: 10 Global Step: 178190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:13,870-Speed 9422.88 samples/sec Loss 5.6929 LearningRate 0.0217 Epoch: 10 Global Step: 178200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:14,949-Speed 9497.76 samples/sec Loss 5.5490 LearningRate 0.0217 Epoch: 10 Global Step: 178210 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:16,035-Speed 9434.28 samples/sec Loss 5.5929 LearningRate 0.0217 Epoch: 10 Global Step: 178220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:17,102-Speed 9601.58 samples/sec Loss 5.5842 LearningRate 0.0217 Epoch: 10 Global Step: 178230 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:18,232-Speed 9070.91 samples/sec Loss 5.6232 LearningRate 0.0217 Epoch: 10 Global Step: 178240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:19,295-Speed 9638.00 samples/sec Loss 5.6194 LearningRate 0.0217 Epoch: 10 Global Step: 178250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:20,346-Speed 9747.09 samples/sec Loss 5.6783 LearningRate 0.0217 Epoch: 10 Global Step: 178260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:21,387-Speed 9843.36 samples/sec Loss 5.6611 LearningRate 0.0217 Epoch: 10 Global Step: 178270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:22,475-Speed 9418.54 samples/sec Loss 5.5915 LearningRate 0.0217 Epoch: 10 Global Step: 178280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:23,632-Speed 8851.05 samples/sec Loss 5.7012 LearningRate 0.0217 Epoch: 10 Global Step: 178290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:24,740-Speed 9245.71 samples/sec Loss 5.7186 LearningRate 0.0217 Epoch: 10 Global Step: 178300 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:25,813-Speed 9555.31 samples/sec Loss 5.6158 LearningRate 0.0217 Epoch: 10 Global Step: 178310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:26,900-Speed 9421.22 samples/sec Loss 5.6789 LearningRate 0.0217 Epoch: 10 Global Step: 178320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:27,999-Speed 9326.12 samples/sec Loss 5.6180 LearningRate 0.0217 Epoch: 10 Global Step: 178330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:29,082-Speed 9461.76 samples/sec Loss 5.5843 LearningRate 0.0217 Epoch: 10 Global Step: 178340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:30,165-Speed 9461.95 samples/sec Loss 5.7125 LearningRate 0.0217 Epoch: 10 Global Step: 178350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:31,252-Speed 9423.93 samples/sec Loss 5.6895 LearningRate 0.0217 Epoch: 10 Global Step: 178360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:32,325-Speed 9550.77 samples/sec Loss 5.5757 LearningRate 0.0217 Epoch: 10 Global Step: 178370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:33,442-Speed 9174.95 samples/sec Loss 5.6273 LearningRate 0.0217 Epoch: 10 Global Step: 178380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:34,546-Speed 9273.82 samples/sec Loss 5.6101 LearningRate 0.0217 Epoch: 10 Global Step: 178390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:35,578-Speed 9935.07 samples/sec Loss 5.6221 LearningRate 0.0217 Epoch: 10 Global Step: 178400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:36,654-Speed 9526.40 samples/sec Loss 5.7227 LearningRate 0.0217 Epoch: 10 Global Step: 178410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:37,743-Speed 9407.17 samples/sec Loss 5.6924 LearningRate 0.0217 Epoch: 10 Global Step: 178420 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 18:44:38,821-Speed 9503.95 samples/sec Loss 5.7494 LearningRate 0.0217 Epoch: 10 Global Step: 178430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:39,859-Speed 9869.31 samples/sec Loss 5.5943 LearningRate 0.0217 Epoch: 10 Global Step: 178440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:44:40,909-Speed 9762.83 samples/sec Loss 5.6337 LearningRate 0.0217 Epoch: 10 Global Step: 178450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:42,011-Speed 9295.71 samples/sec Loss 5.6676 LearningRate 0.0217 Epoch: 10 Global Step: 178460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:43,124-Speed 9207.12 samples/sec Loss 5.5937 LearningRate 0.0217 Epoch: 10 Global Step: 178470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:44,242-Speed 9157.77 samples/sec Loss 5.6157 LearningRate 0.0217 Epoch: 10 Global Step: 178480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:45,324-Speed 9474.20 samples/sec Loss 5.5897 LearningRate 0.0217 Epoch: 10 Global Step: 178490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:46,373-Speed 9767.31 samples/sec Loss 5.6945 LearningRate 0.0216 Epoch: 10 Global Step: 178500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:47,441-Speed 9599.81 samples/sec Loss 5.6506 LearningRate 0.0216 Epoch: 10 Global Step: 178510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:48,529-Speed 9413.53 samples/sec Loss 5.6019 LearningRate 0.0216 Epoch: 10 Global Step: 178520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:49,602-Speed 9550.48 samples/sec Loss 5.6676 LearningRate 0.0216 Epoch: 10 Global Step: 178530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:50,678-Speed 9523.65 samples/sec Loss 5.6004 LearningRate 0.0216 Epoch: 10 Global Step: 178540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 18:44:51,759-Speed 9471.02 samples/sec Loss 5.6723 LearningRate 0.0216 Epoch: 10 Global Step: 178550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:44:52,848-Speed 9406.54 samples/sec Loss 5.6847 LearningRate 0.0216 Epoch: 10 Global Step: 178560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:44:53,927-Speed 9499.61 samples/sec Loss 5.6864 LearningRate 0.0216 Epoch: 10 Global Step: 178570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:44:54,998-Speed 9567.18 samples/sec Loss 5.5744 LearningRate 0.0216 Epoch: 10 Global Step: 178580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:44:56,096-Speed 9332.88 samples/sec Loss 5.5241 LearningRate 0.0216 Epoch: 10 Global Step: 178590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:44:57,180-Speed 9460.52 samples/sec Loss 5.6090 LearningRate 0.0216 Epoch: 10 Global Step: 178600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:44:58,271-Speed 9392.55 samples/sec Loss 5.6632 LearningRate 0.0216 Epoch: 10 Global Step: 178610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:44:59,324-Speed 9723.99 samples/sec Loss 5.6681 LearningRate 0.0216 Epoch: 10 Global Step: 178620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:00,409-Speed 9441.51 samples/sec Loss 5.6815 LearningRate 0.0216 Epoch: 10 Global Step: 178630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:01,472-Speed 9638.16 samples/sec Loss 5.7832 LearningRate 0.0216 Epoch: 10 Global Step: 178640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:02,546-Speed 9540.10 samples/sec Loss 5.6706 LearningRate 0.0216 Epoch: 10 Global Step: 178650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:03,671-Speed 9109.05 samples/sec Loss 5.5593 LearningRate 0.0216 Epoch: 10 Global Step: 178660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:04,703-Speed 9930.53 
samples/sec Loss 5.6089 LearningRate 0.0216 Epoch: 10 Global Step: 178670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:05,783-Speed 9488.61 samples/sec Loss 5.5803 LearningRate 0.0216 Epoch: 10 Global Step: 178680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:06,866-Speed 9461.13 samples/sec Loss 5.4717 LearningRate 0.0216 Epoch: 10 Global Step: 178690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:07,935-Speed 9579.12 samples/sec Loss 5.6709 LearningRate 0.0216 Epoch: 10 Global Step: 178700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:08,992-Speed 9696.08 samples/sec Loss 5.5352 LearningRate 0.0216 Epoch: 10 Global Step: 178710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:10,142-Speed 8909.44 samples/sec Loss 5.6683 LearningRate 0.0216 Epoch: 10 Global Step: 178720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:11,226-Speed 9455.51 samples/sec Loss 5.5864 LearningRate 0.0216 Epoch: 10 Global Step: 178730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:12,377-Speed 8895.03 samples/sec Loss 5.6356 LearningRate 0.0216 Epoch: 10 Global Step: 178740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:13,463-Speed 9439.95 samples/sec Loss 5.6711 LearningRate 0.0216 Epoch: 10 Global Step: 178750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:14,550-Speed 9429.28 samples/sec Loss 5.6479 LearningRate 0.0216 Epoch: 10 Global Step: 178760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:15,612-Speed 9647.09 samples/sec Loss 5.6402 LearningRate 0.0216 Epoch: 10 Global Step: 178770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:16,706-Speed 9370.23 samples/sec Loss 5.6833 LearningRate 0.0216 Epoch: 10 Global Step: 178780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:17,834-Speed 9080.46 samples/sec Loss 5.6505 LearningRate 
0.0216 Epoch: 10 Global Step: 178790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:18,946-Speed 9217.80 samples/sec Loss 5.5769 LearningRate 0.0216 Epoch: 10 Global Step: 178800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:20,042-Speed 9344.35 samples/sec Loss 5.5615 LearningRate 0.0216 Epoch: 10 Global Step: 178810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:21,135-Speed 9378.68 samples/sec Loss 5.6850 LearningRate 0.0216 Epoch: 10 Global Step: 178820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:22,245-Speed 9223.33 samples/sec Loss 5.6038 LearningRate 0.0216 Epoch: 10 Global Step: 178830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:23,347-Speed 9298.79 samples/sec Loss 5.6985 LearningRate 0.0216 Epoch: 10 Global Step: 178840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:24,406-Speed 9673.72 samples/sec Loss 5.6847 LearningRate 0.0216 Epoch: 10 Global Step: 178850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:25,510-Speed 9282.85 samples/sec Loss 5.6602 LearningRate 0.0215 Epoch: 10 Global Step: 178860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:26,546-Speed 9891.55 samples/sec Loss 5.6882 LearningRate 0.0215 Epoch: 10 Global Step: 178870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:27,646-Speed 9320.70 samples/sec Loss 5.6556 LearningRate 0.0215 Epoch: 10 Global Step: 178880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:28,736-Speed 9396.76 samples/sec Loss 5.7584 LearningRate 0.0215 Epoch: 10 Global Step: 178890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:29,819-Speed 9464.36 samples/sec Loss 5.6402 LearningRate 0.0215 Epoch: 10 Global Step: 178900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:30,930-Speed 9222.06 samples/sec Loss 5.5846 LearningRate 0.0215 Epoch: 10 Global Step: 
178910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:32,009-Speed 9498.18 samples/sec Loss 5.5628 LearningRate 0.0215 Epoch: 10 Global Step: 178920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:33,096-Speed 9432.93 samples/sec Loss 5.7236 LearningRate 0.0215 Epoch: 10 Global Step: 178930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:34,198-Speed 9297.90 samples/sec Loss 5.6579 LearningRate 0.0215 Epoch: 10 Global Step: 178940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:35,305-Speed 9250.09 samples/sec Loss 5.7603 LearningRate 0.0215 Epoch: 10 Global Step: 178950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:36,430-Speed 9111.10 samples/sec Loss 5.7157 LearningRate 0.0215 Epoch: 10 Global Step: 178960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:37,548-Speed 9163.82 samples/sec Loss 5.6961 LearningRate 0.0215 Epoch: 10 Global Step: 178970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:45:38,668-Speed 9142.33 samples/sec Loss 5.4796 LearningRate 0.0215 Epoch: 10 Global Step: 178980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:39,719-Speed 9749.53 samples/sec Loss 5.5922 LearningRate 0.0215 Epoch: 10 Global Step: 178990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:40,794-Speed 9532.76 samples/sec Loss 5.6456 LearningRate 0.0215 Epoch: 10 Global Step: 179000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:41,940-Speed 8937.53 samples/sec Loss 5.6611 LearningRate 0.0215 Epoch: 10 Global Step: 179010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:43,028-Speed 9423.30 samples/sec Loss 5.6361 LearningRate 0.0215 Epoch: 10 Global Step: 179020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:44,121-Speed 9369.83 samples/sec Loss 5.6326 LearningRate 0.0215 Epoch: 10 Global Step: 179030 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 18:45:45,196-Speed 9535.79 samples/sec Loss 5.6512 LearningRate 0.0215 Epoch: 10 Global Step: 179040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:46,271-Speed 9528.58 samples/sec Loss 5.5765 LearningRate 0.0215 Epoch: 10 Global Step: 179050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:47,385-Speed 9194.54 samples/sec Loss 5.6186 LearningRate 0.0215 Epoch: 10 Global Step: 179060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:48,480-Speed 9360.79 samples/sec Loss 5.5535 LearningRate 0.0215 Epoch: 10 Global Step: 179070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:49,619-Speed 8991.72 samples/sec Loss 5.6771 LearningRate 0.0215 Epoch: 10 Global Step: 179080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:50,728-Speed 9246.45 samples/sec Loss 5.6427 LearningRate 0.0215 Epoch: 10 Global Step: 179090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:51,847-Speed 9152.12 samples/sec Loss 5.5962 LearningRate 0.0215 Epoch: 10 Global Step: 179100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:52,982-Speed 9029.88 samples/sec Loss 5.6463 LearningRate 0.0215 Epoch: 10 Global Step: 179110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:45:54,074-Speed 9382.64 samples/sec Loss 5.6365 LearningRate 0.0215 Epoch: 10 Global Step: 179120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:55,171-Speed 9337.04 samples/sec Loss 5.5761 LearningRate 0.0215 Epoch: 10 Global Step: 179130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:56,280-Speed 9242.95 samples/sec Loss 5.7159 LearningRate 0.0215 Epoch: 10 Global Step: 179140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:57,380-Speed 9320.16 samples/sec Loss 5.6870 LearningRate 0.0215 Epoch: 10 Global Step: 179150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:45:58,486-Speed 9255.84 samples/sec Loss 5.7652 LearningRate 0.0215 Epoch: 10 Global Step: 179160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:45:59,579-Speed 9373.65 samples/sec Loss 5.6321 LearningRate 0.0215 Epoch: 10 Global Step: 179170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:00,656-Speed 9516.81 samples/sec Loss 5.6479 LearningRate 0.0215 Epoch: 10 Global Step: 179180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:01,704-Speed 9774.47 samples/sec Loss 5.5936 LearningRate 0.0215 Epoch: 10 Global Step: 179190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:02,802-Speed 9336.53 samples/sec Loss 5.6767 LearningRate 0.0215 Epoch: 10 Global Step: 179200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:03,884-Speed 9472.36 samples/sec Loss 5.6970 LearningRate 0.0215 Epoch: 10 Global Step: 179210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:04,935-Speed 9747.08 samples/sec Loss 5.6843 LearningRate 0.0214 Epoch: 10 Global Step: 179220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:06,021-Speed 9438.94 samples/sec Loss 5.6077 LearningRate 0.0214 Epoch: 10 Global Step: 179230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:07,091-Speed 9572.20 samples/sec Loss 5.6720 LearningRate 0.0214 Epoch: 10 Global Step: 179240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:08,149-Speed 9681.01 samples/sec Loss 5.5739 LearningRate 0.0214 Epoch: 10 Global Step: 179250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:09,196-Speed 9790.21 samples/sec Loss 5.6943 LearningRate 0.0214 Epoch: 10 Global Step: 179260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:10,237-Speed 9847.17 samples/sec Loss 5.4959 LearningRate 0.0214 Epoch: 10 Global Step: 179270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:11,306-Speed 9581.64 
samples/sec Loss 5.5517 LearningRate 0.0214 Epoch: 10 Global Step: 179280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:12,393-Speed 9425.30 samples/sec Loss 5.5701 LearningRate 0.0214 Epoch: 10 Global Step: 179290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:13,446-Speed 9731.88 samples/sec Loss 5.5839 LearningRate 0.0214 Epoch: 10 Global Step: 179300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:14,530-Speed 9454.37 samples/sec Loss 5.5755 LearningRate 0.0214 Epoch: 10 Global Step: 179310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:15,622-Speed 9375.78 samples/sec Loss 5.5552 LearningRate 0.0214 Epoch: 10 Global Step: 179320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:46:16,723-Speed 9317.52 samples/sec Loss 5.6352 LearningRate 0.0214 Epoch: 10 Global Step: 179330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:46:17,783-Speed 9663.97 samples/sec Loss 5.6484 LearningRate 0.0214 Epoch: 10 Global Step: 179340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:18,864-Speed 9480.75 samples/sec Loss 5.7064 LearningRate 0.0214 Epoch: 10 Global Step: 179350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:19,946-Speed 9464.45 samples/sec Loss 5.5829 LearningRate 0.0214 Epoch: 10 Global Step: 179360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:21,053-Speed 9257.60 samples/sec Loss 5.6370 LearningRate 0.0214 Epoch: 10 Global Step: 179370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:22,106-Speed 9729.78 samples/sec Loss 5.7036 LearningRate 0.0214 Epoch: 10 Global Step: 179380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:23,176-Speed 9576.07 samples/sec Loss 5.7670 LearningRate 0.0214 Epoch: 10 Global Step: 179390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:24,285-Speed 9239.60 samples/sec Loss 5.6715 
LearningRate 0.0214 Epoch: 10 Global Step: 179400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:25,363-Speed 9502.23 samples/sec Loss 5.6789 LearningRate 0.0214 Epoch: 10 Global Step: 179410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:26,466-Speed 9289.44 samples/sec Loss 5.7175 LearningRate 0.0214 Epoch: 10 Global Step: 179420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:27,583-Speed 9176.09 samples/sec Loss 5.7011 LearningRate 0.0214 Epoch: 10 Global Step: 179430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:28,654-Speed 9565.83 samples/sec Loss 5.6702 LearningRate 0.0214 Epoch: 10 Global Step: 179440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:46:29,753-Speed 9329.33 samples/sec Loss 5.5791 LearningRate 0.0214 Epoch: 10 Global Step: 179450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:46:30,795-Speed 9826.33 samples/sec Loss 5.6335 LearningRate 0.0214 Epoch: 10 Global Step: 179460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:31,873-Speed 9510.76 samples/sec Loss 5.6177 LearningRate 0.0214 Epoch: 10 Global Step: 179470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:32,970-Speed 9334.17 samples/sec Loss 5.5655 LearningRate 0.0214 Epoch: 10 Global Step: 179480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:34,061-Speed 9397.97 samples/sec Loss 5.6933 LearningRate 0.0214 Epoch: 10 Global Step: 179490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:35,136-Speed 9532.94 samples/sec Loss 5.5463 LearningRate 0.0214 Epoch: 10 Global Step: 179500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:36,227-Speed 9385.30 samples/sec Loss 5.5650 LearningRate 0.0214 Epoch: 10 Global Step: 179510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:37,325-Speed 9338.08 samples/sec Loss 5.6519 LearningRate 0.0214 Epoch: 10 
Global Step: 179520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:38,397-Speed 9555.09 samples/sec Loss 5.7354 LearningRate 0.0214 Epoch: 10 Global Step: 179530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:39,464-Speed 9596.62 samples/sec Loss 5.5716 LearningRate 0.0214 Epoch: 10 Global Step: 179540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:40,555-Speed 9393.45 samples/sec Loss 5.6040 LearningRate 0.0214 Epoch: 10 Global Step: 179550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:41,649-Speed 9363.76 samples/sec Loss 5.4922 LearningRate 0.0214 Epoch: 10 Global Step: 179560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:46:42,719-Speed 9581.10 samples/sec Loss 5.6029 LearningRate 0.0214 Epoch: 10 Global Step: 179570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:43,788-Speed 9584.15 samples/sec Loss 5.6137 LearningRate 0.0213 Epoch: 10 Global Step: 179580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:44,872-Speed 9451.38 samples/sec Loss 5.6099 LearningRate 0.0213 Epoch: 10 Global Step: 179590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:45,940-Speed 9587.16 samples/sec Loss 5.5929 LearningRate 0.0213 Epoch: 10 Global Step: 179600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:47,029-Speed 9413.74 samples/sec Loss 5.5888 LearningRate 0.0213 Epoch: 10 Global Step: 179610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:48,080-Speed 9759.25 samples/sec Loss 5.6242 LearningRate 0.0213 Epoch: 10 Global Step: 179620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:49,140-Speed 9658.03 samples/sec Loss 5.6957 LearningRate 0.0213 Epoch: 10 Global Step: 179630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:50,241-Speed 9311.54 samples/sec Loss 5.6218 LearningRate 0.0213 Epoch: 10 Global Step: 179640 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:51,294-Speed 9731.10 samples/sec Loss 5.6811 LearningRate 0.0213 Epoch: 10 Global Step: 179650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:52,401-Speed 9254.96 samples/sec Loss 5.6362 LearningRate 0.0213 Epoch: 10 Global Step: 179660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:53,497-Speed 9343.68 samples/sec Loss 5.6804 LearningRate 0.0213 Epoch: 10 Global Step: 179670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:54,593-Speed 9351.07 samples/sec Loss 5.6567 LearningRate 0.0213 Epoch: 10 Global Step: 179680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:55,714-Speed 9141.77 samples/sec Loss 5.6594 LearningRate 0.0213 Epoch: 10 Global Step: 179690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:56,776-Speed 9651.64 samples/sec Loss 5.6444 LearningRate 0.0213 Epoch: 10 Global Step: 179700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:57,867-Speed 9387.52 samples/sec Loss 5.5692 LearningRate 0.0213 Epoch: 10 Global Step: 179710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:58,939-Speed 9561.99 samples/sec Loss 5.7212 LearningRate 0.0213 Epoch: 10 Global Step: 179720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:46:59,983-Speed 9809.23 samples/sec Loss 5.7158 LearningRate 0.0213 Epoch: 10 Global Step: 179730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:01,067-Speed 9459.52 samples/sec Loss 5.6814 LearningRate 0.0213 Epoch: 10 Global Step: 179740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:02,130-Speed 9636.68 samples/sec Loss 5.5769 LearningRate 0.0213 Epoch: 10 Global Step: 179750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:03,190-Speed 9663.88 samples/sec Loss 5.5907 LearningRate 0.0213 Epoch: 10 Global Step: 179760 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:47:04,248-Speed 9683.79 samples/sec Loss 5.6624 LearningRate 0.0213 Epoch: 10 Global Step: 179770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:05,345-Speed 9343.88 samples/sec Loss 5.7115 LearningRate 0.0213 Epoch: 10 Global Step: 179780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:06,425-Speed 9489.02 samples/sec Loss 5.6487 LearningRate 0.0213 Epoch: 10 Global Step: 179790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:07,496-Speed 9565.57 samples/sec Loss 5.6620 LearningRate 0.0213 Epoch: 10 Global Step: 179800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:08,601-Speed 9272.14 samples/sec Loss 5.7764 LearningRate 0.0213 Epoch: 10 Global Step: 179810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:09,661-Speed 9669.61 samples/sec Loss 5.6091 LearningRate 0.0213 Epoch: 10 Global Step: 179820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:10,740-Speed 9497.44 samples/sec Loss 5.5935 LearningRate 0.0213 Epoch: 10 Global Step: 179830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:11,791-Speed 9746.51 samples/sec Loss 5.6686 LearningRate 0.0213 Epoch: 10 Global Step: 179840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:12,927-Speed 9015.19 samples/sec Loss 5.6782 LearningRate 0.0213 Epoch: 10 Global Step: 179850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:14,027-Speed 9311.94 samples/sec Loss 5.5745 LearningRate 0.0213 Epoch: 10 Global Step: 179860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:15,104-Speed 9529.10 samples/sec Loss 5.6497 LearningRate 0.0213 Epoch: 10 Global Step: 179870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:16,152-Speed 9782.02 samples/sec Loss 5.7403 LearningRate 0.0213 Epoch: 10 Global Step: 179880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:47:17,234-Speed 9469.37 samples/sec Loss 5.5906 LearningRate 0.0213 Epoch: 10 Global Step: 179890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:18,347-Speed 9204.84 samples/sec Loss 5.4919 LearningRate 0.0213 Epoch: 10 Global Step: 179900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:19,493-Speed 8942.67 samples/sec Loss 5.6631 LearningRate 0.0213 Epoch: 10 Global Step: 179910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:20,570-Speed 9514.11 samples/sec Loss 5.6664 LearningRate 0.0213 Epoch: 10 Global Step: 179920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:21,649-Speed 9493.65 samples/sec Loss 5.5232 LearningRate 0.0213 Epoch: 10 Global Step: 179930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:22,713-Speed 9632.40 samples/sec Loss 5.6240 LearningRate 0.0212 Epoch: 10 Global Step: 179940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:23,790-Speed 9506.41 samples/sec Loss 5.7183 LearningRate 0.0212 Epoch: 10 Global Step: 179950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:24,900-Speed 9233.45 samples/sec Loss 5.7076 LearningRate 0.0212 Epoch: 10 Global Step: 179960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:26,000-Speed 9319.26 samples/sec Loss 5.5832 LearningRate 0.0212 Epoch: 10 Global Step: 179970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:47:27,074-Speed 9542.91 samples/sec Loss 5.6386 LearningRate 0.0212 Epoch: 10 Global Step: 179980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:28,177-Speed 9293.62 samples/sec Loss 5.6799 LearningRate 0.0212 Epoch: 10 Global Step: 179990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:47:29,264-Speed 9418.83 samples/sec Loss 5.6542 LearningRate 0.0212 Epoch: 10 Global Step: 180000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:47:51,352-[lfw][180000]XNorm: 9.247617 Training: 2022-04-11 18:47:51,353-[lfw][180000]Accuracy-Flip: 0.99650+-0.00361 Training: 2022-04-11 18:47:51,354-[lfw][180000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:48:16,872-[cfp_fp][180000]XNorm: 7.956881 Training: 2022-04-11 18:48:16,873-[cfp_fp][180000]Accuracy-Flip: 0.96357+-0.00924 Training: 2022-04-11 18:48:16,873-[cfp_fp][180000]Accuracy-Highest: 0.96586 Training: 2022-04-11 18:48:38,932-[agedb_30][180000]XNorm: 8.940597 Training: 2022-04-11 18:48:38,932-[agedb_30][180000]Accuracy-Flip: 0.96733+-0.00937 Training: 2022-04-11 18:48:38,933-[agedb_30][180000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:48:40,014-Speed 144.74 samples/sec Loss 5.6204 LearningRate 0.0212 Epoch: 10 Global Step: 180010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:41,104-Speed 9400.70 samples/sec Loss 5.6844 LearningRate 0.0212 Epoch: 10 Global Step: 180020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:42,201-Speed 9338.76 samples/sec Loss 5.5781 LearningRate 0.0212 Epoch: 10 Global Step: 180030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:43,305-Speed 9282.81 samples/sec Loss 5.6363 LearningRate 0.0212 Epoch: 10 Global Step: 180040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:44,390-Speed 9437.60 samples/sec Loss 5.7741 LearningRate 0.0212 Epoch: 10 Global Step: 180050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:45,445-Speed 9719.64 samples/sec Loss 5.6623 LearningRate 0.0212 Epoch: 10 Global Step: 180060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:46,533-Speed 9412.10 samples/sec Loss 5.6776 LearningRate 0.0212 Epoch: 10 Global Step: 180070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:47,603-Speed 9577.57 samples/sec Loss 5.7317 LearningRate 0.0212 Epoch: 10 Global Step: 180080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:48:48,677-Speed 
9538.71 samples/sec Loss 5.5854 LearningRate 0.0212 Epoch: 10 Global Step: 180090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:49,793-Speed 9180.56 samples/sec Loss 5.6296 LearningRate 0.0212 Epoch: 10 Global Step: 180100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:50,890-Speed 9340.97 samples/sec Loss 5.5866 LearningRate 0.0212 Epoch: 10 Global Step: 180110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:51,949-Speed 9680.03 samples/sec Loss 5.6243 LearningRate 0.0212 Epoch: 10 Global Step: 180120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:53,025-Speed 9518.89 samples/sec Loss 5.5315 LearningRate 0.0212 Epoch: 10 Global Step: 180130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:54,191-Speed 8788.27 samples/sec Loss 5.6843 LearningRate 0.0212 Epoch: 10 Global Step: 180140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:55,310-Speed 9158.48 samples/sec Loss 5.6038 LearningRate 0.0212 Epoch: 10 Global Step: 180150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:56,412-Speed 9299.31 samples/sec Loss 5.5562 LearningRate 0.0212 Epoch: 10 Global Step: 180160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:57,508-Speed 9343.14 samples/sec Loss 5.6679 LearningRate 0.0212 Epoch: 10 Global Step: 180170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:58,602-Speed 9369.35 samples/sec Loss 5.6423 LearningRate 0.0212 Epoch: 10 Global Step: 180180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:48:59,680-Speed 9507.33 samples/sec Loss 5.6314 LearningRate 0.0212 Epoch: 10 Global Step: 180190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:00,773-Speed 9372.51 samples/sec Loss 5.6666 LearningRate 0.0212 Epoch: 10 Global Step: 180200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:01,845-Speed 9562.51 samples/sec Loss 5.5473 
LearningRate 0.0212 Epoch: 10 Global Step: 180210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:02,909-Speed 9624.16 samples/sec Loss 5.5719 LearningRate 0.0212 Epoch: 10 Global Step: 180220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:04,004-Speed 9355.85 samples/sec Loss 5.6256 LearningRate 0.0212 Epoch: 10 Global Step: 180230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:05,089-Speed 9440.20 samples/sec Loss 5.7098 LearningRate 0.0212 Epoch: 10 Global Step: 180240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:06,150-Speed 9659.50 samples/sec Loss 5.6585 LearningRate 0.0212 Epoch: 10 Global Step: 180250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:07,231-Speed 9477.58 samples/sec Loss 5.5324 LearningRate 0.0212 Epoch: 10 Global Step: 180260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:08,309-Speed 9506.54 samples/sec Loss 5.5343 LearningRate 0.0212 Epoch: 10 Global Step: 180270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:09,404-Speed 9353.30 samples/sec Loss 5.7549 LearningRate 0.0212 Epoch: 10 Global Step: 180280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:10,453-Speed 9772.81 samples/sec Loss 5.6131 LearningRate 0.0212 Epoch: 10 Global Step: 180290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:11,524-Speed 9573.24 samples/sec Loss 5.5757 LearningRate 0.0211 Epoch: 10 Global Step: 180300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:12,638-Speed 9197.11 samples/sec Loss 5.6421 LearningRate 0.0211 Epoch: 10 Global Step: 180310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:13,758-Speed 9143.12 samples/sec Loss 5.6852 LearningRate 0.0211 Epoch: 10 Global Step: 180320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:14,835-Speed 9512.69 samples/sec Loss 5.5999 LearningRate 0.0211 Epoch: 10 
Global Step: 180330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:15,950-Speed 9192.03 samples/sec Loss 5.5765 LearningRate 0.0211 Epoch: 10 Global Step: 180340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:17,069-Speed 9159.19 samples/sec Loss 5.5357 LearningRate 0.0211 Epoch: 10 Global Step: 180350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:18,125-Speed 9701.78 samples/sec Loss 5.6061 LearningRate 0.0211 Epoch: 10 Global Step: 180360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:19,205-Speed 9487.91 samples/sec Loss 5.6122 LearningRate 0.0211 Epoch: 10 Global Step: 180370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:20,320-Speed 9190.33 samples/sec Loss 5.6663 LearningRate 0.0211 Epoch: 10 Global Step: 180380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:21,382-Speed 9647.97 samples/sec Loss 5.5569 LearningRate 0.0211 Epoch: 10 Global Step: 180390 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:49:22,481-Speed 9324.54 samples/sec Loss 5.6064 LearningRate 0.0211 Epoch: 10 Global Step: 180400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:23,584-Speed 9285.89 samples/sec Loss 5.6214 LearningRate 0.0211 Epoch: 10 Global Step: 180410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:24,682-Speed 9335.17 samples/sec Loss 5.7100 LearningRate 0.0211 Epoch: 10 Global Step: 180420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:25,752-Speed 9571.34 samples/sec Loss 5.5579 LearningRate 0.0211 Epoch: 10 Global Step: 180430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:26,852-Speed 9316.73 samples/sec Loss 5.5754 LearningRate 0.0211 Epoch: 10 Global Step: 180440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:27,924-Speed 9565.43 samples/sec Loss 5.5594 LearningRate 0.0211 Epoch: 10 Global Step: 180450 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:29,007-Speed 9452.44 samples/sec Loss 5.6418 LearningRate 0.0211 Epoch: 10 Global Step: 180460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:30,150-Speed 8970.46 samples/sec Loss 5.6015 LearningRate 0.0211 Epoch: 10 Global Step: 180470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:31,232-Speed 9471.43 samples/sec Loss 5.6993 LearningRate 0.0211 Epoch: 10 Global Step: 180480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:32,330-Speed 9327.59 samples/sec Loss 5.6124 LearningRate 0.0211 Epoch: 10 Global Step: 180490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:33,430-Speed 9317.83 samples/sec Loss 5.6498 LearningRate 0.0211 Epoch: 10 Global Step: 180500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:49:34,543-Speed 9200.88 samples/sec Loss 5.6828 LearningRate 0.0211 Epoch: 10 Global Step: 180510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:35,631-Speed 9421.94 samples/sec Loss 5.7167 LearningRate 0.0211 Epoch: 10 Global Step: 180520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:36,750-Speed 9158.14 samples/sec Loss 5.5211 LearningRate 0.0211 Epoch: 10 Global Step: 180530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:37,814-Speed 9626.28 samples/sec Loss 5.6241 LearningRate 0.0211 Epoch: 10 Global Step: 180540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:38,940-Speed 9106.23 samples/sec Loss 5.5788 LearningRate 0.0211 Epoch: 10 Global Step: 180550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:40,034-Speed 9361.12 samples/sec Loss 5.6768 LearningRate 0.0211 Epoch: 10 Global Step: 180560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:41,131-Speed 9346.73 samples/sec Loss 5.7130 LearningRate 0.0211 Epoch: 10 Global Step: 180570 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:49:42,214-Speed 9459.87 samples/sec Loss 5.7697 LearningRate 0.0211 Epoch: 10 Global Step: 180580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:43,320-Speed 9262.57 samples/sec Loss 5.7095 LearningRate 0.0211 Epoch: 10 Global Step: 180590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:44,442-Speed 9129.84 samples/sec Loss 5.5048 LearningRate 0.0211 Epoch: 10 Global Step: 180600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:45,530-Speed 9416.33 samples/sec Loss 5.5987 LearningRate 0.0211 Epoch: 10 Global Step: 180610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:46,637-Speed 9260.12 samples/sec Loss 5.6356 LearningRate 0.0211 Epoch: 10 Global Step: 180620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:47,715-Speed 9505.22 samples/sec Loss 5.6238 LearningRate 0.0211 Epoch: 10 Global Step: 180630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:48,800-Speed 9441.52 samples/sec Loss 5.6660 LearningRate 0.0211 Epoch: 10 Global Step: 180640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:49,929-Speed 9075.81 samples/sec Loss 5.6314 LearningRate 0.0211 Epoch: 10 Global Step: 180650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:50,997-Speed 9589.63 samples/sec Loss 5.6908 LearningRate 0.0211 Epoch: 10 Global Step: 180660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:52,059-Speed 9653.56 samples/sec Loss 5.6982 LearningRate 0.0210 Epoch: 10 Global Step: 180670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:53,182-Speed 9122.84 samples/sec Loss 5.5809 LearningRate 0.0210 Epoch: 10 Global Step: 180680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:54,317-Speed 9029.56 samples/sec Loss 5.6048 LearningRate 0.0210 Epoch: 10 Global Step: 180690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:49:55,412-Speed 9352.71 samples/sec Loss 5.5708 LearningRate 0.0210 Epoch: 10 Global Step: 180700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:56,495-Speed 9460.95 samples/sec Loss 5.6906 LearningRate 0.0210 Epoch: 10 Global Step: 180710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:57,585-Speed 9407.69 samples/sec Loss 5.6313 LearningRate 0.0210 Epoch: 10 Global Step: 180720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:58,795-Speed 8470.09 samples/sec Loss 5.7131 LearningRate 0.0210 Epoch: 10 Global Step: 180730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:49:59,904-Speed 9230.48 samples/sec Loss 5.6350 LearningRate 0.0210 Epoch: 10 Global Step: 180740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:00,990-Speed 9437.47 samples/sec Loss 5.6016 LearningRate 0.0210 Epoch: 10 Global Step: 180750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:02,099-Speed 9240.06 samples/sec Loss 5.5453 LearningRate 0.0210 Epoch: 10 Global Step: 180760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:03,208-Speed 9241.65 samples/sec Loss 5.6001 LearningRate 0.0210 Epoch: 10 Global Step: 180770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:04,309-Speed 9305.23 samples/sec Loss 5.6071 LearningRate 0.0210 Epoch: 10 Global Step: 180780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:05,397-Speed 9409.55 samples/sec Loss 5.6707 LearningRate 0.0210 Epoch: 10 Global Step: 180790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:06,517-Speed 9152.30 samples/sec Loss 5.5990 LearningRate 0.0210 Epoch: 10 Global Step: 180800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:07,609-Speed 9378.98 samples/sec Loss 5.6595 LearningRate 0.0210 Epoch: 10 Global Step: 180810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:08,699-Speed 9405.36 
samples/sec Loss 5.5782 LearningRate 0.0210 Epoch: 10 Global Step: 180820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:09,771-Speed 9551.07 samples/sec Loss 5.5398 LearningRate 0.0210 Epoch: 10 Global Step: 180830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:10,844-Speed 9556.42 samples/sec Loss 5.6444 LearningRate 0.0210 Epoch: 10 Global Step: 180840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:11,914-Speed 9569.13 samples/sec Loss 5.5036 LearningRate 0.0210 Epoch: 10 Global Step: 180850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:13,036-Speed 9135.51 samples/sec Loss 5.5650 LearningRate 0.0210 Epoch: 10 Global Step: 180860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:14,128-Speed 9380.57 samples/sec Loss 5.5918 LearningRate 0.0210 Epoch: 10 Global Step: 180870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:15,174-Speed 9803.85 samples/sec Loss 5.6872 LearningRate 0.0210 Epoch: 10 Global Step: 180880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:16,220-Speed 9795.18 samples/sec Loss 5.6435 LearningRate 0.0210 Epoch: 10 Global Step: 180890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:17,305-Speed 9445.55 samples/sec Loss 5.5992 LearningRate 0.0210 Epoch: 10 Global Step: 180900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:18,350-Speed 9797.36 samples/sec Loss 5.5788 LearningRate 0.0210 Epoch: 10 Global Step: 180910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:19,441-Speed 9394.12 samples/sec Loss 5.7157 LearningRate 0.0210 Epoch: 10 Global Step: 180920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:20,558-Speed 9176.61 samples/sec Loss 5.6408 LearningRate 0.0210 Epoch: 10 Global Step: 180930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:21,654-Speed 9349.34 samples/sec Loss 5.6264 LearningRate 
0.0210 Epoch: 10 Global Step: 180940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:22,704-Speed 9754.57 samples/sec Loss 5.6743 LearningRate 0.0210 Epoch: 10 Global Step: 180950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:23,783-Speed 9494.23 samples/sec Loss 5.7442 LearningRate 0.0210 Epoch: 10 Global Step: 180960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:24,838-Speed 9719.79 samples/sec Loss 5.5779 LearningRate 0.0210 Epoch: 10 Global Step: 180970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:25,957-Speed 9150.21 samples/sec Loss 5.7216 LearningRate 0.0210 Epoch: 10 Global Step: 180980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:27,043-Speed 9435.83 samples/sec Loss 5.5420 LearningRate 0.0210 Epoch: 10 Global Step: 180990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:28,136-Speed 9381.07 samples/sec Loss 5.6821 LearningRate 0.0210 Epoch: 10 Global Step: 181000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:29,222-Speed 9429.99 samples/sec Loss 5.5576 LearningRate 0.0210 Epoch: 10 Global Step: 181010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:30,311-Speed 9413.28 samples/sec Loss 5.6353 LearningRate 0.0210 Epoch: 10 Global Step: 181020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:31,370-Speed 9670.63 samples/sec Loss 5.6050 LearningRate 0.0209 Epoch: 10 Global Step: 181030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:32,530-Speed 8833.41 samples/sec Loss 5.7147 LearningRate 0.0209 Epoch: 10 Global Step: 181040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:33,606-Speed 9522.39 samples/sec Loss 5.6316 LearningRate 0.0209 Epoch: 10 Global Step: 181050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:34,712-Speed 9266.43 samples/sec Loss 5.5758 LearningRate 0.0209 Epoch: 10 Global Step: 181060 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:35,829-Speed 9176.67 samples/sec Loss 5.7086 LearningRate 0.0209 Epoch: 10 Global Step: 181070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:36,908-Speed 9495.01 samples/sec Loss 5.6009 LearningRate 0.0209 Epoch: 10 Global Step: 181080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:37,991-Speed 9461.30 samples/sec Loss 5.5266 LearningRate 0.0209 Epoch: 10 Global Step: 181090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:39,080-Speed 9409.05 samples/sec Loss 5.6452 LearningRate 0.0209 Epoch: 10 Global Step: 181100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:40,145-Speed 9621.72 samples/sec Loss 5.6058 LearningRate 0.0209 Epoch: 10 Global Step: 181110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:41,217-Speed 9552.57 samples/sec Loss 5.6062 LearningRate 0.0209 Epoch: 10 Global Step: 181120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:42,266-Speed 9768.86 samples/sec Loss 5.5085 LearningRate 0.0209 Epoch: 10 Global Step: 181130 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:50:43,395-Speed 9078.60 samples/sec Loss 5.5726 LearningRate 0.0209 Epoch: 10 Global Step: 181140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:44,455-Speed 9661.08 samples/sec Loss 5.6247 LearningRate 0.0209 Epoch: 10 Global Step: 181150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:45,537-Speed 9473.99 samples/sec Loss 5.5418 LearningRate 0.0209 Epoch: 10 Global Step: 181160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:46,606-Speed 9579.48 samples/sec Loss 5.6141 LearningRate 0.0209 Epoch: 10 Global Step: 181170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:47,705-Speed 9319.19 samples/sec Loss 5.4677 LearningRate 0.0209 Epoch: 10 Global Step: 181180 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 18:50:48,826-Speed 9140.51 samples/sec Loss 5.5491 LearningRate 0.0209 Epoch: 10 Global Step: 181190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:49,933-Speed 9258.73 samples/sec Loss 5.5575 LearningRate 0.0209 Epoch: 10 Global Step: 181200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:51,030-Speed 9344.15 samples/sec Loss 5.6622 LearningRate 0.0209 Epoch: 10 Global Step: 181210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:52,091-Speed 9661.01 samples/sec Loss 5.6933 LearningRate 0.0209 Epoch: 10 Global Step: 181220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:53,171-Speed 9490.10 samples/sec Loss 5.6974 LearningRate 0.0209 Epoch: 10 Global Step: 181230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:50:54,270-Speed 9323.54 samples/sec Loss 5.5561 LearningRate 0.0209 Epoch: 10 Global Step: 181240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:55,380-Speed 9226.09 samples/sec Loss 5.7878 LearningRate 0.0209 Epoch: 10 Global Step: 181250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:56,502-Speed 9134.04 samples/sec Loss 5.5603 LearningRate 0.0209 Epoch: 10 Global Step: 181260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:57,620-Speed 9165.98 samples/sec Loss 5.6754 LearningRate 0.0209 Epoch: 10 Global Step: 181270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:58,717-Speed 9356.03 samples/sec Loss 5.6441 LearningRate 0.0209 Epoch: 10 Global Step: 181280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:50:59,812-Speed 9359.42 samples/sec Loss 5.5476 LearningRate 0.0209 Epoch: 10 Global Step: 181290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:51:00,914-Speed 9291.27 samples/sec Loss 5.6210 LearningRate 0.0209 Epoch: 10 Global Step: 181300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
18:51:01,997-Speed 9467.08 samples/sec Loss 5.6936 LearningRate 0.0209 Epoch: 10 Global Step: 181310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:51:03,136-Speed 8994.49 samples/sec Loss 5.6400 LearningRate 0.0209 Epoch: 10 Global Step: 181320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:51:04,250-Speed 9199.25 samples/sec Loss 5.5381 LearningRate 0.0209 Epoch: 10 Global Step: 181330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:51:05,345-Speed 9361.92 samples/sec Loss 5.6691 LearningRate 0.0209 Epoch: 10 Global Step: 181340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:06,439-Speed 9359.34 samples/sec Loss 5.5773 LearningRate 0.0209 Epoch: 10 Global Step: 181350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:07,537-Speed 9333.68 samples/sec Loss 5.7516 LearningRate 0.0209 Epoch: 10 Global Step: 181360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:08,657-Speed 9148.35 samples/sec Loss 5.6163 LearningRate 0.0209 Epoch: 10 Global Step: 181370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:09,765-Speed 9243.85 samples/sec Loss 5.6372 LearningRate 0.0209 Epoch: 10 Global Step: 181380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:10,862-Speed 9347.72 samples/sec Loss 5.5625 LearningRate 0.0209 Epoch: 10 Global Step: 181390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:11,968-Speed 9260.14 samples/sec Loss 5.5311 LearningRate 0.0208 Epoch: 10 Global Step: 181400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:13,026-Speed 9681.01 samples/sec Loss 5.6505 LearningRate 0.0208 Epoch: 10 Global Step: 181410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:14,149-Speed 9130.88 samples/sec Loss 5.5357 LearningRate 0.0208 Epoch: 10 Global Step: 181420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:15,231-Speed 9468.18 
samples/sec Loss 5.5377 LearningRate 0.0208 Epoch: 10 Global Step: 181430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:16,345-Speed 9196.96 samples/sec Loss 5.6003 LearningRate 0.0208 Epoch: 10 Global Step: 181440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:51:17,474-Speed 9069.89 samples/sec Loss 5.5989 LearningRate 0.0208 Epoch: 10 Global Step: 181450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:51:18,572-Speed 9336.65 samples/sec Loss 5.6531 LearningRate 0.0208 Epoch: 10 Global Step: 181460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:19,667-Speed 9353.52 samples/sec Loss 5.5019 LearningRate 0.0208 Epoch: 10 Global Step: 181470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:20,749-Speed 9471.89 samples/sec Loss 5.6164 LearningRate 0.0208 Epoch: 10 Global Step: 181480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:21,870-Speed 9144.14 samples/sec Loss 5.5038 LearningRate 0.0208 Epoch: 10 Global Step: 181490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:22,955-Speed 9446.39 samples/sec Loss 5.7318 LearningRate 0.0208 Epoch: 10 Global Step: 181500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:24,141-Speed 8636.44 samples/sec Loss 5.5975 LearningRate 0.0208 Epoch: 10 Global Step: 181510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:25,211-Speed 9576.14 samples/sec Loss 5.4638 LearningRate 0.0208 Epoch: 10 Global Step: 181520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:26,342-Speed 9060.35 samples/sec Loss 5.6532 LearningRate 0.0208 Epoch: 10 Global Step: 181530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:27,464-Speed 9125.77 samples/sec Loss 5.6067 LearningRate 0.0208 Epoch: 10 Global Step: 181540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:28,521-Speed 9704.16 samples/sec Loss 5.5547 
LearningRate 0.0208 Epoch: 10 Global Step: 181550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:29,602-Speed 9477.21 samples/sec Loss 5.5008 LearningRate 0.0208 Epoch: 10 Global Step: 181560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:30,683-Speed 9470.15 samples/sec Loss 5.5585 LearningRate 0.0208 Epoch: 10 Global Step: 181570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:31,774-Speed 9395.95 samples/sec Loss 5.5616 LearningRate 0.0208 Epoch: 10 Global Step: 181580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:32,899-Speed 9110.49 samples/sec Loss 5.6655 LearningRate 0.0208 Epoch: 10 Global Step: 181590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:34,036-Speed 9005.40 samples/sec Loss 5.7240 LearningRate 0.0208 Epoch: 10 Global Step: 181600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:35,098-Speed 9647.51 samples/sec Loss 5.5580 LearningRate 0.0208 Epoch: 10 Global Step: 181610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:36,192-Speed 9372.37 samples/sec Loss 5.5240 LearningRate 0.0208 Epoch: 10 Global Step: 181620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:37,279-Speed 9421.66 samples/sec Loss 5.5062 LearningRate 0.0208 Epoch: 10 Global Step: 181630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:38,361-Speed 9469.47 samples/sec Loss 5.5274 LearningRate 0.0208 Epoch: 10 Global Step: 181640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:39,471-Speed 9235.13 samples/sec Loss 5.5004 LearningRate 0.0208 Epoch: 10 Global Step: 181650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:40,587-Speed 9183.89 samples/sec Loss 5.5862 LearningRate 0.0208 Epoch: 10 Global Step: 181660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:51:41,647-Speed 9664.17 samples/sec Loss 5.6505 LearningRate 0.0208 Epoch: 10 
Global Step: 181670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:42,762-Speed 9190.53 samples/sec Loss 5.6566 LearningRate 0.0208 Epoch: 10 Global Step: 181680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:43,898-Speed 9016.39 samples/sec Loss 5.5602 LearningRate 0.0208 Epoch: 10 Global Step: 181690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:44,982-Speed 9454.06 samples/sec Loss 5.6773 LearningRate 0.0208 Epoch: 10 Global Step: 181700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:46,067-Speed 9437.92 samples/sec Loss 5.5398 LearningRate 0.0208 Epoch: 10 Global Step: 181710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:47,106-Speed 9868.97 samples/sec Loss 5.4874 LearningRate 0.0208 Epoch: 10 Global Step: 181720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:48,186-Speed 9485.68 samples/sec Loss 5.6235 LearningRate 0.0208 Epoch: 10 Global Step: 181730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:49,235-Speed 9763.71 samples/sec Loss 5.6116 LearningRate 0.0208 Epoch: 10 Global Step: 181740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:50,326-Speed 9392.31 samples/sec Loss 5.6387 LearningRate 0.0208 Epoch: 10 Global Step: 181750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:51,414-Speed 9418.15 samples/sec Loss 5.6454 LearningRate 0.0207 Epoch: 10 Global Step: 181760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:52,518-Speed 9282.51 samples/sec Loss 5.7010 LearningRate 0.0207 Epoch: 10 Global Step: 181770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:51:53,637-Speed 9155.42 samples/sec Loss 5.6186 LearningRate 0.0207 Epoch: 10 Global Step: 181780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:51:54,713-Speed 9522.46 samples/sec Loss 5.5325 LearningRate 0.0207 Epoch: 10 Global Step: 181790 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:55,781-Speed 9596.16 samples/sec Loss 5.5884 LearningRate 0.0207 Epoch: 10 Global Step: 181800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:56,892-Speed 9220.32 samples/sec Loss 5.6106 LearningRate 0.0207 Epoch: 10 Global Step: 181810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:58,007-Speed 9199.06 samples/sec Loss 5.5213 LearningRate 0.0207 Epoch: 10 Global Step: 181820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:51:59,158-Speed 8901.22 samples/sec Loss 5.5667 LearningRate 0.0207 Epoch: 10 Global Step: 181830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:00,212-Speed 9723.24 samples/sec Loss 5.6580 LearningRate 0.0207 Epoch: 10 Global Step: 181840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:01,296-Speed 9450.79 samples/sec Loss 5.5230 LearningRate 0.0207 Epoch: 10 Global Step: 181850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:02,396-Speed 9316.12 samples/sec Loss 5.5810 LearningRate 0.0207 Epoch: 10 Global Step: 181860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:03,479-Speed 9461.34 samples/sec Loss 5.6915 LearningRate 0.0207 Epoch: 10 Global Step: 181870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:04,582-Speed 9282.88 samples/sec Loss 5.6160 LearningRate 0.0207 Epoch: 10 Global Step: 181880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:05,649-Speed 9609.94 samples/sec Loss 5.5538 LearningRate 0.0207 Epoch: 10 Global Step: 181890 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:52:06,692-Speed 9819.89 samples/sec Loss 5.5279 LearningRate 0.0207 Epoch: 10 Global Step: 181900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:07,785-Speed 9374.92 samples/sec Loss 5.5133 LearningRate 0.0207 Epoch: 10 Global Step: 181910 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:52:08,891-Speed 9257.69 samples/sec Loss 5.5091 LearningRate 0.0207 Epoch: 10 Global Step: 181920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:10,066-Speed 8727.68 samples/sec Loss 5.5121 LearningRate 0.0207 Epoch: 10 Global Step: 181930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:11,169-Speed 9282.96 samples/sec Loss 5.5437 LearningRate 0.0207 Epoch: 10 Global Step: 181940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:12,216-Speed 9788.59 samples/sec Loss 5.6042 LearningRate 0.0207 Epoch: 10 Global Step: 181950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:13,318-Speed 9301.67 samples/sec Loss 5.6618 LearningRate 0.0207 Epoch: 10 Global Step: 181960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:14,451-Speed 9043.06 samples/sec Loss 5.6189 LearningRate 0.0207 Epoch: 10 Global Step: 181970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:15,499-Speed 9775.16 samples/sec Loss 5.5977 LearningRate 0.0207 Epoch: 10 Global Step: 181980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:16,582-Speed 9457.61 samples/sec Loss 5.5196 LearningRate 0.0207 Epoch: 10 Global Step: 181990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:17,634-Speed 9742.04 samples/sec Loss 5.5282 LearningRate 0.0207 Epoch: 10 Global Step: 182000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:52:39,709-[lfw][182000]XNorm: 9.092343 Training: 2022-04-11 18:52:39,710-[lfw][182000]Accuracy-Flip: 0.99667+-0.00269 Training: 2022-04-11 18:52:39,710-[lfw][182000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:53:05,080-[cfp_fp][182000]XNorm: 7.787156 Training: 2022-04-11 18:53:05,080-[cfp_fp][182000]Accuracy-Flip: 0.96643+-0.00806 Training: 2022-04-11 18:53:05,081-[cfp_fp][182000]Accuracy-Highest: 0.96643 Training: 2022-04-11 18:53:26,990-[agedb_30][182000]XNorm: 8.836864 Training: 
2022-04-11 18:53:26,991-[agedb_30][182000]Accuracy-Flip: 0.96800+-0.00994 Training: 2022-04-11 18:53:26,992-[agedb_30][182000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:53:28,065-Speed 145.39 samples/sec Loss 5.6238 LearningRate 0.0207 Epoch: 10 Global Step: 182010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:29,127-Speed 9654.11 samples/sec Loss 5.6082 LearningRate 0.0207 Epoch: 10 Global Step: 182020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:30,184-Speed 9687.22 samples/sec Loss 5.5817 LearningRate 0.0207 Epoch: 10 Global Step: 182030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:31,305-Speed 9145.97 samples/sec Loss 5.5862 LearningRate 0.0207 Epoch: 10 Global Step: 182040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:32,380-Speed 9528.06 samples/sec Loss 5.6473 LearningRate 0.0207 Epoch: 10 Global Step: 182050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:33,458-Speed 9500.03 samples/sec Loss 5.6451 LearningRate 0.0207 Epoch: 10 Global Step: 182060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:34,533-Speed 9541.66 samples/sec Loss 5.5579 LearningRate 0.0207 Epoch: 10 Global Step: 182070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:35,607-Speed 9531.32 samples/sec Loss 5.6759 LearningRate 0.0207 Epoch: 10 Global Step: 182080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:36,687-Speed 9492.06 samples/sec Loss 5.6180 LearningRate 0.0207 Epoch: 10 Global Step: 182090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:37,782-Speed 9357.20 samples/sec Loss 5.6159 LearningRate 0.0207 Epoch: 10 Global Step: 182100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:38,880-Speed 9331.92 samples/sec Loss 5.6732 LearningRate 0.0207 Epoch: 10 Global Step: 182110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:40,042-Speed 8812.22 
samples/sec Loss 5.7330 LearningRate 0.0207 Epoch: 10 Global Step: 182120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:41,110-Speed 9601.20 samples/sec Loss 5.5622 LearningRate 0.0206 Epoch: 10 Global Step: 182130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:53:42,203-Speed 9373.15 samples/sec Loss 5.6258 LearningRate 0.0206 Epoch: 10 Global Step: 182140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:43,298-Speed 9355.38 samples/sec Loss 5.6507 LearningRate 0.0206 Epoch: 10 Global Step: 182150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:44,372-Speed 9538.11 samples/sec Loss 5.6195 LearningRate 0.0206 Epoch: 10 Global Step: 182160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:45,444-Speed 9555.46 samples/sec Loss 5.5303 LearningRate 0.0206 Epoch: 10 Global Step: 182170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:46,567-Speed 9129.51 samples/sec Loss 5.5612 LearningRate 0.0206 Epoch: 10 Global Step: 182180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:47,657-Speed 9400.32 samples/sec Loss 5.6525 LearningRate 0.0206 Epoch: 10 Global Step: 182190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:48,732-Speed 9527.91 samples/sec Loss 5.6620 LearningRate 0.0206 Epoch: 10 Global Step: 182200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:49,852-Speed 9151.55 samples/sec Loss 5.5979 LearningRate 0.0206 Epoch: 10 Global Step: 182210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:50,937-Speed 9443.12 samples/sec Loss 5.5870 LearningRate 0.0206 Epoch: 10 Global Step: 182220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:52,016-Speed 9497.78 samples/sec Loss 5.6357 LearningRate 0.0206 Epoch: 10 Global Step: 182230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:53,056-Speed 9852.54 samples/sec Loss 5.5885 
LearningRate 0.0206 Epoch: 10 Global Step: 182240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:54,157-Speed 9309.39 samples/sec Loss 5.6244 LearningRate 0.0206 Epoch: 10 Global Step: 182250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:55,252-Speed 9358.31 samples/sec Loss 5.5810 LearningRate 0.0206 Epoch: 10 Global Step: 182260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:56,333-Speed 9475.40 samples/sec Loss 5.7566 LearningRate 0.0206 Epoch: 10 Global Step: 182270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:57,396-Speed 9644.03 samples/sec Loss 5.6259 LearningRate 0.0206 Epoch: 10 Global Step: 182280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:58,450-Speed 9721.33 samples/sec Loss 5.5914 LearningRate 0.0206 Epoch: 10 Global Step: 182290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:53:59,537-Speed 9425.81 samples/sec Loss 5.6213 LearningRate 0.0206 Epoch: 10 Global Step: 182300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:00,674-Speed 9014.11 samples/sec Loss 5.5508 LearningRate 0.0206 Epoch: 10 Global Step: 182310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:01,754-Speed 9489.83 samples/sec Loss 5.6738 LearningRate 0.0206 Epoch: 10 Global Step: 182320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:02,832-Speed 9495.69 samples/sec Loss 5.6995 LearningRate 0.0206 Epoch: 10 Global Step: 182330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:03,891-Speed 9681.78 samples/sec Loss 5.6355 LearningRate 0.0206 Epoch: 10 Global Step: 182340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:54:04,972-Speed 9479.68 samples/sec Loss 5.6295 LearningRate 0.0206 Epoch: 10 Global Step: 182350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:06,059-Speed 9428.04 samples/sec Loss 5.6184 LearningRate 0.0206 Epoch: 10 
Global Step: 182360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:07,171-Speed 9214.44 samples/sec Loss 5.6524 LearningRate 0.0206 Epoch: 10 Global Step: 182370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:08,254-Speed 9460.59 samples/sec Loss 5.6582 LearningRate 0.0206 Epoch: 10 Global Step: 182380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:09,342-Speed 9410.68 samples/sec Loss 5.5448 LearningRate 0.0206 Epoch: 10 Global Step: 182390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:10,428-Speed 9437.89 samples/sec Loss 5.5687 LearningRate 0.0206 Epoch: 10 Global Step: 182400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:11,474-Speed 9793.42 samples/sec Loss 5.6035 LearningRate 0.0206 Epoch: 10 Global Step: 182410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:12,559-Speed 9445.18 samples/sec Loss 5.5611 LearningRate 0.0206 Epoch: 10 Global Step: 182420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:13,638-Speed 9501.08 samples/sec Loss 5.5608 LearningRate 0.0206 Epoch: 10 Global Step: 182430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:14,691-Speed 9729.09 samples/sec Loss 5.6674 LearningRate 0.0206 Epoch: 10 Global Step: 182440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:15,762-Speed 9564.08 samples/sec Loss 5.6098 LearningRate 0.0206 Epoch: 10 Global Step: 182450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:16,849-Speed 9425.79 samples/sec Loss 5.6631 LearningRate 0.0206 Epoch: 10 Global Step: 182460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:17,931-Speed 9476.75 samples/sec Loss 5.7545 LearningRate 0.0206 Epoch: 10 Global Step: 182470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:19,027-Speed 9344.83 samples/sec Loss 5.4929 LearningRate 0.0206 Epoch: 10 Global Step: 182480 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:20,102-Speed 9532.86 samples/sec Loss 5.5826 LearningRate 0.0206 Epoch: 10 Global Step: 182490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:21,194-Speed 9382.36 samples/sec Loss 5.5249 LearningRate 0.0205 Epoch: 10 Global Step: 182500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:22,295-Speed 9302.69 samples/sec Loss 5.4344 LearningRate 0.0205 Epoch: 10 Global Step: 182510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:23,406-Speed 9226.23 samples/sec Loss 5.5435 LearningRate 0.0205 Epoch: 10 Global Step: 182520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:24,487-Speed 9477.90 samples/sec Loss 5.5617 LearningRate 0.0205 Epoch: 10 Global Step: 182530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:54:25,565-Speed 9503.94 samples/sec Loss 5.5567 LearningRate 0.0205 Epoch: 10 Global Step: 182540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:26,661-Speed 9351.39 samples/sec Loss 5.5380 LearningRate 0.0205 Epoch: 10 Global Step: 182550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:27,750-Speed 9404.57 samples/sec Loss 5.5377 LearningRate 0.0205 Epoch: 10 Global Step: 182560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:28,813-Speed 9644.50 samples/sec Loss 5.6343 LearningRate 0.0205 Epoch: 10 Global Step: 182570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:29,918-Speed 9272.82 samples/sec Loss 5.5837 LearningRate 0.0205 Epoch: 10 Global Step: 182580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:30,982-Speed 9631.97 samples/sec Loss 5.6204 LearningRate 0.0205 Epoch: 10 Global Step: 182590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:32,041-Speed 9676.32 samples/sec Loss 5.5737 LearningRate 0.0205 Epoch: 10 Global Step: 182600 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:54:33,158-Speed 9166.57 samples/sec Loss 5.5436 LearningRate 0.0205 Epoch: 10 Global Step: 182610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:34,243-Speed 9447.67 samples/sec Loss 5.6154 LearningRate 0.0205 Epoch: 10 Global Step: 182620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:35,369-Speed 9097.99 samples/sec Loss 5.7129 LearningRate 0.0205 Epoch: 10 Global Step: 182630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:36,476-Speed 9252.50 samples/sec Loss 5.5720 LearningRate 0.0205 Epoch: 10 Global Step: 182640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:37,550-Speed 9545.06 samples/sec Loss 5.6051 LearningRate 0.0205 Epoch: 10 Global Step: 182650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:38,682-Speed 9044.39 samples/sec Loss 5.5697 LearningRate 0.0205 Epoch: 10 Global Step: 182660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:39,766-Speed 9458.13 samples/sec Loss 5.5202 LearningRate 0.0205 Epoch: 10 Global Step: 182670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:40,811-Speed 9800.11 samples/sec Loss 5.5546 LearningRate 0.0205 Epoch: 10 Global Step: 182680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:41,847-Speed 9895.70 samples/sec Loss 5.5447 LearningRate 0.0205 Epoch: 10 Global Step: 182690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:42,948-Speed 9309.04 samples/sec Loss 5.5890 LearningRate 0.0205 Epoch: 10 Global Step: 182700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:44,012-Speed 9630.42 samples/sec Loss 5.6779 LearningRate 0.0205 Epoch: 10 Global Step: 182710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:45,085-Speed 9549.86 samples/sec Loss 5.5508 LearningRate 0.0205 Epoch: 10 Global Step: 182720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
18:54:46,188-Speed 9287.69 samples/sec Loss 5.6389 LearningRate 0.0205 Epoch: 10 Global Step: 182730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:47,302-Speed 9190.58 samples/sec Loss 5.6313 LearningRate 0.0205 Epoch: 10 Global Step: 182740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:48,388-Speed 9436.24 samples/sec Loss 5.5885 LearningRate 0.0205 Epoch: 10 Global Step: 182750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:49,442-Speed 9724.90 samples/sec Loss 5.5795 LearningRate 0.0205 Epoch: 10 Global Step: 182760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:50,530-Speed 9414.00 samples/sec Loss 5.5564 LearningRate 0.0205 Epoch: 10 Global Step: 182770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:51,590-Speed 9675.26 samples/sec Loss 5.6013 LearningRate 0.0205 Epoch: 10 Global Step: 182780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:52,629-Speed 9855.74 samples/sec Loss 5.6042 LearningRate 0.0205 Epoch: 10 Global Step: 182790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:53,786-Speed 8856.23 samples/sec Loss 5.5566 LearningRate 0.0205 Epoch: 10 Global Step: 182800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:54,867-Speed 9482.50 samples/sec Loss 5.5655 LearningRate 0.0205 Epoch: 10 Global Step: 182810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:55,936-Speed 9578.95 samples/sec Loss 5.5553 LearningRate 0.0205 Epoch: 10 Global Step: 182820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:57,003-Speed 9602.68 samples/sec Loss 5.6045 LearningRate 0.0205 Epoch: 10 Global Step: 182830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:54:58,041-Speed 9880.28 samples/sec Loss 5.6326 LearningRate 0.0205 Epoch: 10 Global Step: 182840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:54:59,095-Speed 9716.16 
samples/sec Loss 5.5484 LearningRate 0.0205 Epoch: 10 Global Step: 182850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:00,159-Speed 9627.15 samples/sec Loss 5.5424 LearningRate 0.0205 Epoch: 10 Global Step: 182860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:01,243-Speed 9453.14 samples/sec Loss 5.5995 LearningRate 0.0204 Epoch: 10 Global Step: 182870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:02,303-Speed 9683.56 samples/sec Loss 5.5921 LearningRate 0.0204 Epoch: 10 Global Step: 182880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:03,347-Speed 9815.19 samples/sec Loss 5.6609 LearningRate 0.0204 Epoch: 10 Global Step: 182890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:04,436-Speed 9403.43 samples/sec Loss 5.6274 LearningRate 0.0204 Epoch: 10 Global Step: 182900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:05,526-Speed 9405.60 samples/sec Loss 5.6448 LearningRate 0.0204 Epoch: 10 Global Step: 182910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:06,639-Speed 9199.12 samples/sec Loss 5.5837 LearningRate 0.0204 Epoch: 10 Global Step: 182920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:07,724-Speed 9445.35 samples/sec Loss 5.5744 LearningRate 0.0204 Epoch: 10 Global Step: 182930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:08,820-Speed 9351.82 samples/sec Loss 5.6055 LearningRate 0.0204 Epoch: 10 Global Step: 182940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:09,899-Speed 9492.34 samples/sec Loss 5.5055 LearningRate 0.0204 Epoch: 10 Global Step: 182950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:10,953-Speed 9729.43 samples/sec Loss 5.6040 LearningRate 0.0204 Epoch: 10 Global Step: 182960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:12,047-Speed 9365.02 samples/sec Loss 5.4835 
LearningRate 0.0204 Epoch: 10 Global Step: 182970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:13,079-Speed 9926.40 samples/sec Loss 5.6108 LearningRate 0.0204 Epoch: 10 Global Step: 182980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:14,157-Speed 9504.17 samples/sec Loss 5.5905 LearningRate 0.0204 Epoch: 10 Global Step: 182990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:15,222-Speed 9622.28 samples/sec Loss 5.4704 LearningRate 0.0204 Epoch: 10 Global Step: 183000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:16,332-Speed 9226.09 samples/sec Loss 5.5883 LearningRate 0.0204 Epoch: 10 Global Step: 183010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:17,407-Speed 9526.15 samples/sec Loss 5.5622 LearningRate 0.0204 Epoch: 10 Global Step: 183020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:18,497-Speed 9407.50 samples/sec Loss 5.5588 LearningRate 0.0204 Epoch: 10 Global Step: 183030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:19,609-Speed 9219.15 samples/sec Loss 5.6118 LearningRate 0.0204 Epoch: 10 Global Step: 183040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:20,681-Speed 9555.69 samples/sec Loss 5.5545 LearningRate 0.0204 Epoch: 10 Global Step: 183050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:21,735-Speed 9723.28 samples/sec Loss 5.5672 LearningRate 0.0204 Epoch: 10 Global Step: 183060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:22,860-Speed 9107.23 samples/sec Loss 5.5944 LearningRate 0.0204 Epoch: 10 Global Step: 183070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:24,012-Speed 8899.89 samples/sec Loss 5.6068 LearningRate 0.0204 Epoch: 10 Global Step: 183080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:25,104-Speed 9377.92 samples/sec Loss 5.5958 LearningRate 0.0204 Epoch: 10 
Global Step: 183090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:26,165-Speed 9654.45 samples/sec Loss 5.5829 LearningRate 0.0204 Epoch: 10 Global Step: 183100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:27,295-Speed 9067.54 samples/sec Loss 5.5000 LearningRate 0.0204 Epoch: 10 Global Step: 183110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:28,393-Speed 9338.17 samples/sec Loss 5.5956 LearningRate 0.0204 Epoch: 10 Global Step: 183120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:29,477-Speed 9452.25 samples/sec Loss 5.6265 LearningRate 0.0204 Epoch: 10 Global Step: 183130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:30,547-Speed 9572.98 samples/sec Loss 5.5586 LearningRate 0.0204 Epoch: 10 Global Step: 183140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:31,651-Speed 9283.36 samples/sec Loss 5.6304 LearningRate 0.0204 Epoch: 10 Global Step: 183150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:55:32,745-Speed 9364.32 samples/sec Loss 5.5727 LearningRate 0.0204 Epoch: 10 Global Step: 183160 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:55:33,865-Speed 9149.84 samples/sec Loss 5.5119 LearningRate 0.0204 Epoch: 10 Global Step: 183170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:35,038-Speed 8735.18 samples/sec Loss 5.6349 LearningRate 0.0204 Epoch: 10 Global Step: 183180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:36,096-Speed 9685.14 samples/sec Loss 5.7069 LearningRate 0.0204 Epoch: 10 Global Step: 183190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:37,180-Speed 9448.62 samples/sec Loss 5.5557 LearningRate 0.0204 Epoch: 10 Global Step: 183200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:38,254-Speed 9538.62 samples/sec Loss 5.5420 LearningRate 0.0204 Epoch: 10 Global Step: 183210 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:39,357-Speed 9295.05 samples/sec Loss 5.5582 LearningRate 0.0204 Epoch: 10 Global Step: 183220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:40,409-Speed 9743.24 samples/sec Loss 5.5730 LearningRate 0.0204 Epoch: 10 Global Step: 183230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:41,479-Speed 9573.96 samples/sec Loss 5.7083 LearningRate 0.0203 Epoch: 10 Global Step: 183240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:42,592-Speed 9207.03 samples/sec Loss 5.6759 LearningRate 0.0203 Epoch: 10 Global Step: 183250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:43,648-Speed 9694.78 samples/sec Loss 5.6249 LearningRate 0.0203 Epoch: 10 Global Step: 183260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:44,743-Speed 9360.67 samples/sec Loss 5.5807 LearningRate 0.0203 Epoch: 10 Global Step: 183270 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:55:45,836-Speed 9378.71 samples/sec Loss 5.6503 LearningRate 0.0203 Epoch: 10 Global Step: 183280 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:55:46,939-Speed 9286.58 samples/sec Loss 5.6135 LearningRate 0.0203 Epoch: 10 Global Step: 183290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:48,053-Speed 9193.50 samples/sec Loss 5.5318 LearningRate 0.0203 Epoch: 10 Global Step: 183300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:49,137-Speed 9451.64 samples/sec Loss 5.5127 LearningRate 0.0203 Epoch: 10 Global Step: 183310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:50,240-Speed 9291.35 samples/sec Loss 5.6676 LearningRate 0.0203 Epoch: 10 Global Step: 183320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:51,314-Speed 9537.95 samples/sec Loss 5.6666 LearningRate 0.0203 Epoch: 10 Global Step: 183330 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:55:52,392-Speed 9508.32 samples/sec Loss 5.5407 LearningRate 0.0203 Epoch: 10 Global Step: 183340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:53,518-Speed 9100.11 samples/sec Loss 5.5676 LearningRate 0.0203 Epoch: 10 Global Step: 183350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:54,642-Speed 9114.97 samples/sec Loss 5.5671 LearningRate 0.0203 Epoch: 10 Global Step: 183360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:55:55,704-Speed 9652.39 samples/sec Loss 5.5550 LearningRate 0.0203 Epoch: 10 Global Step: 183370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:55:56,783-Speed 9493.17 samples/sec Loss 5.6711 LearningRate 0.0203 Epoch: 10 Global Step: 183380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:55:57,881-Speed 9335.78 samples/sec Loss 5.4884 LearningRate 0.0203 Epoch: 10 Global Step: 183390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:55:58,930-Speed 9771.17 samples/sec Loss 5.6652 LearningRate 0.0203 Epoch: 10 Global Step: 183400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:55:59,990-Speed 9673.11 samples/sec Loss 5.5816 LearningRate 0.0203 Epoch: 10 Global Step: 183410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:56:01,084-Speed 9360.50 samples/sec Loss 5.5389 LearningRate 0.0203 Epoch: 10 Global Step: 183420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:56:02,134-Speed 9760.64 samples/sec Loss 5.5983 LearningRate 0.0203 Epoch: 10 Global Step: 183430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:56:03,187-Speed 9728.76 samples/sec Loss 5.6523 LearningRate 0.0203 Epoch: 10 Global Step: 183440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:56:04,280-Speed 9370.67 samples/sec Loss 5.5508 LearningRate 0.0203 Epoch: 10 Global Step: 183450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:56:05,348-Speed 
9596.50 samples/sec Loss 5.6522 LearningRate 0.0203 Epoch: 10 Global Step: 183460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:56:06,395-Speed 9787.31 samples/sec Loss 5.5624 LearningRate 0.0203 Epoch: 10 Global Step: 183470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:07,507-Speed 9213.67 samples/sec Loss 5.5962 LearningRate 0.0203 Epoch: 10 Global Step: 183480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:08,621-Speed 9199.14 samples/sec Loss 5.6105 LearningRate 0.0203 Epoch: 10 Global Step: 183490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:09,709-Speed 9413.01 samples/sec Loss 5.6253 LearningRate 0.0203 Epoch: 10 Global Step: 183500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:10,817-Speed 9247.45 samples/sec Loss 5.6466 LearningRate 0.0203 Epoch: 10 Global Step: 183510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:11,851-Speed 9920.22 samples/sec Loss 5.5812 LearningRate 0.0203 Epoch: 10 Global Step: 183520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:12,954-Speed 9289.19 samples/sec Loss 5.6129 LearningRate 0.0203 Epoch: 10 Global Step: 183530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:14,008-Speed 9715.42 samples/sec Loss 5.6037 LearningRate 0.0203 Epoch: 10 Global Step: 183540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:15,063-Speed 9712.94 samples/sec Loss 5.5188 LearningRate 0.0203 Epoch: 10 Global Step: 183550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:16,126-Speed 9643.33 samples/sec Loss 5.5483 LearningRate 0.0203 Epoch: 10 Global Step: 183560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:17,204-Speed 9503.76 samples/sec Loss 5.5635 LearningRate 0.0203 Epoch: 10 Global Step: 183570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:18,274-Speed 9574.32 samples/sec Loss 5.5489 
LearningRate 0.0203 Epoch: 10 Global Step: 183580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:19,605-Speed 7694.51 samples/sec Loss 5.5103 LearningRate 0.0203 Epoch: 10 Global Step: 183590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:20,616-Speed 10135.68 samples/sec Loss 5.5595 LearningRate 0.0203 Epoch: 10 Global Step: 183600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:48,665-Speed 365.09 samples/sec Loss 4.9268 LearningRate 0.0202 Epoch: 11 Global Step: 183610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:50,040-Speed 7449.33 samples/sec Loss 4.8029 LearningRate 0.0202 Epoch: 11 Global Step: 183620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:51,201-Speed 8838.64 samples/sec Loss 4.9345 LearningRate 0.0202 Epoch: 11 Global Step: 183630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:52,336-Speed 9031.47 samples/sec Loss 4.8140 LearningRate 0.0202 Epoch: 11 Global Step: 183640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:53,869-Speed 6682.77 samples/sec Loss 4.7780 LearningRate 0.0202 Epoch: 11 Global Step: 183650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:55,302-Speed 7148.93 samples/sec Loss 4.7841 LearningRate 0.0202 Epoch: 11 Global Step: 183660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:56,390-Speed 9412.16 samples/sec Loss 4.8076 LearningRate 0.0202 Epoch: 11 Global Step: 183670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:57,475-Speed 9446.82 samples/sec Loss 4.7526 LearningRate 0.0202 Epoch: 11 Global Step: 183680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:58,650-Speed 8725.75 samples/sec Loss 4.8343 LearningRate 0.0202 Epoch: 11 Global Step: 183690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:56:59,754-Speed 9282.14 samples/sec Loss 4.9393 LearningRate 0.0202 Epoch: 11 
Global Step: 183700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:00,841-Speed 9427.36 samples/sec Loss 4.9270 LearningRate 0.0202 Epoch: 11 Global Step: 183710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:01,986-Speed 8954.68 samples/sec Loss 4.7821 LearningRate 0.0202 Epoch: 11 Global Step: 183720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:03,156-Speed 8755.53 samples/sec Loss 4.7759 LearningRate 0.0202 Epoch: 11 Global Step: 183730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:04,250-Speed 9363.43 samples/sec Loss 4.8451 LearningRate 0.0202 Epoch: 11 Global Step: 183740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:05,313-Speed 9634.49 samples/sec Loss 4.8778 LearningRate 0.0202 Epoch: 11 Global Step: 183750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:06,433-Speed 9151.17 samples/sec Loss 4.9040 LearningRate 0.0202 Epoch: 11 Global Step: 183760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:07,531-Speed 9331.12 samples/sec Loss 4.8773 LearningRate 0.0202 Epoch: 11 Global Step: 183770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:08,598-Speed 9597.86 samples/sec Loss 4.9153 LearningRate 0.0202 Epoch: 11 Global Step: 183780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:09,655-Speed 9694.50 samples/sec Loss 4.7982 LearningRate 0.0202 Epoch: 11 Global Step: 183790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:10,748-Speed 9381.38 samples/sec Loss 4.8464 LearningRate 0.0202 Epoch: 11 Global Step: 183800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:11,868-Speed 9144.15 samples/sec Loss 4.8691 LearningRate 0.0202 Epoch: 11 Global Step: 183810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:12,966-Speed 9334.33 samples/sec Loss 4.8941 LearningRate 0.0202 Epoch: 11 Global Step: 183820 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 18:57:14,047-Speed 9477.02 samples/sec Loss 4.7691 LearningRate 0.0202 Epoch: 11 Global Step: 183830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:15,137-Speed 9397.55 samples/sec Loss 4.8517 LearningRate 0.0202 Epoch: 11 Global Step: 183840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:16,574-Speed 7130.00 samples/sec Loss 4.8946 LearningRate 0.0202 Epoch: 11 Global Step: 183850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:17,823-Speed 8202.65 samples/sec Loss 4.9000 LearningRate 0.0202 Epoch: 11 Global Step: 183860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:19,488-Speed 6150.86 samples/sec Loss 4.8430 LearningRate 0.0202 Epoch: 11 Global Step: 183870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:20,548-Speed 9671.04 samples/sec Loss 4.8591 LearningRate 0.0202 Epoch: 11 Global Step: 183880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:21,652-Speed 9280.29 samples/sec Loss 4.9169 LearningRate 0.0202 Epoch: 11 Global Step: 183890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:22,797-Speed 8954.66 samples/sec Loss 4.9579 LearningRate 0.0202 Epoch: 11 Global Step: 183900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:24,107-Speed 7816.87 samples/sec Loss 4.9482 LearningRate 0.0202 Epoch: 11 Global Step: 183910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:25,242-Speed 9030.53 samples/sec Loss 4.9164 LearningRate 0.0202 Epoch: 11 Global Step: 183920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:26,354-Speed 9208.02 samples/sec Loss 4.7748 LearningRate 0.0202 Epoch: 11 Global Step: 183930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:27,446-Speed 9386.44 samples/sec Loss 4.8495 LearningRate 0.0202 Epoch: 11 Global Step: 183940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 18:57:28,488-Speed 9829.92 samples/sec Loss 4.9308 LearningRate 0.0202 Epoch: 11 Global Step: 183950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:29,620-Speed 9059.30 samples/sec Loss 4.9244 LearningRate 0.0202 Epoch: 11 Global Step: 183960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:30,693-Speed 9548.61 samples/sec Loss 4.8406 LearningRate 0.0202 Epoch: 11 Global Step: 183970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:57:31,774-Speed 9473.97 samples/sec Loss 4.8899 LearningRate 0.0201 Epoch: 11 Global Step: 183980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:32,843-Speed 9587.25 samples/sec Loss 4.9610 LearningRate 0.0201 Epoch: 11 Global Step: 183990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:33,930-Speed 9425.75 samples/sec Loss 4.9508 LearningRate 0.0201 Epoch: 11 Global Step: 184000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:57:55,934-[lfw][184000]XNorm: 9.015050 Training: 2022-04-11 18:57:55,934-[lfw][184000]Accuracy-Flip: 0.99583+-0.00310 Training: 2022-04-11 18:57:55,935-[lfw][184000]Accuracy-Highest: 0.99683 Training: 2022-04-11 18:58:21,329-[cfp_fp][184000]XNorm: 7.712500 Training: 2022-04-11 18:58:21,330-[cfp_fp][184000]Accuracy-Flip: 0.96586+-0.00839 Training: 2022-04-11 18:58:21,331-[cfp_fp][184000]Accuracy-Highest: 0.96643 Training: 2022-04-11 18:58:43,151-[agedb_30][184000]XNorm: 8.715467 Training: 2022-04-11 18:58:43,152-[agedb_30][184000]Accuracy-Flip: 0.96383+-0.01000 Training: 2022-04-11 18:58:43,152-[agedb_30][184000]Accuracy-Highest: 0.96917 Training: 2022-04-11 18:58:44,240-Speed 145.64 samples/sec Loss 4.8704 LearningRate 0.0201 Epoch: 11 Global Step: 184010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:58:45,313-Speed 9552.59 samples/sec Loss 4.8666 LearningRate 0.0201 Epoch: 11 Global Step: 184020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
18:58:46,400-Speed 9421.12 samples/sec Loss 4.9195 LearningRate 0.0201 Epoch: 11 Global Step: 184030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:58:47,467-Speed 9605.65 samples/sec Loss 4.9889 LearningRate 0.0201 Epoch: 11 Global Step: 184040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:58:48,551-Speed 9448.92 samples/sec Loss 4.9943 LearningRate 0.0201 Epoch: 11 Global Step: 184050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:58:49,614-Speed 9640.84 samples/sec Loss 4.9615 LearningRate 0.0201 Epoch: 11 Global Step: 184060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:58:50,697-Speed 9461.96 samples/sec Loss 5.0203 LearningRate 0.0201 Epoch: 11 Global Step: 184070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 18:58:51,795-Speed 9328.08 samples/sec Loss 4.9540 LearningRate 0.0201 Epoch: 11 Global Step: 184080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:58:52,908-Speed 9213.43 samples/sec Loss 4.8982 LearningRate 0.0201 Epoch: 11 Global Step: 184090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:58:53,974-Speed 9611.70 samples/sec Loss 4.9565 LearningRate 0.0201 Epoch: 11 Global Step: 184100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:58:55,029-Speed 9710.71 samples/sec Loss 5.0132 LearningRate 0.0201 Epoch: 11 Global Step: 184110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:58:56,073-Speed 9814.51 samples/sec Loss 5.0160 LearningRate 0.0201 Epoch: 11 Global Step: 184120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:58:57,168-Speed 9351.16 samples/sec Loss 4.8398 LearningRate 0.0201 Epoch: 11 Global Step: 184130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:58:58,232-Speed 9634.44 samples/sec Loss 4.9400 LearningRate 0.0201 Epoch: 11 Global Step: 184140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:58:59,317-Speed 9447.61 
samples/sec Loss 4.9002 LearningRate 0.0201 Epoch: 11 Global Step: 184150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:00,399-Speed 9470.22 samples/sec Loss 4.9262 LearningRate 0.0201 Epoch: 11 Global Step: 184160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:01,698-Speed 7885.94 samples/sec Loss 4.9077 LearningRate 0.0201 Epoch: 11 Global Step: 184170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:02,790-Speed 9381.44 samples/sec Loss 5.0080 LearningRate 0.0201 Epoch: 11 Global Step: 184180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:59:03,877-Speed 9424.02 samples/sec Loss 4.8504 LearningRate 0.0201 Epoch: 11 Global Step: 184190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:05,015-Speed 9001.50 samples/sec Loss 5.0051 LearningRate 0.0201 Epoch: 11 Global Step: 184200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:06,100-Speed 9447.40 samples/sec Loss 5.0051 LearningRate 0.0201 Epoch: 11 Global Step: 184210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:07,216-Speed 9181.72 samples/sec Loss 5.0235 LearningRate 0.0201 Epoch: 11 Global Step: 184220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:08,335-Speed 9157.06 samples/sec Loss 4.8270 LearningRate 0.0201 Epoch: 11 Global Step: 184230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:09,388-Speed 9730.79 samples/sec Loss 4.8826 LearningRate 0.0201 Epoch: 11 Global Step: 184240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:10,446-Speed 9683.00 samples/sec Loss 4.9405 LearningRate 0.0201 Epoch: 11 Global Step: 184250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:11,542-Speed 9351.97 samples/sec Loss 4.9825 LearningRate 0.0201 Epoch: 11 Global Step: 184260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:12,623-Speed 9481.90 samples/sec Loss 4.8906 
LearningRate 0.0201 Epoch: 11 Global Step: 184270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:13,714-Speed 9388.93 samples/sec Loss 4.9033 LearningRate 0.0201 Epoch: 11 Global Step: 184280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:14,779-Speed 9620.19 samples/sec Loss 4.9671 LearningRate 0.0201 Epoch: 11 Global Step: 184290 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:59:15,851-Speed 9560.40 samples/sec Loss 4.9376 LearningRate 0.0201 Epoch: 11 Global Step: 184300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:16,929-Speed 9502.78 samples/sec Loss 4.9059 LearningRate 0.0201 Epoch: 11 Global Step: 184310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:18,009-Speed 9483.67 samples/sec Loss 4.9334 LearningRate 0.0201 Epoch: 11 Global Step: 184320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:19,125-Speed 9182.44 samples/sec Loss 4.9056 LearningRate 0.0201 Epoch: 11 Global Step: 184330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:20,201-Speed 9528.24 samples/sec Loss 4.8996 LearningRate 0.0201 Epoch: 11 Global Step: 184340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:21,281-Speed 9483.56 samples/sec Loss 5.0497 LearningRate 0.0200 Epoch: 11 Global Step: 184350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:22,392-Speed 9234.03 samples/sec Loss 4.9743 LearningRate 0.0200 Epoch: 11 Global Step: 184360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:23,528-Speed 9015.22 samples/sec Loss 4.9630 LearningRate 0.0200 Epoch: 11 Global Step: 184370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:24,662-Speed 9035.45 samples/sec Loss 5.0720 LearningRate 0.0200 Epoch: 11 Global Step: 184380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:25,754-Speed 9381.10 samples/sec Loss 4.9187 LearningRate 0.0200 Epoch: 11 
Global Step: 184390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:26,827-Speed 9547.66 samples/sec Loss 5.0154 LearningRate 0.0200 Epoch: 11 Global Step: 184400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:27,910-Speed 9462.23 samples/sec Loss 4.9531 LearningRate 0.0200 Epoch: 11 Global Step: 184410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:29,035-Speed 9116.17 samples/sec Loss 4.9653 LearningRate 0.0200 Epoch: 11 Global Step: 184420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:30,080-Speed 9803.08 samples/sec Loss 5.0883 LearningRate 0.0200 Epoch: 11 Global Step: 184430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:31,166-Speed 9438.27 samples/sec Loss 4.9243 LearningRate 0.0200 Epoch: 11 Global Step: 184440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:32,234-Speed 9590.66 samples/sec Loss 5.0714 LearningRate 0.0200 Epoch: 11 Global Step: 184450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:33,317-Speed 9459.53 samples/sec Loss 5.0511 LearningRate 0.0200 Epoch: 11 Global Step: 184460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:34,389-Speed 9560.18 samples/sec Loss 4.9576 LearningRate 0.0200 Epoch: 11 Global Step: 184470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:35,501-Speed 9212.75 samples/sec Loss 4.9890 LearningRate 0.0200 Epoch: 11 Global Step: 184480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:36,593-Speed 9383.31 samples/sec Loss 5.0397 LearningRate 0.0200 Epoch: 11 Global Step: 184490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:37,683-Speed 9400.62 samples/sec Loss 4.9498 LearningRate 0.0200 Epoch: 11 Global Step: 184500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 18:59:38,737-Speed 9718.05 samples/sec Loss 4.9557 LearningRate 0.0200 Epoch: 11 Global Step: 184510 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:39,860-Speed 9126.98 samples/sec Loss 4.9647 LearningRate 0.0200 Epoch: 11 Global Step: 184520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:40,935-Speed 9527.62 samples/sec Loss 4.8969 LearningRate 0.0200 Epoch: 11 Global Step: 184530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:42,040-Speed 9277.75 samples/sec Loss 4.9401 LearningRate 0.0200 Epoch: 11 Global Step: 184540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:43,117-Speed 9513.04 samples/sec Loss 4.9732 LearningRate 0.0200 Epoch: 11 Global Step: 184550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:44,224-Speed 9257.29 samples/sec Loss 4.9558 LearningRate 0.0200 Epoch: 11 Global Step: 184560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:45,317-Speed 9374.67 samples/sec Loss 5.1519 LearningRate 0.0200 Epoch: 11 Global Step: 184570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:46,378-Speed 9653.37 samples/sec Loss 4.9444 LearningRate 0.0200 Epoch: 11 Global Step: 184580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:47,428-Speed 9763.40 samples/sec Loss 5.1081 LearningRate 0.0200 Epoch: 11 Global Step: 184590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:48,472-Speed 9809.62 samples/sec Loss 5.0185 LearningRate 0.0200 Epoch: 11 Global Step: 184600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:49,527-Speed 9718.98 samples/sec Loss 4.9611 LearningRate 0.0200 Epoch: 11 Global Step: 184610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:50,608-Speed 9473.90 samples/sec Loss 5.0117 LearningRate 0.0200 Epoch: 11 Global Step: 184620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:51,685-Speed 9512.96 samples/sec Loss 4.9720 LearningRate 0.0200 Epoch: 11 Global Step: 184630 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 18:59:52,722-Speed 9888.53 samples/sec Loss 4.9262 LearningRate 0.0200 Epoch: 11 Global Step: 184640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:53,801-Speed 9493.64 samples/sec Loss 4.9664 LearningRate 0.0200 Epoch: 11 Global Step: 184650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:54,899-Speed 9334.62 samples/sec Loss 5.0768 LearningRate 0.0200 Epoch: 11 Global Step: 184660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:55,942-Speed 9817.39 samples/sec Loss 4.9497 LearningRate 0.0200 Epoch: 11 Global Step: 184670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:57,001-Speed 9673.34 samples/sec Loss 4.8980 LearningRate 0.0200 Epoch: 11 Global Step: 184680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:58,114-Speed 9206.53 samples/sec Loss 4.9444 LearningRate 0.0200 Epoch: 11 Global Step: 184690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 18:59:59,186-Speed 9564.69 samples/sec Loss 4.9389 LearningRate 0.0200 Epoch: 11 Global Step: 184700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:00,256-Speed 9576.87 samples/sec Loss 4.9458 LearningRate 0.0200 Epoch: 11 Global Step: 184710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:00:01,393-Speed 9007.13 samples/sec Loss 4.9826 LearningRate 0.0199 Epoch: 11 Global Step: 184720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:02,505-Speed 9211.21 samples/sec Loss 5.1168 LearningRate 0.0199 Epoch: 11 Global Step: 184730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:03,606-Speed 9311.46 samples/sec Loss 5.0223 LearningRate 0.0199 Epoch: 11 Global Step: 184740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:04,706-Speed 9315.73 samples/sec Loss 5.0012 LearningRate 0.0199 Epoch: 11 Global Step: 184750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
19:00:05,769-Speed 9637.83 samples/sec Loss 4.9814 LearningRate 0.0199 Epoch: 11 Global Step: 184760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:06,817-Speed 9781.42 samples/sec Loss 4.9704 LearningRate 0.0199 Epoch: 11 Global Step: 184770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:07,885-Speed 9595.34 samples/sec Loss 4.9974 LearningRate 0.0199 Epoch: 11 Global Step: 184780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:08,974-Speed 9406.36 samples/sec Loss 4.8379 LearningRate 0.0199 Epoch: 11 Global Step: 184790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:10,087-Speed 9200.66 samples/sec Loss 4.9665 LearningRate 0.0199 Epoch: 11 Global Step: 184800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:11,206-Speed 9157.04 samples/sec Loss 5.0285 LearningRate 0.0199 Epoch: 11 Global Step: 184810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:12,272-Speed 9620.40 samples/sec Loss 5.0655 LearningRate 0.0199 Epoch: 11 Global Step: 184820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:13,326-Speed 9722.63 samples/sec Loss 4.9257 LearningRate 0.0199 Epoch: 11 Global Step: 184830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:14,381-Speed 9711.85 samples/sec Loss 4.9964 LearningRate 0.0199 Epoch: 11 Global Step: 184840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:15,424-Speed 9820.54 samples/sec Loss 5.0860 LearningRate 0.0199 Epoch: 11 Global Step: 184850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:16,495-Speed 9571.03 samples/sec Loss 5.0771 LearningRate 0.0199 Epoch: 11 Global Step: 184860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:17,596-Speed 9303.37 samples/sec Loss 5.0875 LearningRate 0.0199 Epoch: 11 Global Step: 184870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:18,700-Speed 9274.80 
samples/sec Loss 5.0197 LearningRate 0.0199 Epoch: 11 Global Step: 184880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:19,793-Speed 9379.85 samples/sec Loss 5.0203 LearningRate 0.0199 Epoch: 11 Global Step: 184890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:20,876-Speed 9459.61 samples/sec Loss 5.0834 LearningRate 0.0199 Epoch: 11 Global Step: 184900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:21,953-Speed 9510.17 samples/sec Loss 5.0878 LearningRate 0.0199 Epoch: 11 Global Step: 184910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:23,031-Speed 9507.65 samples/sec Loss 5.1517 LearningRate 0.0199 Epoch: 11 Global Step: 184920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:24,092-Speed 9652.33 samples/sec Loss 5.0246 LearningRate 0.0199 Epoch: 11 Global Step: 184930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:25,170-Speed 9511.91 samples/sec Loss 5.0587 LearningRate 0.0199 Epoch: 11 Global Step: 184940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:26,290-Speed 9146.68 samples/sec Loss 5.0144 LearningRate 0.0199 Epoch: 11 Global Step: 184950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:27,395-Speed 9272.25 samples/sec Loss 4.9734 LearningRate 0.0199 Epoch: 11 Global Step: 184960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:28,490-Speed 9354.21 samples/sec Loss 5.0016 LearningRate 0.0199 Epoch: 11 Global Step: 184970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:29,570-Speed 9497.63 samples/sec Loss 5.0511 LearningRate 0.0199 Epoch: 11 Global Step: 184980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:30,701-Speed 9056.31 samples/sec Loss 5.0711 LearningRate 0.0199 Epoch: 11 Global Step: 184990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:31,805-Speed 9275.96 samples/sec Loss 5.1738 
LearningRate 0.0199 Epoch: 11 Global Step: 185000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:32,930-Speed 9114.51 samples/sec Loss 5.1008 LearningRate 0.0199 Epoch: 11 Global Step: 185010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:33,999-Speed 9579.63 samples/sec Loss 5.0458 LearningRate 0.0199 Epoch: 11 Global Step: 185020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:35,085-Speed 9431.55 samples/sec Loss 4.9649 LearningRate 0.0199 Epoch: 11 Global Step: 185030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:36,198-Speed 9207.57 samples/sec Loss 5.0783 LearningRate 0.0199 Epoch: 11 Global Step: 185040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:37,291-Speed 9378.35 samples/sec Loss 4.9867 LearningRate 0.0199 Epoch: 11 Global Step: 185050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:38,349-Speed 9683.04 samples/sec Loss 5.1372 LearningRate 0.0199 Epoch: 11 Global Step: 185060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:39,422-Speed 9547.37 samples/sec Loss 5.0468 LearningRate 0.0199 Epoch: 11 Global Step: 185070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:40,516-Speed 9359.94 samples/sec Loss 5.0160 LearningRate 0.0199 Epoch: 11 Global Step: 185080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:41,616-Speed 9325.62 samples/sec Loss 5.0661 LearningRate 0.0199 Epoch: 11 Global Step: 185090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:42,718-Speed 9290.59 samples/sec Loss 5.0360 LearningRate 0.0198 Epoch: 11 Global Step: 185100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:43,787-Speed 9588.06 samples/sec Loss 4.9953 LearningRate 0.0198 Epoch: 11 Global Step: 185110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:44,851-Speed 9635.60 samples/sec Loss 5.0402 LearningRate 0.0198 Epoch: 11 Global 
Step: 185120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:45,965-Speed 9190.97 samples/sec Loss 5.0259 LearningRate 0.0198 Epoch: 11 Global Step: 185130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:47,004-Speed 9863.69 samples/sec Loss 5.0879 LearningRate 0.0198 Epoch: 11 Global Step: 185140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:48,068-Speed 9629.31 samples/sec Loss 4.9577 LearningRate 0.0198 Epoch: 11 Global Step: 185150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:49,134-Speed 9612.38 samples/sec Loss 4.9642 LearningRate 0.0198 Epoch: 11 Global Step: 185160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:50,221-Speed 9426.10 samples/sec Loss 5.1123 LearningRate 0.0198 Epoch: 11 Global Step: 185170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:51,305-Speed 9454.59 samples/sec Loss 5.1389 LearningRate 0.0198 Epoch: 11 Global Step: 185180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:52,365-Speed 9671.08 samples/sec Loss 5.1102 LearningRate 0.0198 Epoch: 11 Global Step: 185190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:53,410-Speed 9802.22 samples/sec Loss 4.9574 LearningRate 0.0198 Epoch: 11 Global Step: 185200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:54,469-Speed 9672.20 samples/sec Loss 5.0123 LearningRate 0.0198 Epoch: 11 Global Step: 185210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:55,540-Speed 9568.80 samples/sec Loss 5.0993 LearningRate 0.0198 Epoch: 11 Global Step: 185220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:00:56,646-Speed 9266.80 samples/sec Loss 5.0835 LearningRate 0.0198 Epoch: 11 Global Step: 185230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:57,722-Speed 9525.25 samples/sec Loss 5.0583 LearningRate 0.0198 Epoch: 11 Global Step: 185240 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 19:00:58,782-Speed 9669.70 samples/sec Loss 5.0656 LearningRate 0.0198 Epoch: 11 Global Step: 185250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:00:59,923-Speed 8975.34 samples/sec Loss 5.0316 LearningRate 0.0198 Epoch: 11 Global Step: 185260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:00,993-Speed 9574.17 samples/sec Loss 5.0160 LearningRate 0.0198 Epoch: 11 Global Step: 185270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:02,077-Speed 9461.77 samples/sec Loss 5.0325 LearningRate 0.0198 Epoch: 11 Global Step: 185280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:03,162-Speed 9439.86 samples/sec Loss 5.0641 LearningRate 0.0198 Epoch: 11 Global Step: 185290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:04,219-Speed 9698.15 samples/sec Loss 5.1305 LearningRate 0.0198 Epoch: 11 Global Step: 185300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:05,294-Speed 9522.29 samples/sec Loss 5.0476 LearningRate 0.0198 Epoch: 11 Global Step: 185310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:06,403-Speed 9244.76 samples/sec Loss 5.1513 LearningRate 0.0198 Epoch: 11 Global Step: 185320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:07,478-Speed 9527.83 samples/sec Loss 5.0616 LearningRate 0.0198 Epoch: 11 Global Step: 185330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:08,590-Speed 9217.86 samples/sec Loss 5.1141 LearningRate 0.0198 Epoch: 11 Global Step: 185340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:09,664-Speed 9533.35 samples/sec Loss 5.0677 LearningRate 0.0198 Epoch: 11 Global Step: 185350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:10,801-Speed 9013.32 samples/sec Loss 5.1108 LearningRate 0.0198 Epoch: 11 Global Step: 185360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 19:01:11,931-Speed 9065.46 samples/sec Loss 5.0810 LearningRate 0.0198 Epoch: 11 Global Step: 185370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:13,001-Speed 9579.32 samples/sec Loss 5.0718 LearningRate 0.0198 Epoch: 11 Global Step: 185380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:14,124-Speed 9126.02 samples/sec Loss 5.1281 LearningRate 0.0198 Epoch: 11 Global Step: 185390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:15,193-Speed 9584.73 samples/sec Loss 5.0483 LearningRate 0.0198 Epoch: 11 Global Step: 185400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:16,294-Speed 9303.75 samples/sec Loss 4.9874 LearningRate 0.0198 Epoch: 11 Global Step: 185410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:17,396-Speed 9295.36 samples/sec Loss 5.0039 LearningRate 0.0198 Epoch: 11 Global Step: 185420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:18,496-Speed 9314.73 samples/sec Loss 5.0578 LearningRate 0.0198 Epoch: 11 Global Step: 185430 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:01:19,605-Speed 9242.05 samples/sec Loss 5.0692 LearningRate 0.0198 Epoch: 11 Global Step: 185440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:20,747-Speed 8967.21 samples/sec Loss 4.9882 LearningRate 0.0198 Epoch: 11 Global Step: 185450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:21,843-Speed 9351.08 samples/sec Loss 5.1450 LearningRate 0.0198 Epoch: 11 Global Step: 185460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:22,947-Speed 9281.51 samples/sec Loss 5.1633 LearningRate 0.0197 Epoch: 11 Global Step: 185470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:24,076-Speed 9076.50 samples/sec Loss 5.0451 LearningRate 0.0197 Epoch: 11 Global Step: 185480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:25,156-Speed 
9490.94 samples/sec Loss 5.0752 LearningRate 0.0197 Epoch: 11 Global Step: 185490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:26,231-Speed 9528.54 samples/sec Loss 5.0609 LearningRate 0.0197 Epoch: 11 Global Step: 185500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:27,275-Speed 9811.87 samples/sec Loss 5.0356 LearningRate 0.0197 Epoch: 11 Global Step: 185510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:28,321-Speed 9795.29 samples/sec Loss 5.2283 LearningRate 0.0197 Epoch: 11 Global Step: 185520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:29,405-Speed 9454.25 samples/sec Loss 5.1222 LearningRate 0.0197 Epoch: 11 Global Step: 185530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:30,479-Speed 9544.53 samples/sec Loss 5.0278 LearningRate 0.0197 Epoch: 11 Global Step: 185540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:31,548-Speed 9585.22 samples/sec Loss 5.0752 LearningRate 0.0197 Epoch: 11 Global Step: 185550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:32,650-Speed 9292.50 samples/sec Loss 5.1677 LearningRate 0.0197 Epoch: 11 Global Step: 185560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:33,738-Speed 9418.06 samples/sec Loss 5.0854 LearningRate 0.0197 Epoch: 11 Global Step: 185570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:34,853-Speed 9191.02 samples/sec Loss 5.0935 LearningRate 0.0197 Epoch: 11 Global Step: 185580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:35,922-Speed 9584.26 samples/sec Loss 5.0839 LearningRate 0.0197 Epoch: 11 Global Step: 185590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:37,004-Speed 9470.79 samples/sec Loss 5.2100 LearningRate 0.0197 Epoch: 11 Global Step: 185600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:38,091-Speed 9426.48 samples/sec Loss 5.1797 
LearningRate 0.0197 Epoch: 11 Global Step: 185610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:39,180-Speed 9405.32 samples/sec Loss 5.0750 LearningRate 0.0197 Epoch: 11 Global Step: 185620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:40,260-Speed 9488.56 samples/sec Loss 5.0557 LearningRate 0.0197 Epoch: 11 Global Step: 185630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:41,338-Speed 9510.62 samples/sec Loss 5.0936 LearningRate 0.0197 Epoch: 11 Global Step: 185640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:42,435-Speed 9340.37 samples/sec Loss 5.1522 LearningRate 0.0197 Epoch: 11 Global Step: 185650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:43,499-Speed 9627.28 samples/sec Loss 5.0860 LearningRate 0.0197 Epoch: 11 Global Step: 185660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:44,589-Speed 9405.05 samples/sec Loss 5.1532 LearningRate 0.0197 Epoch: 11 Global Step: 185670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:45,662-Speed 9543.48 samples/sec Loss 5.1032 LearningRate 0.0197 Epoch: 11 Global Step: 185680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:46,714-Speed 9742.14 samples/sec Loss 5.1292 LearningRate 0.0197 Epoch: 11 Global Step: 185690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:47,787-Speed 9553.02 samples/sec Loss 5.1467 LearningRate 0.0197 Epoch: 11 Global Step: 185700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:48,867-Speed 9479.17 samples/sec Loss 5.0591 LearningRate 0.0197 Epoch: 11 Global Step: 185710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:01:49,924-Speed 9700.49 samples/sec Loss 5.0276 LearningRate 0.0197 Epoch: 11 Global Step: 185720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:50,994-Speed 9569.18 samples/sec Loss 5.2269 LearningRate 0.0197 Epoch: 11 
Global Step: 185730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:52,091-Speed 9340.23 samples/sec Loss 5.0910 LearningRate 0.0197 Epoch: 11 Global Step: 185740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:53,174-Speed 9468.61 samples/sec Loss 5.0691 LearningRate 0.0197 Epoch: 11 Global Step: 185750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:54,251-Speed 9509.38 samples/sec Loss 5.1600 LearningRate 0.0197 Epoch: 11 Global Step: 185760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:01:55,333-Speed 9466.56 samples/sec Loss 5.1115 LearningRate 0.0197 Epoch: 11 Global Step: 185770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:56,436-Speed 9293.42 samples/sec Loss 5.0308 LearningRate 0.0197 Epoch: 11 Global Step: 185780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:57,521-Speed 9444.69 samples/sec Loss 5.0205 LearningRate 0.0197 Epoch: 11 Global Step: 185790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:58,599-Speed 9501.72 samples/sec Loss 5.1337 LearningRate 0.0197 Epoch: 11 Global Step: 185800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:01:59,679-Speed 9495.50 samples/sec Loss 5.1236 LearningRate 0.0197 Epoch: 11 Global Step: 185810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:02:00,746-Speed 9598.20 samples/sec Loss 5.1519 LearningRate 0.0197 Epoch: 11 Global Step: 185820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:02:01,828-Speed 9468.59 samples/sec Loss 5.1496 LearningRate 0.0197 Epoch: 11 Global Step: 185830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:02:02,891-Speed 9640.66 samples/sec Loss 5.1714 LearningRate 0.0197 Epoch: 11 Global Step: 185840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:02:03,985-Speed 9363.44 samples/sec Loss 5.1652 LearningRate 0.0196 Epoch: 11 Global Step: 185850 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 19:02:05,048-Speed 9638.43 samples/sec Loss 5.0879 LearningRate 0.0196 Epoch: 11 Global Step: 185860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:02:06,139-Speed 9387.87 samples/sec Loss 5.1479 LearningRate 0.0196 Epoch: 11 Global Step: 185870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:07,220-Speed 9481.18 samples/sec Loss 5.1829 LearningRate 0.0196 Epoch: 11 Global Step: 185880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:08,258-Speed 9877.28 samples/sec Loss 5.1662 LearningRate 0.0196 Epoch: 11 Global Step: 185890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:09,347-Speed 9404.17 samples/sec Loss 5.0926 LearningRate 0.0196 Epoch: 11 Global Step: 185900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:10,413-Speed 9614.51 samples/sec Loss 5.1651 LearningRate 0.0196 Epoch: 11 Global Step: 185910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:11,477-Speed 9628.81 samples/sec Loss 5.1012 LearningRate 0.0196 Epoch: 11 Global Step: 185920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:12,572-Speed 9357.87 samples/sec Loss 5.0824 LearningRate 0.0196 Epoch: 11 Global Step: 185930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:13,631-Speed 9676.89 samples/sec Loss 5.2122 LearningRate 0.0196 Epoch: 11 Global Step: 185940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:14,737-Speed 9262.38 samples/sec Loss 5.1423 LearningRate 0.0196 Epoch: 11 Global Step: 185950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:15,826-Speed 9404.38 samples/sec Loss 5.1980 LearningRate 0.0196 Epoch: 11 Global Step: 185960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:02:16,929-Speed 9290.79 samples/sec Loss 5.2001 LearningRate 0.0196 Epoch: 11 Global Step: 185970 Fp16 Grad Scale: 262144 Required: 6 hours 
Training: 2022-04-11 19:02:18,017-Speed 9419.69 samples/sec Loss 5.1583 LearningRate 0.0196 Epoch: 11 Global Step: 185980 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 19:02:19,078-Speed 9656.59 samples/sec Loss 5.1847 LearningRate 0.0196 Epoch: 11 Global Step: 185990 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 19:02:20,150-Speed 9562.01 samples/sec Loss 4.9934 LearningRate 0.0196 Epoch: 11 Global Step: 186000 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 19:02:42,012-[lfw][186000]XNorm: 9.011460
Training: 2022-04-11 19:02:42,013-[lfw][186000]Accuracy-Flip: 0.99683+-0.00293
Training: 2022-04-11 19:02:42,014-[lfw][186000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:03:07,309-[cfp_fp][186000]XNorm: 7.702360
Training: 2022-04-11 19:03:07,310-[cfp_fp][186000]Accuracy-Flip: 0.95914+-0.01125
Training: 2022-04-11 19:03:07,311-[cfp_fp][186000]Accuracy-Highest: 0.96643
Training: 2022-04-11 19:03:29,144-[agedb_30][186000]XNorm: 8.867334
Training: 2022-04-11 19:03:29,145-[agedb_30][186000]Accuracy-Flip: 0.96800+-0.01032
Training: 2022-04-11 19:03:29,145-[agedb_30][186000]Accuracy-Highest: 0.96917
Training: 2022-04-11 19:03:30,225-Speed 146.13 samples/sec Loss 5.1681 LearningRate 0.0196 Epoch: 11 Global Step: 186010 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 19:03:31,306-Speed 9479.55 samples/sec Loss 5.1802 LearningRate 0.0196 Epoch: 11 Global Step: 186020 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 19:03:32,425-Speed 9154.10 samples/sec Loss 5.1861 LearningRate 0.0196 Epoch: 11 Global Step: 186030 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 19:03:33,528-Speed 9288.43 samples/sec Loss 5.1569 LearningRate 0.0196 Epoch: 11 Global Step: 186040 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-04-11 19:03:34,644-Speed 9183.66 samples/sec Loss 5.1740 LearningRate 0.0196 Epoch: 11 Global Step: 186050 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 
2022-04-11 19:03:35,721-Speed 9508.54 samples/sec Loss 5.1827 LearningRate 0.0196 Epoch: 11 Global Step: 186060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:36,783-Speed 9651.02 samples/sec Loss 5.2189 LearningRate 0.0196 Epoch: 11 Global Step: 186070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:37,858-Speed 9532.58 samples/sec Loss 5.0898 LearningRate 0.0196 Epoch: 11 Global Step: 186080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:03:38,942-Speed 9457.12 samples/sec Loss 5.1352 LearningRate 0.0196 Epoch: 11 Global Step: 186090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:40,054-Speed 9209.52 samples/sec Loss 5.1464 LearningRate 0.0196 Epoch: 11 Global Step: 186100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:41,135-Speed 9479.59 samples/sec Loss 5.2153 LearningRate 0.0196 Epoch: 11 Global Step: 186110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:42,212-Speed 9513.58 samples/sec Loss 5.1056 LearningRate 0.0196 Epoch: 11 Global Step: 186120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:43,331-Speed 9160.82 samples/sec Loss 5.0775 LearningRate 0.0196 Epoch: 11 Global Step: 186130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:44,435-Speed 9281.56 samples/sec Loss 5.2057 LearningRate 0.0196 Epoch: 11 Global Step: 186140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:45,539-Speed 9275.44 samples/sec Loss 5.1430 LearningRate 0.0196 Epoch: 11 Global Step: 186150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:46,666-Speed 9095.31 samples/sec Loss 5.1776 LearningRate 0.0196 Epoch: 11 Global Step: 186160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:47,708-Speed 9826.12 samples/sec Loss 5.1760 LearningRate 0.0196 Epoch: 11 Global Step: 186170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:48,804-Speed 
9351.51 samples/sec Loss 5.1443 LearningRate 0.0196 Epoch: 11 Global Step: 186180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:49,897-Speed 9372.99 samples/sec Loss 5.1530 LearningRate 0.0196 Epoch: 11 Global Step: 186190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:50,977-Speed 9493.55 samples/sec Loss 5.1410 LearningRate 0.0196 Epoch: 11 Global Step: 186200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:52,028-Speed 9745.14 samples/sec Loss 5.1385 LearningRate 0.0196 Epoch: 11 Global Step: 186210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:53,076-Speed 9776.06 samples/sec Loss 5.1293 LearningRate 0.0196 Epoch: 11 Global Step: 186220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:03:54,129-Speed 9732.48 samples/sec Loss 5.0998 LearningRate 0.0195 Epoch: 11 Global Step: 186230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:03:55,197-Speed 9589.08 samples/sec Loss 5.1546 LearningRate 0.0195 Epoch: 11 Global Step: 186240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:03:56,243-Speed 9794.44 samples/sec Loss 5.2717 LearningRate 0.0195 Epoch: 11 Global Step: 186250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:03:57,320-Speed 9519.82 samples/sec Loss 5.1148 LearningRate 0.0195 Epoch: 11 Global Step: 186260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:03:58,383-Speed 9634.83 samples/sec Loss 5.2696 LearningRate 0.0195 Epoch: 11 Global Step: 186270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:03:59,448-Speed 9626.74 samples/sec Loss 5.1545 LearningRate 0.0195 Epoch: 11 Global Step: 186280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:00,541-Speed 9371.86 samples/sec Loss 5.1479 LearningRate 0.0195 Epoch: 11 Global Step: 186290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:01,591-Speed 9758.33 samples/sec Loss 5.1307 
LearningRate 0.0195 Epoch: 11 Global Step: 186300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:02,671-Speed 9483.77 samples/sec Loss 5.0937 LearningRate 0.0195 Epoch: 11 Global Step: 186310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:03,718-Speed 9786.85 samples/sec Loss 5.2152 LearningRate 0.0195 Epoch: 11 Global Step: 186320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:04,795-Speed 9515.63 samples/sec Loss 5.1494 LearningRate 0.0195 Epoch: 11 Global Step: 186330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:05,836-Speed 9846.31 samples/sec Loss 5.2243 LearningRate 0.0195 Epoch: 11 Global Step: 186340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:06,911-Speed 9536.59 samples/sec Loss 5.1619 LearningRate 0.0195 Epoch: 11 Global Step: 186350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:07,978-Speed 9596.65 samples/sec Loss 5.1657 LearningRate 0.0195 Epoch: 11 Global Step: 186360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:09,027-Speed 9771.00 samples/sec Loss 5.2244 LearningRate 0.0195 Epoch: 11 Global Step: 186370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:10,084-Speed 9690.59 samples/sec Loss 5.1169 LearningRate 0.0195 Epoch: 11 Global Step: 186380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:11,180-Speed 9348.31 samples/sec Loss 5.1218 LearningRate 0.0195 Epoch: 11 Global Step: 186390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:12,289-Speed 9239.29 samples/sec Loss 5.2180 LearningRate 0.0195 Epoch: 11 Global Step: 186400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:13,387-Speed 9334.85 samples/sec Loss 5.1874 LearningRate 0.0195 Epoch: 11 Global Step: 186410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:14,503-Speed 9181.50 samples/sec Loss 5.2127 LearningRate 0.0195 Epoch: 11 
Global Step: 186420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:15,571-Speed 9592.08 samples/sec Loss 5.1609 LearningRate 0.0195 Epoch: 11 Global Step: 186430 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:04:16,641-Speed 9571.60 samples/sec Loss 5.1431 LearningRate 0.0195 Epoch: 11 Global Step: 186440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:17,802-Speed 8828.76 samples/sec Loss 5.0126 LearningRate 0.0195 Epoch: 11 Global Step: 186450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:18,896-Speed 9371.02 samples/sec Loss 5.2837 LearningRate 0.0195 Epoch: 11 Global Step: 186460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:19,940-Speed 9812.64 samples/sec Loss 5.1721 LearningRate 0.0195 Epoch: 11 Global Step: 186470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:21,042-Speed 9291.98 samples/sec Loss 5.1993 LearningRate 0.0195 Epoch: 11 Global Step: 186480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:22,186-Speed 8960.77 samples/sec Loss 5.1844 LearningRate 0.0195 Epoch: 11 Global Step: 186490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:23,240-Speed 9724.26 samples/sec Loss 5.1746 LearningRate 0.0195 Epoch: 11 Global Step: 186500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:24,353-Speed 9200.84 samples/sec Loss 5.2097 LearningRate 0.0195 Epoch: 11 Global Step: 186510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:25,442-Speed 9409.35 samples/sec Loss 5.2783 LearningRate 0.0195 Epoch: 11 Global Step: 186520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:26,524-Speed 9469.66 samples/sec Loss 5.1549 LearningRate 0.0195 Epoch: 11 Global Step: 186530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:27,566-Speed 9834.14 samples/sec Loss 5.0665 LearningRate 0.0195 Epoch: 11 Global Step: 186540 Fp16 Grad 
Scale: 262144 Required: 6 hours Training: 2022-04-11 19:04:28,646-Speed 9491.84 samples/sec Loss 5.1049 LearningRate 0.0195 Epoch: 11 Global Step: 186550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:29,699-Speed 9726.71 samples/sec Loss 5.1247 LearningRate 0.0195 Epoch: 11 Global Step: 186560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:30,835-Speed 9022.80 samples/sec Loss 5.1372 LearningRate 0.0195 Epoch: 11 Global Step: 186570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:31,926-Speed 9386.49 samples/sec Loss 5.1746 LearningRate 0.0195 Epoch: 11 Global Step: 186580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:32,989-Speed 9637.91 samples/sec Loss 5.0985 LearningRate 0.0195 Epoch: 11 Global Step: 186590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:34,087-Speed 9337.63 samples/sec Loss 5.1090 LearningRate 0.0194 Epoch: 11 Global Step: 186600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:35,165-Speed 9502.02 samples/sec Loss 5.1451 LearningRate 0.0194 Epoch: 11 Global Step: 186610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:36,255-Speed 9406.50 samples/sec Loss 5.2141 LearningRate 0.0194 Epoch: 11 Global Step: 186620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:37,322-Speed 9594.83 samples/sec Loss 5.1675 LearningRate 0.0194 Epoch: 11 Global Step: 186630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:38,400-Speed 9510.09 samples/sec Loss 5.1987 LearningRate 0.0194 Epoch: 11 Global Step: 186640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:39,486-Speed 9428.42 samples/sec Loss 5.0925 LearningRate 0.0194 Epoch: 11 Global Step: 186650 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:04:40,558-Speed 9556.88 samples/sec Loss 5.1651 LearningRate 0.0194 Epoch: 11 Global Step: 186660 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:04:41,618-Speed 9668.92 samples/sec Loss 5.1761 LearningRate 0.0194 Epoch: 11 Global Step: 186670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:42,711-Speed 9379.48 samples/sec Loss 5.2409 LearningRate 0.0194 Epoch: 11 Global Step: 186680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:43,759-Speed 9775.80 samples/sec Loss 5.3034 LearningRate 0.0194 Epoch: 11 Global Step: 186690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:44,863-Speed 9283.19 samples/sec Loss 5.1645 LearningRate 0.0194 Epoch: 11 Global Step: 186700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:45,931-Speed 9596.62 samples/sec Loss 5.1628 LearningRate 0.0194 Epoch: 11 Global Step: 186710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:47,018-Speed 9428.47 samples/sec Loss 5.2297 LearningRate 0.0194 Epoch: 11 Global Step: 186720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:48,070-Speed 9736.38 samples/sec Loss 5.2121 LearningRate 0.0194 Epoch: 11 Global Step: 186730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:49,164-Speed 9367.86 samples/sec Loss 5.1226 LearningRate 0.0194 Epoch: 11 Global Step: 186740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:50,254-Speed 9396.52 samples/sec Loss 5.1867 LearningRate 0.0194 Epoch: 11 Global Step: 186750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:51,332-Speed 9502.88 samples/sec Loss 5.2327 LearningRate 0.0194 Epoch: 11 Global Step: 186760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:52,503-Speed 8747.50 samples/sec Loss 5.3096 LearningRate 0.0194 Epoch: 11 Global Step: 186770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:53,621-Speed 9164.15 samples/sec Loss 5.1798 LearningRate 0.0194 Epoch: 11 Global Step: 186780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:04:54,724-Speed 
9290.15 samples/sec Loss 5.2187 LearningRate 0.0194 Epoch: 11 Global Step: 186790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:55,828-Speed 9282.87 samples/sec Loss 5.1519 LearningRate 0.0194 Epoch: 11 Global Step: 186800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:56,895-Speed 9598.84 samples/sec Loss 5.2383 LearningRate 0.0194 Epoch: 11 Global Step: 186810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:57,992-Speed 9345.85 samples/sec Loss 5.2650 LearningRate 0.0194 Epoch: 11 Global Step: 186820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:04:59,084-Speed 9377.66 samples/sec Loss 5.2053 LearningRate 0.0194 Epoch: 11 Global Step: 186830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:00,196-Speed 9222.17 samples/sec Loss 5.0933 LearningRate 0.0194 Epoch: 11 Global Step: 186840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:01,275-Speed 9501.55 samples/sec Loss 5.2289 LearningRate 0.0194 Epoch: 11 Global Step: 186850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:02,374-Speed 9320.69 samples/sec Loss 5.0884 LearningRate 0.0194 Epoch: 11 Global Step: 186860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:03,441-Speed 9600.07 samples/sec Loss 5.1590 LearningRate 0.0194 Epoch: 11 Global Step: 186870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:04,512-Speed 9571.85 samples/sec Loss 5.1278 LearningRate 0.0194 Epoch: 11 Global Step: 186880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:05,593-Speed 9483.57 samples/sec Loss 5.1674 LearningRate 0.0194 Epoch: 11 Global Step: 186890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:06,701-Speed 9241.90 samples/sec Loss 5.1898 LearningRate 0.0194 Epoch: 11 Global Step: 186900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:07,788-Speed 9425.61 samples/sec Loss 5.2427 
LearningRate 0.0194 Epoch: 11 Global Step: 186910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:08,931-Speed 8964.63 samples/sec Loss 5.1596 LearningRate 0.0194 Epoch: 11 Global Step: 186920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:10,026-Speed 9359.46 samples/sec Loss 5.2205 LearningRate 0.0194 Epoch: 11 Global Step: 186930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:11,119-Speed 9372.79 samples/sec Loss 5.1481 LearningRate 0.0194 Epoch: 11 Global Step: 186940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:12,200-Speed 9479.84 samples/sec Loss 5.2172 LearningRate 0.0194 Epoch: 11 Global Step: 186950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:13,281-Speed 9474.07 samples/sec Loss 5.2515 LearningRate 0.0194 Epoch: 11 Global Step: 186960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:14,356-Speed 9532.65 samples/sec Loss 5.2148 LearningRate 0.0194 Epoch: 11 Global Step: 186970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:15,431-Speed 9534.57 samples/sec Loss 5.1457 LearningRate 0.0193 Epoch: 11 Global Step: 186980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:16,548-Speed 9166.47 samples/sec Loss 5.0909 LearningRate 0.0193 Epoch: 11 Global Step: 186990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:17,631-Speed 9468.36 samples/sec Loss 5.1780 LearningRate 0.0193 Epoch: 11 Global Step: 187000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:05:18,708-Speed 9521.39 samples/sec Loss 5.1404 LearningRate 0.0193 Epoch: 11 Global Step: 187010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:19,796-Speed 9416.74 samples/sec Loss 5.2634 LearningRate 0.0193 Epoch: 11 Global Step: 187020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:20,910-Speed 9204.17 samples/sec Loss 5.2157 LearningRate 0.0193 Epoch: 11 Global 
Step: 187030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:22,013-Speed 9287.24 samples/sec Loss 5.2381 LearningRate 0.0193 Epoch: 11 Global Step: 187040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:23,119-Speed 9265.82 samples/sec Loss 5.2345 LearningRate 0.0193 Epoch: 11 Global Step: 187050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:24,231-Speed 9215.35 samples/sec Loss 5.1943 LearningRate 0.0193 Epoch: 11 Global Step: 187060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:25,315-Speed 9450.09 samples/sec Loss 5.1864 LearningRate 0.0193 Epoch: 11 Global Step: 187070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:26,428-Speed 9203.48 samples/sec Loss 5.1797 LearningRate 0.0193 Epoch: 11 Global Step: 187080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:27,517-Speed 9412.66 samples/sec Loss 5.2031 LearningRate 0.0193 Epoch: 11 Global Step: 187090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:28,646-Speed 9135.21 samples/sec Loss 5.1809 LearningRate 0.0193 Epoch: 11 Global Step: 187100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:29,708-Speed 9650.74 samples/sec Loss 5.2190 LearningRate 0.0193 Epoch: 11 Global Step: 187110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:05:30,809-Speed 9299.54 samples/sec Loss 5.1051 LearningRate 0.0193 Epoch: 11 Global Step: 187120 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:05:31,881-Speed 9565.45 samples/sec Loss 5.1828 LearningRate 0.0193 Epoch: 11 Global Step: 187130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:32,938-Speed 9686.44 samples/sec Loss 5.2423 LearningRate 0.0193 Epoch: 11 Global Step: 187140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:34,007-Speed 9585.36 samples/sec Loss 5.1669 LearningRate 0.0193 Epoch: 11 Global Step: 187150 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 19:05:35,165-Speed 8854.51 samples/sec Loss 5.2770 LearningRate 0.0193 Epoch: 11 Global Step: 187160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:36,290-Speed 9101.20 samples/sec Loss 5.2217 LearningRate 0.0193 Epoch: 11 Global Step: 187170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:37,353-Speed 9641.90 samples/sec Loss 5.2392 LearningRate 0.0193 Epoch: 11 Global Step: 187180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:38,434-Speed 9487.16 samples/sec Loss 5.1485 LearningRate 0.0193 Epoch: 11 Global Step: 187190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:39,540-Speed 9260.29 samples/sec Loss 5.1272 LearningRate 0.0193 Epoch: 11 Global Step: 187200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:40,656-Speed 9182.88 samples/sec Loss 5.1396 LearningRate 0.0193 Epoch: 11 Global Step: 187210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:41,758-Speed 9298.94 samples/sec Loss 5.1379 LearningRate 0.0193 Epoch: 11 Global Step: 187220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:42,850-Speed 9380.63 samples/sec Loss 5.1964 LearningRate 0.0193 Epoch: 11 Global Step: 187230 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:05:43,900-Speed 9755.49 samples/sec Loss 5.2207 LearningRate 0.0193 Epoch: 11 Global Step: 187240 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:05:45,007-Speed 9256.30 samples/sec Loss 5.1874 LearningRate 0.0193 Epoch: 11 Global Step: 187250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:46,130-Speed 9127.97 samples/sec Loss 5.1665 LearningRate 0.0193 Epoch: 11 Global Step: 187260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:47,195-Speed 9624.81 samples/sec Loss 5.2414 LearningRate 0.0193 Epoch: 11 Global Step: 187270 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:05:48,287-Speed 9375.97 samples/sec Loss 5.2244 LearningRate 0.0193 Epoch: 11 Global Step: 187280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:49,347-Speed 9668.26 samples/sec Loss 5.2489 LearningRate 0.0193 Epoch: 11 Global Step: 187290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:50,421-Speed 9542.31 samples/sec Loss 5.1882 LearningRate 0.0193 Epoch: 11 Global Step: 187300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:51,500-Speed 9496.07 samples/sec Loss 5.0715 LearningRate 0.0193 Epoch: 11 Global Step: 187310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:52,622-Speed 9135.53 samples/sec Loss 5.1590 LearningRate 0.0193 Epoch: 11 Global Step: 187320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:53,730-Speed 9242.99 samples/sec Loss 5.3053 LearningRate 0.0193 Epoch: 11 Global Step: 187330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:54,832-Speed 9301.67 samples/sec Loss 5.2079 LearningRate 0.0193 Epoch: 11 Global Step: 187340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:55,952-Speed 9147.99 samples/sec Loss 5.3064 LearningRate 0.0193 Epoch: 11 Global Step: 187350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:57,025-Speed 9556.18 samples/sec Loss 5.1101 LearningRate 0.0192 Epoch: 11 Global Step: 187360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:58,131-Speed 9270.30 samples/sec Loss 5.2920 LearningRate 0.0192 Epoch: 11 Global Step: 187370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:05:59,241-Speed 9235.42 samples/sec Loss 5.3131 LearningRate 0.0192 Epoch: 11 Global Step: 187380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:00,368-Speed 9091.69 samples/sec Loss 5.1865 LearningRate 0.0192 Epoch: 11 Global Step: 187390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
19:06:01,472-Speed 9279.56 samples/sec Loss 5.2228 LearningRate 0.0192 Epoch: 11 Global Step: 187400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:02,574-Speed 9299.54 samples/sec Loss 5.2727 LearningRate 0.0192 Epoch: 11 Global Step: 187410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:03,660-Speed 9435.13 samples/sec Loss 5.1586 LearningRate 0.0192 Epoch: 11 Global Step: 187420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:04,765-Speed 9271.22 samples/sec Loss 5.1867 LearningRate 0.0192 Epoch: 11 Global Step: 187430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:05,835-Speed 9574.47 samples/sec Loss 5.2155 LearningRate 0.0192 Epoch: 11 Global Step: 187440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:06,907-Speed 9560.53 samples/sec Loss 5.1751 LearningRate 0.0192 Epoch: 11 Global Step: 187450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:07,990-Speed 9457.23 samples/sec Loss 5.1060 LearningRate 0.0192 Epoch: 11 Global Step: 187460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:09,077-Speed 9422.34 samples/sec Loss 5.1787 LearningRate 0.0192 Epoch: 11 Global Step: 187470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:10,161-Speed 9462.55 samples/sec Loss 5.1994 LearningRate 0.0192 Epoch: 11 Global Step: 187480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:11,218-Speed 9687.02 samples/sec Loss 5.1166 LearningRate 0.0192 Epoch: 11 Global Step: 187490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:12,300-Speed 9473.15 samples/sec Loss 5.3164 LearningRate 0.0192 Epoch: 11 Global Step: 187500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:13,375-Speed 9526.03 samples/sec Loss 5.2303 LearningRate 0.0192 Epoch: 11 Global Step: 187510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:14,448-Speed 9551.24 samples/sec 
Loss 5.2720 LearningRate 0.0192 Epoch: 11 Global Step: 187520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:15,528-Speed 9486.57 samples/sec Loss 5.2664 LearningRate 0.0192 Epoch: 11 Global Step: 187530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:16,649-Speed 9145.69 samples/sec Loss 5.1673 LearningRate 0.0192 Epoch: 11 Global Step: 187540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:17,709-Speed 9659.92 samples/sec Loss 5.1811 LearningRate 0.0192 Epoch: 11 Global Step: 187550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:18,801-Speed 9383.35 samples/sec Loss 5.2257 LearningRate 0.0192 Epoch: 11 Global Step: 187560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:19,866-Speed 9624.12 samples/sec Loss 5.2697 LearningRate 0.0192 Epoch: 11 Global Step: 187570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:20,972-Speed 9264.83 samples/sec Loss 5.1951 LearningRate 0.0192 Epoch: 11 Global Step: 187580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:22,080-Speed 9241.47 samples/sec Loss 5.1825 LearningRate 0.0192 Epoch: 11 Global Step: 187590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:23,203-Speed 9131.88 samples/sec Loss 5.2464 LearningRate 0.0192 Epoch: 11 Global Step: 187600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:24,306-Speed 9286.27 samples/sec Loss 5.1653 LearningRate 0.0192 Epoch: 11 Global Step: 187610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:25,363-Speed 9689.43 samples/sec Loss 5.1581 LearningRate 0.0192 Epoch: 11 Global Step: 187620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:26,438-Speed 9532.27 samples/sec Loss 5.2188 LearningRate 0.0192 Epoch: 11 Global Step: 187630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:27,495-Speed 9697.17 samples/sec Loss 5.2085 LearningRate 0.0192 Epoch: 
11 Global Step: 187640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:28,585-Speed 9397.91 samples/sec Loss 5.2183 LearningRate 0.0192 Epoch: 11 Global Step: 187650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:29,684-Speed 9323.15 samples/sec Loss 5.2327 LearningRate 0.0192 Epoch: 11 Global Step: 187660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:30,765-Speed 9477.39 samples/sec Loss 5.1301 LearningRate 0.0192 Epoch: 11 Global Step: 187670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:31,832-Speed 9597.95 samples/sec Loss 5.1947 LearningRate 0.0192 Epoch: 11 Global Step: 187680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:32,884-Speed 9747.34 samples/sec Loss 5.1526 LearningRate 0.0192 Epoch: 11 Global Step: 187690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:33,989-Speed 9273.62 samples/sec Loss 5.2532 LearningRate 0.0192 Epoch: 11 Global Step: 187700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:35,048-Speed 9677.99 samples/sec Loss 5.2980 LearningRate 0.0192 Epoch: 11 Global Step: 187710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:36,168-Speed 9142.46 samples/sec Loss 5.2635 LearningRate 0.0192 Epoch: 11 Global Step: 187720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:37,280-Speed 9215.31 samples/sec Loss 5.2273 LearningRate 0.0192 Epoch: 11 Global Step: 187730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:38,369-Speed 9407.26 samples/sec Loss 5.1462 LearningRate 0.0191 Epoch: 11 Global Step: 187740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:39,462-Speed 9384.43 samples/sec Loss 5.2396 LearningRate 0.0191 Epoch: 11 Global Step: 187750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:40,552-Speed 9398.84 samples/sec Loss 5.3067 LearningRate 0.0191 Epoch: 11 Global Step: 187760 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:41,622-Speed 9571.55 samples/sec Loss 5.2318 LearningRate 0.0191 Epoch: 11 Global Step: 187770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:42,711-Speed 9414.25 samples/sec Loss 5.1785 LearningRate 0.0191 Epoch: 11 Global Step: 187780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:43,815-Speed 9279.51 samples/sec Loss 5.2232 LearningRate 0.0191 Epoch: 11 Global Step: 187790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:44,925-Speed 9231.35 samples/sec Loss 5.3055 LearningRate 0.0191 Epoch: 11 Global Step: 187800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:45,980-Speed 9711.51 samples/sec Loss 5.2292 LearningRate 0.0191 Epoch: 11 Global Step: 187810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:47,080-Speed 9313.83 samples/sec Loss 5.2173 LearningRate 0.0191 Epoch: 11 Global Step: 187820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:48,193-Speed 9202.85 samples/sec Loss 5.2936 LearningRate 0.0191 Epoch: 11 Global Step: 187830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:49,296-Speed 9291.07 samples/sec Loss 5.1330 LearningRate 0.0191 Epoch: 11 Global Step: 187840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:50,396-Speed 9319.72 samples/sec Loss 5.2722 LearningRate 0.0191 Epoch: 11 Global Step: 187850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:51,526-Speed 9066.71 samples/sec Loss 5.3366 LearningRate 0.0191 Epoch: 11 Global Step: 187860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:06:52,641-Speed 9190.64 samples/sec Loss 5.1632 LearningRate 0.0191 Epoch: 11 Global Step: 187870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:06:53,760-Speed 9154.91 samples/sec Loss 5.2788 LearningRate 0.0191 Epoch: 11 Global Step: 187880 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:06:54,861-Speed 9313.23 samples/sec Loss 5.2619 LearningRate 0.0191 Epoch: 11 Global Step: 187890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:55,983-Speed 9126.85 samples/sec Loss 5.2239 LearningRate 0.0191 Epoch: 11 Global Step: 187900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:57,078-Speed 9355.07 samples/sec Loss 5.3058 LearningRate 0.0191 Epoch: 11 Global Step: 187910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:58,213-Speed 9030.20 samples/sec Loss 5.1704 LearningRate 0.0191 Epoch: 11 Global Step: 187920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:06:59,341-Speed 9080.62 samples/sec Loss 5.2504 LearningRate 0.0191 Epoch: 11 Global Step: 187930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:07:00,451-Speed 9230.15 samples/sec Loss 5.1935 LearningRate 0.0191 Epoch: 11 Global Step: 187940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:07:01,538-Speed 9426.09 samples/sec Loss 5.2652 LearningRate 0.0191 Epoch: 11 Global Step: 187950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:07:02,640-Speed 9298.84 samples/sec Loss 5.3076 LearningRate 0.0191 Epoch: 11 Global Step: 187960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:07:03,728-Speed 9427.50 samples/sec Loss 5.1871 LearningRate 0.0191 Epoch: 11 Global Step: 187970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:07:04,817-Speed 9401.81 samples/sec Loss 5.3295 LearningRate 0.0191 Epoch: 11 Global Step: 187980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:07:05,910-Speed 9374.44 samples/sec Loss 5.2901 LearningRate 0.0191 Epoch: 11 Global Step: 187990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:07:07,023-Speed 9211.37 samples/sec Loss 5.2511 LearningRate 0.0191 Epoch: 11 Global Step: 188000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
19:07:29,105-[lfw][188000]XNorm: 9.039301 Training: 2022-04-11 19:07:29,106-[lfw][188000]Accuracy-Flip: 0.99567+-0.00343 Training: 2022-04-11 19:07:29,106-[lfw][188000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:07:54,447-[cfp_fp][188000]XNorm: 7.775957 Training: 2022-04-11 19:07:54,448-[cfp_fp][188000]Accuracy-Flip: 0.96714+-0.00833 Training: 2022-04-11 19:07:54,448-[cfp_fp][188000]Accuracy-Highest: 0.96714 Training: 2022-04-11 19:08:16,257-[agedb_30][188000]XNorm: 8.748694 Training: 2022-04-11 19:08:16,257-[agedb_30][188000]Accuracy-Flip: 0.96833+-0.00860 Training: 2022-04-11 19:08:16,258-[agedb_30][188000]Accuracy-Highest: 0.96917 Training: 2022-04-11 19:08:17,340-Speed 145.63 samples/sec Loss 5.3247 LearningRate 0.0191 Epoch: 11 Global Step: 188010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:18,377-Speed 9876.20 samples/sec Loss 5.2771 LearningRate 0.0191 Epoch: 11 Global Step: 188020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:19,446-Speed 9588.53 samples/sec Loss 5.2194 LearningRate 0.0191 Epoch: 11 Global Step: 188030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:20,575-Speed 9072.17 samples/sec Loss 5.2292 LearningRate 0.0191 Epoch: 11 Global Step: 188040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:21,638-Speed 9636.21 samples/sec Loss 5.2523 LearningRate 0.0191 Epoch: 11 Global Step: 188050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:22,722-Speed 9456.59 samples/sec Loss 5.2568 LearningRate 0.0191 Epoch: 11 Global Step: 188060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:23,826-Speed 9288.67 samples/sec Loss 5.2644 LearningRate 0.0191 Epoch: 11 Global Step: 188070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:24,913-Speed 9426.37 samples/sec Loss 5.2680 LearningRate 0.0191 Epoch: 11 Global Step: 188080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:26,060-Speed 
8930.19 samples/sec Loss 5.2308 LearningRate 0.0191 Epoch: 11 Global Step: 188090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:27,202-Speed 8969.13 samples/sec Loss 5.2772 LearningRate 0.0191 Epoch: 11 Global Step: 188100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:28,310-Speed 9246.15 samples/sec Loss 5.2493 LearningRate 0.0191 Epoch: 11 Global Step: 188110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:29,390-Speed 9492.24 samples/sec Loss 5.2727 LearningRate 0.0190 Epoch: 11 Global Step: 188120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:30,500-Speed 9232.50 samples/sec Loss 5.2840 LearningRate 0.0190 Epoch: 11 Global Step: 188130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:31,566-Speed 9604.94 samples/sec Loss 5.2052 LearningRate 0.0190 Epoch: 11 Global Step: 188140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:32,691-Speed 9107.93 samples/sec Loss 5.2507 LearningRate 0.0190 Epoch: 11 Global Step: 188150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:33,791-Speed 9317.26 samples/sec Loss 5.2226 LearningRate 0.0190 Epoch: 11 Global Step: 188160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:34,925-Speed 9035.37 samples/sec Loss 5.2611 LearningRate 0.0190 Epoch: 11 Global Step: 188170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:08:36,045-Speed 9148.53 samples/sec Loss 5.3138 LearningRate 0.0190 Epoch: 11 Global Step: 188180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:37,136-Speed 9390.51 samples/sec Loss 5.3163 LearningRate 0.0190 Epoch: 11 Global Step: 188190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:38,215-Speed 9497.57 samples/sec Loss 5.2142 LearningRate 0.0190 Epoch: 11 Global Step: 188200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:39,317-Speed 9300.12 samples/sec Loss 5.2320 
LearningRate 0.0190 Epoch: 11 Global Step: 188210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:40,411-Speed 9372.45 samples/sec Loss 5.2460 LearningRate 0.0190 Epoch: 11 Global Step: 188220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:41,511-Speed 9306.97 samples/sec Loss 5.2143 LearningRate 0.0190 Epoch: 11 Global Step: 188230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:42,610-Speed 9324.31 samples/sec Loss 5.2117 LearningRate 0.0190 Epoch: 11 Global Step: 188240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:43,741-Speed 9065.67 samples/sec Loss 5.2941 LearningRate 0.0190 Epoch: 11 Global Step: 188250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:44,888-Speed 8927.23 samples/sec Loss 5.2074 LearningRate 0.0190 Epoch: 11 Global Step: 188260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:45,981-Speed 9380.03 samples/sec Loss 5.3042 LearningRate 0.0190 Epoch: 11 Global Step: 188270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:47,038-Speed 9691.26 samples/sec Loss 5.3014 LearningRate 0.0190 Epoch: 11 Global Step: 188280 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:08:48,104-Speed 9606.01 samples/sec Loss 5.2938 LearningRate 0.0190 Epoch: 11 Global Step: 188290 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:08:49,250-Speed 8943.46 samples/sec Loss 5.2320 LearningRate 0.0190 Epoch: 11 Global Step: 188300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:50,322-Speed 9560.02 samples/sec Loss 5.2842 LearningRate 0.0190 Epoch: 11 Global Step: 188310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:51,395-Speed 9546.87 samples/sec Loss 5.2239 LearningRate 0.0190 Epoch: 11 Global Step: 188320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:52,458-Speed 9641.20 samples/sec Loss 5.2412 LearningRate 0.0190 Epoch: 11 
Global Step: 188330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:53,577-Speed 9158.61 samples/sec Loss 5.2793 LearningRate 0.0190 Epoch: 11 Global Step: 188340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:54,660-Speed 9457.95 samples/sec Loss 5.1736 LearningRate 0.0190 Epoch: 11 Global Step: 188350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:55,703-Speed 9820.50 samples/sec Loss 5.3358 LearningRate 0.0190 Epoch: 11 Global Step: 188360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:56,803-Speed 9319.75 samples/sec Loss 5.1981 LearningRate 0.0190 Epoch: 11 Global Step: 188370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:57,857-Speed 9715.17 samples/sec Loss 5.1990 LearningRate 0.0190 Epoch: 11 Global Step: 188380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:08:59,004-Speed 8938.26 samples/sec Loss 5.2183 LearningRate 0.0190 Epoch: 11 Global Step: 188390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:00,071-Speed 9599.78 samples/sec Loss 5.2290 LearningRate 0.0190 Epoch: 11 Global Step: 188400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:01,178-Speed 9254.91 samples/sec Loss 5.2118 LearningRate 0.0190 Epoch: 11 Global Step: 188410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:02,279-Speed 9305.76 samples/sec Loss 5.1839 LearningRate 0.0190 Epoch: 11 Global Step: 188420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:03,374-Speed 9360.56 samples/sec Loss 5.2833 LearningRate 0.0190 Epoch: 11 Global Step: 188430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:04,468-Speed 9371.29 samples/sec Loss 5.2405 LearningRate 0.0190 Epoch: 11 Global Step: 188440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:05,569-Speed 9301.33 samples/sec Loss 5.2701 LearningRate 0.0190 Epoch: 11 Global Step: 188450 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:06,649-Speed 9492.35 samples/sec Loss 5.1640 LearningRate 0.0190 Epoch: 11 Global Step: 188460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:07,712-Speed 9637.39 samples/sec Loss 5.2224 LearningRate 0.0190 Epoch: 11 Global Step: 188470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:08,832-Speed 9143.87 samples/sec Loss 5.2517 LearningRate 0.0190 Epoch: 11 Global Step: 188480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:09,949-Speed 9182.24 samples/sec Loss 5.2239 LearningRate 0.0190 Epoch: 11 Global Step: 188490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:11,025-Speed 9519.73 samples/sec Loss 5.1516 LearningRate 0.0190 Epoch: 11 Global Step: 188500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:12,108-Speed 9461.26 samples/sec Loss 5.1849 LearningRate 0.0189 Epoch: 11 Global Step: 188510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:13,174-Speed 9606.96 samples/sec Loss 5.2459 LearningRate 0.0189 Epoch: 11 Global Step: 188520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:14,297-Speed 9126.68 samples/sec Loss 5.3310 LearningRate 0.0189 Epoch: 11 Global Step: 188530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:15,446-Speed 8912.20 samples/sec Loss 5.3189 LearningRate 0.0189 Epoch: 11 Global Step: 188540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:16,598-Speed 8898.54 samples/sec Loss 5.2676 LearningRate 0.0189 Epoch: 11 Global Step: 188550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:17,738-Speed 8985.22 samples/sec Loss 5.3226 LearningRate 0.0189 Epoch: 11 Global Step: 188560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:18,839-Speed 9305.69 samples/sec Loss 5.2121 LearningRate 0.0189 Epoch: 11 Global Step: 188570 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 19:09:19,934-Speed 9358.01 samples/sec Loss 5.3260 LearningRate 0.0189 Epoch: 11 Global Step: 188580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:21,017-Speed 9458.96 samples/sec Loss 5.2246 LearningRate 0.0189 Epoch: 11 Global Step: 188590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:22,115-Speed 9338.61 samples/sec Loss 5.2754 LearningRate 0.0189 Epoch: 11 Global Step: 188600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:23,209-Speed 9361.51 samples/sec Loss 5.1984 LearningRate 0.0189 Epoch: 11 Global Step: 188610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:24,312-Speed 9292.44 samples/sec Loss 5.2842 LearningRate 0.0189 Epoch: 11 Global Step: 188620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:25,416-Speed 9281.22 samples/sec Loss 5.4052 LearningRate 0.0189 Epoch: 11 Global Step: 188630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:26,562-Speed 8935.93 samples/sec Loss 5.2235 LearningRate 0.0189 Epoch: 11 Global Step: 188640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:27,664-Speed 9304.60 samples/sec Loss 5.1447 LearningRate 0.0189 Epoch: 11 Global Step: 188650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:28,770-Speed 9261.17 samples/sec Loss 5.2056 LearningRate 0.0189 Epoch: 11 Global Step: 188660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:29,875-Speed 9272.24 samples/sec Loss 5.2808 LearningRate 0.0189 Epoch: 11 Global Step: 188670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:30,959-Speed 9455.40 samples/sec Loss 5.1513 LearningRate 0.0189 Epoch: 11 Global Step: 188680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:32,051-Speed 9379.23 samples/sec Loss 5.3337 LearningRate 0.0189 Epoch: 11 Global Step: 188690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
19:09:33,123-Speed 9558.41 samples/sec Loss 5.1918 LearningRate 0.0189 Epoch: 11 Global Step: 188700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:34,272-Speed 8923.60 samples/sec Loss 5.1939 LearningRate 0.0189 Epoch: 11 Global Step: 188710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:35,326-Speed 9712.90 samples/sec Loss 5.2718 LearningRate 0.0189 Epoch: 11 Global Step: 188720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:36,411-Speed 9443.68 samples/sec Loss 5.2465 LearningRate 0.0189 Epoch: 11 Global Step: 188730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:37,516-Speed 9272.62 samples/sec Loss 5.2918 LearningRate 0.0189 Epoch: 11 Global Step: 188740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:38,653-Speed 9016.63 samples/sec Loss 5.3135 LearningRate 0.0189 Epoch: 11 Global Step: 188750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:39,782-Speed 9080.58 samples/sec Loss 5.4141 LearningRate 0.0189 Epoch: 11 Global Step: 188760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:40,928-Speed 8938.64 samples/sec Loss 5.3203 LearningRate 0.0189 Epoch: 11 Global Step: 188770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:42,017-Speed 9410.65 samples/sec Loss 5.1622 LearningRate 0.0189 Epoch: 11 Global Step: 188780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:43,110-Speed 9380.33 samples/sec Loss 5.2319 LearningRate 0.0189 Epoch: 11 Global Step: 188790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:44,226-Speed 9174.07 samples/sec Loss 5.3614 LearningRate 0.0189 Epoch: 11 Global Step: 188800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:45,327-Speed 9311.31 samples/sec Loss 5.3573 LearningRate 0.0189 Epoch: 11 Global Step: 188810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:46,406-Speed 9496.65 
samples/sec Loss 5.2538 LearningRate 0.0189 Epoch: 11 Global Step: 188820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:47,527-Speed 9136.64 samples/sec Loss 5.2112 LearningRate 0.0189 Epoch: 11 Global Step: 188830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:48,690-Speed 8813.19 samples/sec Loss 5.2966 LearningRate 0.0189 Epoch: 11 Global Step: 188840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:49,787-Speed 9339.07 samples/sec Loss 5.2452 LearningRate 0.0189 Epoch: 11 Global Step: 188850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:50,851-Speed 9625.20 samples/sec Loss 5.3001 LearningRate 0.0189 Epoch: 11 Global Step: 188860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:51,941-Speed 9403.61 samples/sec Loss 5.3303 LearningRate 0.0189 Epoch: 11 Global Step: 188870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:53,031-Speed 9403.51 samples/sec Loss 5.2718 LearningRate 0.0189 Epoch: 11 Global Step: 188880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:54,135-Speed 9277.92 samples/sec Loss 5.2866 LearningRate 0.0188 Epoch: 11 Global Step: 188890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:09:55,228-Speed 9370.61 samples/sec Loss 5.1316 LearningRate 0.0188 Epoch: 11 Global Step: 188900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:56,324-Speed 9353.81 samples/sec Loss 5.2246 LearningRate 0.0188 Epoch: 11 Global Step: 188910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:57,425-Speed 9302.07 samples/sec Loss 5.3523 LearningRate 0.0188 Epoch: 11 Global Step: 188920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:58,543-Speed 9162.05 samples/sec Loss 5.2675 LearningRate 0.0188 Epoch: 11 Global Step: 188930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:09:59,632-Speed 9413.36 samples/sec Loss 5.2385 LearningRate 
0.0188 Epoch: 11 Global Step: 188940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:00,693-Speed 9657.80 samples/sec Loss 5.2606 LearningRate 0.0188 Epoch: 11 Global Step: 188950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:01,769-Speed 9522.60 samples/sec Loss 5.3541 LearningRate 0.0188 Epoch: 11 Global Step: 188960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:02,829-Speed 9662.79 samples/sec Loss 5.2803 LearningRate 0.0188 Epoch: 11 Global Step: 188970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:03,892-Speed 9642.34 samples/sec Loss 5.2778 LearningRate 0.0188 Epoch: 11 Global Step: 188980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:04,930-Speed 9877.88 samples/sec Loss 5.3208 LearningRate 0.0188 Epoch: 11 Global Step: 188990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:06,007-Speed 9510.07 samples/sec Loss 5.1922 LearningRate 0.0188 Epoch: 11 Global Step: 189000 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:10:07,101-Speed 9368.68 samples/sec Loss 5.2578 LearningRate 0.0188 Epoch: 11 Global Step: 189010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:08,172-Speed 9559.15 samples/sec Loss 5.2504 LearningRate 0.0188 Epoch: 11 Global Step: 189020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:09,231-Speed 9678.29 samples/sec Loss 5.3580 LearningRate 0.0188 Epoch: 11 Global Step: 189030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:10,312-Speed 9478.43 samples/sec Loss 5.2344 LearningRate 0.0188 Epoch: 11 Global Step: 189040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:11,393-Speed 9480.29 samples/sec Loss 5.2877 LearningRate 0.0188 Epoch: 11 Global Step: 189050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:12,448-Speed 9711.81 samples/sec Loss 5.4023 LearningRate 0.0188 Epoch: 11 Global Step: 
189060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:13,532-Speed 9451.87 samples/sec Loss 5.4113 LearningRate 0.0188 Epoch: 11 Global Step: 189070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:14,674-Speed 8970.10 samples/sec Loss 5.2168 LearningRate 0.0188 Epoch: 11 Global Step: 189080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:15,769-Speed 9358.59 samples/sec Loss 5.2855 LearningRate 0.0188 Epoch: 11 Global Step: 189090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:16,875-Speed 9264.08 samples/sec Loss 5.1798 LearningRate 0.0188 Epoch: 11 Global Step: 189100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:18,020-Speed 8954.29 samples/sec Loss 5.1877 LearningRate 0.0188 Epoch: 11 Global Step: 189110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:10:19,101-Speed 9478.93 samples/sec Loss 5.2395 LearningRate 0.0188 Epoch: 11 Global Step: 189120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:20,174-Speed 9547.61 samples/sec Loss 5.2987 LearningRate 0.0188 Epoch: 11 Global Step: 189130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:21,223-Speed 9765.93 samples/sec Loss 5.2852 LearningRate 0.0188 Epoch: 11 Global Step: 189140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:22,308-Speed 9443.94 samples/sec Loss 5.3209 LearningRate 0.0188 Epoch: 11 Global Step: 189150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:23,444-Speed 9021.28 samples/sec Loss 5.2837 LearningRate 0.0188 Epoch: 11 Global Step: 189160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:24,548-Speed 9275.96 samples/sec Loss 5.4039 LearningRate 0.0188 Epoch: 11 Global Step: 189170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:25,630-Speed 9466.36 samples/sec Loss 5.3550 LearningRate 0.0188 Epoch: 11 Global Step: 189180 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 19:10:26,721-Speed 9397.05 samples/sec Loss 5.3098 LearningRate 0.0188 Epoch: 11 Global Step: 189190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:27,848-Speed 9091.18 samples/sec Loss 5.2405 LearningRate 0.0188 Epoch: 11 Global Step: 189200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:28,930-Speed 9463.97 samples/sec Loss 5.2910 LearningRate 0.0188 Epoch: 11 Global Step: 189210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:30,002-Speed 9563.16 samples/sec Loss 5.2805 LearningRate 0.0188 Epoch: 11 Global Step: 189220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:31,076-Speed 9541.72 samples/sec Loss 5.2957 LearningRate 0.0188 Epoch: 11 Global Step: 189230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:32,177-Speed 9304.06 samples/sec Loss 5.3087 LearningRate 0.0188 Epoch: 11 Global Step: 189240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:33,246-Speed 9592.32 samples/sec Loss 5.4194 LearningRate 0.0188 Epoch: 11 Global Step: 189250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:34,357-Speed 9219.84 samples/sec Loss 5.3532 LearningRate 0.0188 Epoch: 11 Global Step: 189260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:35,481-Speed 9119.08 samples/sec Loss 5.3806 LearningRate 0.0188 Epoch: 11 Global Step: 189270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:36,579-Speed 9333.60 samples/sec Loss 5.3111 LearningRate 0.0187 Epoch: 11 Global Step: 189280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:37,659-Speed 9487.97 samples/sec Loss 5.2096 LearningRate 0.0187 Epoch: 11 Global Step: 189290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:38,702-Speed 9824.76 samples/sec Loss 5.3095 LearningRate 0.0187 Epoch: 11 Global Step: 189300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 19:10:39,793-Speed 9396.25 samples/sec Loss 5.2635 LearningRate 0.0187 Epoch: 11 Global Step: 189310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:40,845-Speed 9739.32 samples/sec Loss 5.3474 LearningRate 0.0187 Epoch: 11 Global Step: 189320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:41,927-Speed 9467.15 samples/sec Loss 5.2220 LearningRate 0.0187 Epoch: 11 Global Step: 189330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:43,029-Speed 9296.90 samples/sec Loss 5.3127 LearningRate 0.0187 Epoch: 11 Global Step: 189340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:44,092-Speed 9644.45 samples/sec Loss 5.2431 LearningRate 0.0187 Epoch: 11 Global Step: 189350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:45,175-Speed 9451.88 samples/sec Loss 5.2524 LearningRate 0.0187 Epoch: 11 Global Step: 189360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:46,261-Speed 9434.98 samples/sec Loss 5.3067 LearningRate 0.0187 Epoch: 11 Global Step: 189370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:47,336-Speed 9531.02 samples/sec Loss 5.2776 LearningRate 0.0187 Epoch: 11 Global Step: 189380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:48,420-Speed 9456.00 samples/sec Loss 5.3049 LearningRate 0.0187 Epoch: 11 Global Step: 189390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:49,479-Speed 9675.84 samples/sec Loss 5.3005 LearningRate 0.0187 Epoch: 11 Global Step: 189400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:50,547-Speed 9589.57 samples/sec Loss 5.3175 LearningRate 0.0187 Epoch: 11 Global Step: 189410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:51,668-Speed 9141.47 samples/sec Loss 5.3716 LearningRate 0.0187 Epoch: 11 Global Step: 189420 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:10:52,763-Speed 
9361.48 samples/sec Loss 5.1858 LearningRate 0.0187 Epoch: 11 Global Step: 189430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:53,810-Speed 9788.02 samples/sec Loss 5.3559 LearningRate 0.0187 Epoch: 11 Global Step: 189440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:54,921-Speed 9218.00 samples/sec Loss 5.3972 LearningRate 0.0187 Epoch: 11 Global Step: 189450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:56,017-Speed 9355.47 samples/sec Loss 5.2252 LearningRate 0.0187 Epoch: 11 Global Step: 189460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:57,124-Speed 9255.63 samples/sec Loss 5.3120 LearningRate 0.0187 Epoch: 11 Global Step: 189470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:58,200-Speed 9518.00 samples/sec Loss 5.2938 LearningRate 0.0187 Epoch: 11 Global Step: 189480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:10:59,268-Speed 9591.84 samples/sec Loss 5.3452 LearningRate 0.0187 Epoch: 11 Global Step: 189490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:00,353-Speed 9443.32 samples/sec Loss 5.3469 LearningRate 0.0187 Epoch: 11 Global Step: 189500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:01,429-Speed 9522.82 samples/sec Loss 5.3267 LearningRate 0.0187 Epoch: 11 Global Step: 189510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:02,542-Speed 9206.65 samples/sec Loss 5.3377 LearningRate 0.0187 Epoch: 11 Global Step: 189520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:03,662-Speed 9149.92 samples/sec Loss 5.2778 LearningRate 0.0187 Epoch: 11 Global Step: 189530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:11:04,728-Speed 9612.12 samples/sec Loss 5.3181 LearningRate 0.0187 Epoch: 11 Global Step: 189540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:05,834-Speed 9267.27 samples/sec Loss 5.4650 
LearningRate 0.0187 Epoch: 11 Global Step: 189550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:06,915-Speed 9474.58 samples/sec Loss 5.3229 LearningRate 0.0187 Epoch: 11 Global Step: 189560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:07,999-Speed 9449.01 samples/sec Loss 5.3104 LearningRate 0.0187 Epoch: 11 Global Step: 189570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:09,086-Speed 9430.21 samples/sec Loss 5.2404 LearningRate 0.0187 Epoch: 11 Global Step: 189580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:10,148-Speed 9651.04 samples/sec Loss 5.2402 LearningRate 0.0187 Epoch: 11 Global Step: 189590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:11,234-Speed 9433.44 samples/sec Loss 5.2943 LearningRate 0.0187 Epoch: 11 Global Step: 189600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:12,312-Speed 9497.69 samples/sec Loss 5.3422 LearningRate 0.0187 Epoch: 11 Global Step: 189610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:13,429-Speed 9181.53 samples/sec Loss 5.2363 LearningRate 0.0187 Epoch: 11 Global Step: 189620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:14,483-Speed 9715.98 samples/sec Loss 5.2145 LearningRate 0.0187 Epoch: 11 Global Step: 189630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:15,536-Speed 9734.15 samples/sec Loss 5.2122 LearningRate 0.0187 Epoch: 11 Global Step: 189640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:11:16,585-Speed 9767.92 samples/sec Loss 5.2068 LearningRate 0.0187 Epoch: 11 Global Step: 189650 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:11:17,715-Speed 9064.86 samples/sec Loss 5.3200 LearningRate 0.0186 Epoch: 11 Global Step: 189660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:18,790-Speed 9527.29 samples/sec Loss 5.2646 LearningRate 0.0186 Epoch: 11 
Global Step: 189670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:19,843-Speed 9731.73 samples/sec Loss 5.3346 LearningRate 0.0186 Epoch: 11 Global Step: 189680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:20,886-Speed 9828.66 samples/sec Loss 5.2698 LearningRate 0.0186 Epoch: 11 Global Step: 189690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:21,960-Speed 9539.50 samples/sec Loss 5.2026 LearningRate 0.0186 Epoch: 11 Global Step: 189700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:23,058-Speed 9332.36 samples/sec Loss 5.3650 LearningRate 0.0186 Epoch: 11 Global Step: 189710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:24,198-Speed 8983.81 samples/sec Loss 5.2578 LearningRate 0.0186 Epoch: 11 Global Step: 189720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:25,356-Speed 8851.72 samples/sec Loss 5.3737 LearningRate 0.0186 Epoch: 11 Global Step: 189730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:26,429-Speed 9546.08 samples/sec Loss 5.3047 LearningRate 0.0186 Epoch: 11 Global Step: 189740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:27,540-Speed 9227.16 samples/sec Loss 5.1863 LearningRate 0.0186 Epoch: 11 Global Step: 189750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:28,670-Speed 9061.45 samples/sec Loss 5.3314 LearningRate 0.0186 Epoch: 11 Global Step: 189760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:11:29,788-Speed 9166.59 samples/sec Loss 5.2887 LearningRate 0.0186 Epoch: 11 Global Step: 189770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:11:30,853-Speed 9617.82 samples/sec Loss 5.2092 LearningRate 0.0186 Epoch: 11 Global Step: 189780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:11:31,925-Speed 9568.63 samples/sec Loss 5.3269 LearningRate 0.0186 Epoch: 11 Global Step: 189790 Fp16 Grad 
Scale: 262144 Required: 6 hours Training: 2022-04-11 19:11:33,016-Speed 9389.16 samples/sec Loss 5.1980 LearningRate 0.0186 Epoch: 11 Global Step: 189800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:34,134-Speed 9174.79 samples/sec Loss 5.3936 LearningRate 0.0186 Epoch: 11 Global Step: 189810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:35,208-Speed 9532.40 samples/sec Loss 5.3672 LearningRate 0.0186 Epoch: 11 Global Step: 189820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:36,293-Speed 9446.68 samples/sec Loss 5.3191 LearningRate 0.0186 Epoch: 11 Global Step: 189830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:37,396-Speed 9292.16 samples/sec Loss 5.2029 LearningRate 0.0186 Epoch: 11 Global Step: 189840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:38,480-Speed 9450.93 samples/sec Loss 5.2784 LearningRate 0.0186 Epoch: 11 Global Step: 189850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:39,576-Speed 9347.14 samples/sec Loss 5.2437 LearningRate 0.0186 Epoch: 11 Global Step: 189860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:40,695-Speed 9162.46 samples/sec Loss 5.3162 LearningRate 0.0186 Epoch: 11 Global Step: 189870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:41,772-Speed 9506.25 samples/sec Loss 5.3530 LearningRate 0.0186 Epoch: 11 Global Step: 189880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:42,856-Speed 9455.13 samples/sec Loss 5.3523 LearningRate 0.0186 Epoch: 11 Global Step: 189890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:43,890-Speed 9906.34 samples/sec Loss 5.3453 LearningRate 0.0186 Epoch: 11 Global Step: 189900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:45,034-Speed 8969.70 samples/sec Loss 5.2741 LearningRate 0.0186 Epoch: 11 Global Step: 189910 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:11:46,091-Speed 9690.94 samples/sec Loss 5.2728 LearningRate 0.0186 Epoch: 11 Global Step: 189920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:47,168-Speed 9510.86 samples/sec Loss 5.2408 LearningRate 0.0186 Epoch: 11 Global Step: 189930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:48,262-Speed 9364.42 samples/sec Loss 5.3175 LearningRate 0.0186 Epoch: 11 Global Step: 189940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:49,343-Speed 9475.01 samples/sec Loss 5.2981 LearningRate 0.0186 Epoch: 11 Global Step: 189950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:50,446-Speed 9293.99 samples/sec Loss 5.3043 LearningRate 0.0186 Epoch: 11 Global Step: 189960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:51,508-Speed 9647.08 samples/sec Loss 5.3073 LearningRate 0.0186 Epoch: 11 Global Step: 189970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:52,626-Speed 9168.84 samples/sec Loss 5.2534 LearningRate 0.0186 Epoch: 11 Global Step: 189980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:53,685-Speed 9673.65 samples/sec Loss 5.2781 LearningRate 0.0186 Epoch: 11 Global Step: 189990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:11:54,759-Speed 9541.81 samples/sec Loss 5.3541 LearningRate 0.0186 Epoch: 11 Global Step: 190000 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:12:16,931-[lfw][190000]XNorm: 8.862163 Training: 2022-04-11 19:12:16,932-[lfw][190000]Accuracy-Flip: 0.99633+-0.00287 Training: 2022-04-11 19:12:16,933-[lfw][190000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:12:42,341-[cfp_fp][190000]XNorm: 7.633612 Training: 2022-04-11 19:12:42,342-[cfp_fp][190000]Accuracy-Flip: 0.96400+-0.01051 Training: 2022-04-11 19:12:42,342-[cfp_fp][190000]Accuracy-Highest: 0.96714 Training: 2022-04-11 19:13:04,211-[agedb_30][190000]XNorm: 8.639777 Training: 
2022-04-11 19:13:04,212-[agedb_30][190000]Accuracy-Flip: 0.96600+-0.00943 Training: 2022-04-11 19:13:04,212-[agedb_30][190000]Accuracy-Highest: 0.96917 Training: 2022-04-11 19:13:05,282-Speed 145.20 samples/sec Loss 5.3711 LearningRate 0.0186 Epoch: 11 Global Step: 190010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:06,338-Speed 9701.50 samples/sec Loss 5.3241 LearningRate 0.0186 Epoch: 11 Global Step: 190020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:07,384-Speed 9796.73 samples/sec Loss 5.3147 LearningRate 0.0186 Epoch: 11 Global Step: 190030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:08,468-Speed 9450.17 samples/sec Loss 5.4267 LearningRate 0.0186 Epoch: 11 Global Step: 190040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:09,545-Speed 9513.05 samples/sec Loss 5.3927 LearningRate 0.0185 Epoch: 11 Global Step: 190050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:10,683-Speed 9004.15 samples/sec Loss 5.3012 LearningRate 0.0185 Epoch: 11 Global Step: 190060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:11,809-Speed 9107.54 samples/sec Loss 5.2620 LearningRate 0.0185 Epoch: 11 Global Step: 190070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:12,895-Speed 9430.85 samples/sec Loss 5.2355 LearningRate 0.0185 Epoch: 11 Global Step: 190080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:13,972-Speed 9512.34 samples/sec Loss 5.3131 LearningRate 0.0185 Epoch: 11 Global Step: 190090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:15,055-Speed 9466.48 samples/sec Loss 5.3228 LearningRate 0.0185 Epoch: 11 Global Step: 190100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:16,139-Speed 9451.88 samples/sec Loss 5.3350 LearningRate 0.0185 Epoch: 11 Global Step: 190110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:13:17,204-Speed 
9618.85 samples/sec Loss 5.2680 LearningRate 0.0185 Epoch: 11 Global Step: 190120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:18,320-Speed 9180.12 samples/sec Loss 5.2702 LearningRate 0.0185 Epoch: 11 Global Step: 190130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:19,432-Speed 9209.08 samples/sec Loss 5.2480 LearningRate 0.0185 Epoch: 11 Global Step: 190140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:20,523-Speed 9398.40 samples/sec Loss 5.3839 LearningRate 0.0185 Epoch: 11 Global Step: 190150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:21,562-Speed 9867.87 samples/sec Loss 5.3831 LearningRate 0.0185 Epoch: 11 Global Step: 190160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:22,658-Speed 9341.64 samples/sec Loss 5.3353 LearningRate 0.0185 Epoch: 11 Global Step: 190170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:23,720-Speed 9653.74 samples/sec Loss 5.2403 LearningRate 0.0185 Epoch: 11 Global Step: 190180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:24,799-Speed 9493.04 samples/sec Loss 5.2332 LearningRate 0.0185 Epoch: 11 Global Step: 190190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:25,880-Speed 9475.03 samples/sec Loss 5.3316 LearningRate 0.0185 Epoch: 11 Global Step: 190200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:26,964-Speed 9455.59 samples/sec Loss 5.3982 LearningRate 0.0185 Epoch: 11 Global Step: 190210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:28,078-Speed 9201.18 samples/sec Loss 5.2947 LearningRate 0.0185 Epoch: 11 Global Step: 190220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:29,161-Speed 9461.23 samples/sec Loss 5.2630 LearningRate 0.0185 Epoch: 11 Global Step: 190230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:30,268-Speed 9256.65 samples/sec Loss 5.2325 
LearningRate 0.0185 Epoch: 11 Global Step: 190240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:31,390-Speed 9128.47 samples/sec Loss 5.3332 LearningRate 0.0185 Epoch: 11 Global Step: 190250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:32,468-Speed 9510.58 samples/sec Loss 5.2981 LearningRate 0.0185 Epoch: 11 Global Step: 190260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:33,519-Speed 9745.69 samples/sec Loss 5.2920 LearningRate 0.0185 Epoch: 11 Global Step: 190270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:34,569-Speed 9754.26 samples/sec Loss 5.2880 LearningRate 0.0185 Epoch: 11 Global Step: 190280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:35,659-Speed 9400.47 samples/sec Loss 5.3440 LearningRate 0.0185 Epoch: 11 Global Step: 190290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:36,767-Speed 9247.56 samples/sec Loss 5.3224 LearningRate 0.0185 Epoch: 11 Global Step: 190300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:37,868-Speed 9310.38 samples/sec Loss 5.4176 LearningRate 0.0185 Epoch: 11 Global Step: 190310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:38,940-Speed 9553.36 samples/sec Loss 5.2955 LearningRate 0.0185 Epoch: 11 Global Step: 190320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:40,013-Speed 9553.21 samples/sec Loss 5.3723 LearningRate 0.0185 Epoch: 11 Global Step: 190330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:41,047-Speed 9911.52 samples/sec Loss 5.3527 LearningRate 0.0185 Epoch: 11 Global Step: 190340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:42,123-Speed 9523.88 samples/sec Loss 5.3577 LearningRate 0.0185 Epoch: 11 Global Step: 190350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:43,221-Speed 9326.90 samples/sec Loss 5.3303 LearningRate 0.0185 Epoch: 11 
Global Step: 190360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:44,296-Speed 9536.42 samples/sec Loss 5.2386 LearningRate 0.0185 Epoch: 11 Global Step: 190370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:45,357-Speed 9649.11 samples/sec Loss 5.3580 LearningRate 0.0185 Epoch: 11 Global Step: 190380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:46,430-Speed 9554.90 samples/sec Loss 5.3189 LearningRate 0.0185 Epoch: 11 Global Step: 190390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:47,511-Speed 9474.70 samples/sec Loss 5.2921 LearningRate 0.0185 Epoch: 11 Global Step: 190400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:48,625-Speed 9201.54 samples/sec Loss 5.3715 LearningRate 0.0185 Epoch: 11 Global Step: 190410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:49,719-Speed 9360.86 samples/sec Loss 5.1784 LearningRate 0.0185 Epoch: 11 Global Step: 190420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:50,799-Speed 9483.76 samples/sec Loss 5.2798 LearningRate 0.0185 Epoch: 11 Global Step: 190430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:13:51,892-Speed 9377.99 samples/sec Loss 5.2741 LearningRate 0.0184 Epoch: 11 Global Step: 190440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:52,994-Speed 9294.68 samples/sec Loss 5.3282 LearningRate 0.0184 Epoch: 11 Global Step: 190450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:54,117-Speed 9129.54 samples/sec Loss 5.2974 LearningRate 0.0184 Epoch: 11 Global Step: 190460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:55,166-Speed 9768.80 samples/sec Loss 5.3205 LearningRate 0.0184 Epoch: 11 Global Step: 190470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:56,247-Speed 9477.72 samples/sec Loss 5.2386 LearningRate 0.0184 Epoch: 11 Global Step: 190480 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 19:13:57,440-Speed 8588.23 samples/sec Loss 5.2077 LearningRate 0.0184 Epoch: 11 Global Step: 190490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:58,555-Speed 9187.24 samples/sec Loss 5.2158 LearningRate 0.0184 Epoch: 11 Global Step: 190500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:13:59,631-Speed 9528.33 samples/sec Loss 5.2393 LearningRate 0.0184 Epoch: 11 Global Step: 190510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:00,696-Speed 9621.21 samples/sec Loss 5.2125 LearningRate 0.0184 Epoch: 11 Global Step: 190520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:01,797-Speed 9303.74 samples/sec Loss 5.2676 LearningRate 0.0184 Epoch: 11 Global Step: 190530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:02,872-Speed 9532.49 samples/sec Loss 5.3209 LearningRate 0.0184 Epoch: 11 Global Step: 190540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:14:03,962-Speed 9391.73 samples/sec Loss 5.3756 LearningRate 0.0184 Epoch: 11 Global Step: 190550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:05,045-Speed 9465.83 samples/sec Loss 5.3535 LearningRate 0.0184 Epoch: 11 Global Step: 190560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:06,102-Speed 9697.13 samples/sec Loss 5.1801 LearningRate 0.0184 Epoch: 11 Global Step: 190570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:07,191-Speed 9406.82 samples/sec Loss 5.2848 LearningRate 0.0184 Epoch: 11 Global Step: 190580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:08,299-Speed 9247.73 samples/sec Loss 5.2394 LearningRate 0.0184 Epoch: 11 Global Step: 190590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:09,394-Speed 9355.72 samples/sec Loss 5.2571 LearningRate 0.0184 Epoch: 11 Global Step: 190600 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:14:10,472-Speed 9499.90 samples/sec Loss 5.3032 LearningRate 0.0184 Epoch: 11 Global Step: 190610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:11,554-Speed 9480.06 samples/sec Loss 5.3561 LearningRate 0.0184 Epoch: 11 Global Step: 190620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:12,644-Speed 9393.38 samples/sec Loss 5.3462 LearningRate 0.0184 Epoch: 11 Global Step: 190630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:13,753-Speed 9242.86 samples/sec Loss 5.2984 LearningRate 0.0184 Epoch: 11 Global Step: 190640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:14,848-Speed 9359.08 samples/sec Loss 5.2770 LearningRate 0.0184 Epoch: 11 Global Step: 190650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:15,940-Speed 9384.04 samples/sec Loss 5.2776 LearningRate 0.0184 Epoch: 11 Global Step: 190660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:17,015-Speed 9529.30 samples/sec Loss 5.2890 LearningRate 0.0184 Epoch: 11 Global Step: 190670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:18,073-Speed 9684.11 samples/sec Loss 5.2835 LearningRate 0.0184 Epoch: 11 Global Step: 190680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:19,191-Speed 9170.01 samples/sec Loss 5.3129 LearningRate 0.0184 Epoch: 11 Global Step: 190690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:20,265-Speed 9532.82 samples/sec Loss 5.2844 LearningRate 0.0184 Epoch: 11 Global Step: 190700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:21,336-Speed 9571.03 samples/sec Loss 5.3751 LearningRate 0.0184 Epoch: 11 Global Step: 190710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:22,393-Speed 9697.85 samples/sec Loss 5.3112 LearningRate 0.0184 Epoch: 11 Global Step: 190720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
19:14:23,465-Speed 9559.02 samples/sec Loss 5.3300 LearningRate 0.0184 Epoch: 11 Global Step: 190730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:24,535-Speed 9575.63 samples/sec Loss 5.4223 LearningRate 0.0184 Epoch: 11 Global Step: 190740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:25,637-Speed 9292.73 samples/sec Loss 5.2487 LearningRate 0.0184 Epoch: 11 Global Step: 190750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:14:26,683-Speed 9796.05 samples/sec Loss 5.4037 LearningRate 0.0184 Epoch: 11 Global Step: 190760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:27,796-Speed 9205.73 samples/sec Loss 5.2982 LearningRate 0.0184 Epoch: 11 Global Step: 190770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:28,902-Speed 9270.80 samples/sec Loss 5.3087 LearningRate 0.0184 Epoch: 11 Global Step: 190780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:29,987-Speed 9439.85 samples/sec Loss 5.2909 LearningRate 0.0184 Epoch: 11 Global Step: 190790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:31,081-Speed 9363.18 samples/sec Loss 5.3069 LearningRate 0.0184 Epoch: 11 Global Step: 190800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:32,183-Speed 9298.93 samples/sec Loss 5.4106 LearningRate 0.0184 Epoch: 11 Global Step: 190810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:33,271-Speed 9423.73 samples/sec Loss 5.3226 LearningRate 0.0184 Epoch: 11 Global Step: 190820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:34,336-Speed 9623.47 samples/sec Loss 5.2384 LearningRate 0.0183 Epoch: 11 Global Step: 190830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:35,414-Speed 9503.39 samples/sec Loss 5.3702 LearningRate 0.0183 Epoch: 11 Global Step: 190840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:36,480-Speed 9609.54 
samples/sec Loss 5.2699 LearningRate 0.0183 Epoch: 11 Global Step: 190850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:37,533-Speed 9733.65 samples/sec Loss 5.2772 LearningRate 0.0183 Epoch: 11 Global Step: 190860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:38,589-Speed 9709.63 samples/sec Loss 5.3400 LearningRate 0.0183 Epoch: 11 Global Step: 190870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:39,678-Speed 9413.53 samples/sec Loss 5.3660 LearningRate 0.0183 Epoch: 11 Global Step: 190880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:40,784-Speed 9263.52 samples/sec Loss 5.3965 LearningRate 0.0183 Epoch: 11 Global Step: 190890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:41,873-Speed 9410.93 samples/sec Loss 5.2483 LearningRate 0.0183 Epoch: 11 Global Step: 190900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:42,926-Speed 9727.43 samples/sec Loss 5.2608 LearningRate 0.0183 Epoch: 11 Global Step: 190910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:44,029-Speed 9293.99 samples/sec Loss 5.4057 LearningRate 0.0183 Epoch: 11 Global Step: 190920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:45,149-Speed 9143.19 samples/sec Loss 5.3412 LearningRate 0.0183 Epoch: 11 Global Step: 190930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:46,218-Speed 9585.69 samples/sec Loss 5.2824 LearningRate 0.0183 Epoch: 11 Global Step: 190940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:47,310-Speed 9378.80 samples/sec Loss 5.3598 LearningRate 0.0183 Epoch: 11 Global Step: 190950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:48,405-Speed 9360.44 samples/sec Loss 5.3369 LearningRate 0.0183 Epoch: 11 Global Step: 190960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:14:49,546-Speed 8978.36 samples/sec Loss 5.3622 
LearningRate 0.0183 Epoch: 11 Global Step: 190970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:50,606-Speed 9667.74 samples/sec Loss 5.2984 LearningRate 0.0183 Epoch: 11 Global Step: 190980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:51,734-Speed 9089.68 samples/sec Loss 5.3086 LearningRate 0.0183 Epoch: 11 Global Step: 190990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:52,833-Speed 9324.92 samples/sec Loss 5.2995 LearningRate 0.0183 Epoch: 11 Global Step: 191000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:53,987-Speed 8878.23 samples/sec Loss 5.2706 LearningRate 0.0183 Epoch: 11 Global Step: 191010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:55,073-Speed 9428.93 samples/sec Loss 5.3972 LearningRate 0.0183 Epoch: 11 Global Step: 191020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:56,192-Speed 9156.73 samples/sec Loss 5.3439 LearningRate 0.0183 Epoch: 11 Global Step: 191030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:57,333-Speed 8981.52 samples/sec Loss 5.2638 LearningRate 0.0183 Epoch: 11 Global Step: 191040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:58,478-Speed 8953.76 samples/sec Loss 5.3190 LearningRate 0.0183 Epoch: 11 Global Step: 191050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:14:59,596-Speed 9165.01 samples/sec Loss 5.2627 LearningRate 0.0183 Epoch: 11 Global Step: 191060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:00,676-Speed 9485.44 samples/sec Loss 5.2868 LearningRate 0.0183 Epoch: 11 Global Step: 191070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:01,768-Speed 9380.53 samples/sec Loss 5.2782 LearningRate 0.0183 Epoch: 11 Global Step: 191080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:02,886-Speed 9164.43 samples/sec Loss 5.3503 LearningRate 0.0183 Epoch: 11 
Global Step: 191090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:03,959-Speed 9543.46 samples/sec Loss 5.3418 LearningRate 0.0183 Epoch: 11 Global Step: 191100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:05,066-Speed 9257.23 samples/sec Loss 5.3560 LearningRate 0.0183 Epoch: 11 Global Step: 191110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:06,165-Speed 9327.70 samples/sec Loss 5.2957 LearningRate 0.0183 Epoch: 11 Global Step: 191120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:07,244-Speed 9489.74 samples/sec Loss 5.3684 LearningRate 0.0183 Epoch: 11 Global Step: 191130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:08,307-Speed 9638.94 samples/sec Loss 5.3203 LearningRate 0.0183 Epoch: 11 Global Step: 191140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:09,367-Speed 9669.03 samples/sec Loss 5.3671 LearningRate 0.0183 Epoch: 11 Global Step: 191150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:10,399-Speed 9930.62 samples/sec Loss 5.3635 LearningRate 0.0183 Epoch: 11 Global Step: 191160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:11,491-Speed 9385.81 samples/sec Loss 5.3402 LearningRate 0.0183 Epoch: 11 Global Step: 191170 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:15:12,598-Speed 9257.14 samples/sec Loss 5.3543 LearningRate 0.0183 Epoch: 11 Global Step: 191180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:15:13,689-Speed 9391.70 samples/sec Loss 5.2437 LearningRate 0.0183 Epoch: 11 Global Step: 191190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:15:14,740-Speed 9744.25 samples/sec Loss 5.3332 LearningRate 0.0183 Epoch: 11 Global Step: 191200 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:15:15,782-Speed 9839.93 samples/sec Loss 5.3840 LearningRate 0.0183 Epoch: 11 Global Step: 191210 Fp16 Grad 
Scale: 262144 Required: 6 hours Training: 2022-04-11 19:15:16,849-Speed 9596.58 samples/sec Loss 5.1738 LearningRate 0.0182 Epoch: 11 Global Step: 191220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:17,942-Speed 9390.33 samples/sec Loss 5.2178 LearningRate 0.0182 Epoch: 11 Global Step: 191230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:19,033-Speed 9389.38 samples/sec Loss 5.3724 LearningRate 0.0182 Epoch: 11 Global Step: 191240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:20,090-Speed 9697.27 samples/sec Loss 5.2960 LearningRate 0.0182 Epoch: 11 Global Step: 191250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:21,208-Speed 9163.16 samples/sec Loss 5.3022 LearningRate 0.0182 Epoch: 11 Global Step: 191260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:22,314-Speed 9269.36 samples/sec Loss 5.2726 LearningRate 0.0182 Epoch: 11 Global Step: 191270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:23,388-Speed 9542.07 samples/sec Loss 5.3380 LearningRate 0.0182 Epoch: 11 Global Step: 191280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:24,514-Speed 9094.60 samples/sec Loss 5.2719 LearningRate 0.0182 Epoch: 11 Global Step: 191290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:25,615-Speed 9308.45 samples/sec Loss 5.3522 LearningRate 0.0182 Epoch: 11 Global Step: 191300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:26,693-Speed 9502.68 samples/sec Loss 5.3199 LearningRate 0.0182 Epoch: 11 Global Step: 191310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:27,805-Speed 9213.09 samples/sec Loss 5.2440 LearningRate 0.0182 Epoch: 11 Global Step: 191320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:15:28,887-Speed 9471.42 samples/sec Loss 5.3157 LearningRate 0.0182 Epoch: 11 Global Step: 191330 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:15:29,956-Speed 9590.18 samples/sec Loss 5.3752 LearningRate 0.0182 Epoch: 11 Global Step: 191340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:31,020-Speed 9632.92 samples/sec Loss 5.3124 LearningRate 0.0182 Epoch: 11 Global Step: 191350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:32,143-Speed 9122.60 samples/sec Loss 5.2294 LearningRate 0.0182 Epoch: 11 Global Step: 191360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:33,241-Speed 9334.98 samples/sec Loss 5.3882 LearningRate 0.0182 Epoch: 11 Global Step: 191370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:34,316-Speed 9531.97 samples/sec Loss 5.3240 LearningRate 0.0182 Epoch: 11 Global Step: 191380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:35,396-Speed 9484.91 samples/sec Loss 5.3311 LearningRate 0.0182 Epoch: 11 Global Step: 191390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:36,453-Speed 9688.17 samples/sec Loss 5.3760 LearningRate 0.0182 Epoch: 11 Global Step: 191400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:37,544-Speed 9388.58 samples/sec Loss 5.3344 LearningRate 0.0182 Epoch: 11 Global Step: 191410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:38,628-Speed 9459.16 samples/sec Loss 5.3525 LearningRate 0.0182 Epoch: 11 Global Step: 191420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:39,706-Speed 9501.82 samples/sec Loss 5.2809 LearningRate 0.0182 Epoch: 11 Global Step: 191430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:40,812-Speed 9263.71 samples/sec Loss 5.3830 LearningRate 0.0182 Epoch: 11 Global Step: 191440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:41,890-Speed 9510.35 samples/sec Loss 5.4146 LearningRate 0.0182 Epoch: 11 Global Step: 191450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
19:15:42,959-Speed 9581.65 samples/sec Loss 5.3499 LearningRate 0.0182 Epoch: 11 Global Step: 191460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:44,044-Speed 9444.46 samples/sec Loss 5.2652 LearningRate 0.0182 Epoch: 11 Global Step: 191470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:45,115-Speed 9566.28 samples/sec Loss 5.3790 LearningRate 0.0182 Epoch: 11 Global Step: 191480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:46,179-Speed 9628.73 samples/sec Loss 5.3272 LearningRate 0.0182 Epoch: 11 Global Step: 191490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:47,237-Speed 9683.16 samples/sec Loss 5.4226 LearningRate 0.0182 Epoch: 11 Global Step: 191500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:48,333-Speed 9352.22 samples/sec Loss 5.3644 LearningRate 0.0182 Epoch: 11 Global Step: 191510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:49,464-Speed 9056.99 samples/sec Loss 5.2831 LearningRate 0.0182 Epoch: 11 Global Step: 191520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:50,550-Speed 9440.19 samples/sec Loss 5.3913 LearningRate 0.0182 Epoch: 11 Global Step: 191530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:15:51,618-Speed 9590.97 samples/sec Loss 5.2899 LearningRate 0.0182 Epoch: 11 Global Step: 191540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:52,706-Speed 9416.10 samples/sec Loss 5.3544 LearningRate 0.0182 Epoch: 11 Global Step: 191550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:53,751-Speed 9810.84 samples/sec Loss 5.3003 LearningRate 0.0182 Epoch: 11 Global Step: 191560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:54,839-Speed 9417.91 samples/sec Loss 5.3840 LearningRate 0.0182 Epoch: 11 Global Step: 191570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:55,913-Speed 9537.06 
samples/sec Loss 5.4249 LearningRate 0.0182 Epoch: 11 Global Step: 191580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:56,978-Speed 9620.38 samples/sec Loss 5.3205 LearningRate 0.0182 Epoch: 11 Global Step: 191590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:58,082-Speed 9284.06 samples/sec Loss 5.2844 LearningRate 0.0182 Epoch: 11 Global Step: 191600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:15:59,159-Speed 9513.24 samples/sec Loss 5.2482 LearningRate 0.0181 Epoch: 11 Global Step: 191610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:00,239-Speed 9481.82 samples/sec Loss 5.2203 LearningRate 0.0181 Epoch: 11 Global Step: 191620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:01,341-Speed 9303.31 samples/sec Loss 5.3102 LearningRate 0.0181 Epoch: 11 Global Step: 191630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:02,424-Speed 9459.97 samples/sec Loss 5.3041 LearningRate 0.0181 Epoch: 11 Global Step: 191640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:03,498-Speed 9536.23 samples/sec Loss 5.4088 LearningRate 0.0181 Epoch: 11 Global Step: 191650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:04,574-Speed 9523.95 samples/sec Loss 5.2871 LearningRate 0.0181 Epoch: 11 Global Step: 191660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:05,654-Speed 9489.65 samples/sec Loss 5.3009 LearningRate 0.0181 Epoch: 11 Global Step: 191670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:06,743-Speed 9407.81 samples/sec Loss 5.3276 LearningRate 0.0181 Epoch: 11 Global Step: 191680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:07,855-Speed 9213.98 samples/sec Loss 5.2352 LearningRate 0.0181 Epoch: 11 Global Step: 191690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:08,908-Speed 9735.59 samples/sec Loss 5.3844 
LearningRate 0.0181 Epoch: 11 Global Step: 191700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:09,986-Speed 9508.07 samples/sec Loss 5.2869 LearningRate 0.0181 Epoch: 11 Global Step: 191710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:11,090-Speed 9282.16 samples/sec Loss 5.3891 LearningRate 0.0181 Epoch: 11 Global Step: 191720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:12,212-Speed 9130.09 samples/sec Loss 5.2683 LearningRate 0.0181 Epoch: 11 Global Step: 191730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:13,310-Speed 9328.88 samples/sec Loss 5.2802 LearningRate 0.0181 Epoch: 11 Global Step: 191740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:14,405-Speed 9357.10 samples/sec Loss 5.1986 LearningRate 0.0181 Epoch: 11 Global Step: 191750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:15,489-Speed 9457.40 samples/sec Loss 5.3372 LearningRate 0.0181 Epoch: 11 Global Step: 191760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:16,573-Speed 9446.49 samples/sec Loss 5.3119 LearningRate 0.0181 Epoch: 11 Global Step: 191770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:17,684-Speed 9230.66 samples/sec Loss 5.3379 LearningRate 0.0181 Epoch: 11 Global Step: 191780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:18,771-Speed 9433.03 samples/sec Loss 5.3305 LearningRate 0.0181 Epoch: 11 Global Step: 191790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:19,868-Speed 9339.40 samples/sec Loss 5.3308 LearningRate 0.0181 Epoch: 11 Global Step: 191800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:20,953-Speed 9437.67 samples/sec Loss 5.3020 LearningRate 0.0181 Epoch: 11 Global Step: 191810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:22,050-Speed 9345.40 samples/sec Loss 5.3663 LearningRate 0.0181 Epoch: 11 Global 
Step: 191820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:23,158-Speed 9248.70 samples/sec Loss 5.2797 LearningRate 0.0181 Epoch: 11 Global Step: 191830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:24,217-Speed 9680.63 samples/sec Loss 5.4519 LearningRate 0.0181 Epoch: 11 Global Step: 191840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:25,290-Speed 9544.91 samples/sec Loss 5.3555 LearningRate 0.0181 Epoch: 11 Global Step: 191850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:26,345-Speed 9711.51 samples/sec Loss 5.4943 LearningRate 0.0181 Epoch: 11 Global Step: 191860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:27,417-Speed 9562.24 samples/sec Loss 5.1945 LearningRate 0.0181 Epoch: 11 Global Step: 191870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:28,548-Speed 9066.51 samples/sec Loss 5.2661 LearningRate 0.0181 Epoch: 11 Global Step: 191880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:29,634-Speed 9432.05 samples/sec Loss 5.2655 LearningRate 0.0181 Epoch: 11 Global Step: 191890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:30,713-Speed 9491.38 samples/sec Loss 5.3083 LearningRate 0.0181 Epoch: 11 Global Step: 191900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:31,801-Speed 9417.23 samples/sec Loss 5.2777 LearningRate 0.0181 Epoch: 11 Global Step: 191910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:32,897-Speed 9349.58 samples/sec Loss 5.3146 LearningRate 0.0181 Epoch: 11 Global Step: 191920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:16:33,980-Speed 9459.41 samples/sec Loss 5.3309 LearningRate 0.0181 Epoch: 11 Global Step: 191930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:35,068-Speed 9415.58 samples/sec Loss 5.3824 LearningRate 0.0181 Epoch: 11 Global Step: 191940 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 19:16:36,153-Speed 9444.22 samples/sec Loss 5.3621 LearningRate 0.0181 Epoch: 11 Global Step: 191950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:37,282-Speed 9081.26 samples/sec Loss 5.2586 LearningRate 0.0181 Epoch: 11 Global Step: 191960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:38,373-Speed 9388.96 samples/sec Loss 5.3996 LearningRate 0.0181 Epoch: 11 Global Step: 191970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:39,462-Speed 9404.30 samples/sec Loss 5.3694 LearningRate 0.0181 Epoch: 11 Global Step: 191980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:40,518-Speed 9714.01 samples/sec Loss 5.3176 LearningRate 0.0181 Epoch: 11 Global Step: 191990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:16:41,592-Speed 9532.20 samples/sec Loss 5.3646 LearningRate 0.0180 Epoch: 11 Global Step: 192000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:17:03,354-[lfw][192000]XNorm: 8.933503 Training: 2022-04-11 19:17:03,354-[lfw][192000]Accuracy-Flip: 0.99617+-0.00269 Training: 2022-04-11 19:17:03,355-[lfw][192000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:17:28,532-[cfp_fp][192000]XNorm: 7.654611 Training: 2022-04-11 19:17:28,533-[cfp_fp][192000]Accuracy-Flip: 0.96457+-0.00951 Training: 2022-04-11 19:17:28,533-[cfp_fp][192000]Accuracy-Highest: 0.96714 Training: 2022-04-11 19:17:50,241-[agedb_30][192000]XNorm: 8.630226 Training: 2022-04-11 19:17:50,242-[agedb_30][192000]Accuracy-Flip: 0.96733+-0.00863 Training: 2022-04-11 19:17:50,243-[agedb_30][192000]Accuracy-Highest: 0.96917 Training: 2022-04-11 19:17:51,327-Speed 146.84 samples/sec Loss 5.3863 LearningRate 0.0180 Epoch: 11 Global Step: 192010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:17:52,377-Speed 9754.60 samples/sec Loss 5.2680 LearningRate 0.0180 Epoch: 11 Global Step: 192020 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 19:17:53,441-Speed 9624.16 samples/sec Loss 5.3513 LearningRate 0.0180 Epoch: 11 Global Step: 192030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:17:54,484-Speed 9828.64 samples/sec Loss 5.3244 LearningRate 0.0180 Epoch: 11 Global Step: 192040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:17:55,585-Speed 9304.13 samples/sec Loss 5.3055 LearningRate 0.0180 Epoch: 11 Global Step: 192050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:17:56,664-Speed 9495.56 samples/sec Loss 5.2687 LearningRate 0.0180 Epoch: 11 Global Step: 192060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:17:57,777-Speed 9208.61 samples/sec Loss 5.2554 LearningRate 0.0180 Epoch: 11 Global Step: 192070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:17:58,869-Speed 9383.40 samples/sec Loss 5.3774 LearningRate 0.0180 Epoch: 11 Global Step: 192080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:17:59,958-Speed 9406.90 samples/sec Loss 5.3235 LearningRate 0.0180 Epoch: 11 Global Step: 192090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:01,073-Speed 9191.42 samples/sec Loss 5.3135 LearningRate 0.0180 Epoch: 11 Global Step: 192100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:02,182-Speed 9234.52 samples/sec Loss 5.2761 LearningRate 0.0180 Epoch: 11 Global Step: 192110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:03,294-Speed 9220.11 samples/sec Loss 5.4014 LearningRate 0.0180 Epoch: 11 Global Step: 192120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:04,400-Speed 9264.33 samples/sec Loss 5.3180 LearningRate 0.0180 Epoch: 11 Global Step: 192130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:05,482-Speed 9466.96 samples/sec Loss 5.4617 LearningRate 0.0180 Epoch: 11 Global Step: 192140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
19:18:06,556-Speed 9543.52 samples/sec Loss 5.3350 LearningRate 0.0180 Epoch: 11 Global Step: 192150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:07,580-Speed 10004.87 samples/sec Loss 5.3899 LearningRate 0.0180 Epoch: 11 Global Step: 192160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:08,691-Speed 9224.76 samples/sec Loss 5.3480 LearningRate 0.0180 Epoch: 11 Global Step: 192170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:09,787-Speed 9347.26 samples/sec Loss 5.2993 LearningRate 0.0180 Epoch: 11 Global Step: 192180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:10,914-Speed 9089.90 samples/sec Loss 5.3740 LearningRate 0.0180 Epoch: 11 Global Step: 192190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:11,974-Speed 9669.17 samples/sec Loss 5.3223 LearningRate 0.0180 Epoch: 11 Global Step: 192200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:13,065-Speed 9394.82 samples/sec Loss 5.3975 LearningRate 0.0180 Epoch: 11 Global Step: 192210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:14,181-Speed 9176.99 samples/sec Loss 5.3242 LearningRate 0.0180 Epoch: 11 Global Step: 192220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:15,267-Speed 9441.46 samples/sec Loss 5.3362 LearningRate 0.0180 Epoch: 11 Global Step: 192230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:16,342-Speed 9525.12 samples/sec Loss 5.4185 LearningRate 0.0180 Epoch: 11 Global Step: 192240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:17,409-Speed 9603.31 samples/sec Loss 5.3456 LearningRate 0.0180 Epoch: 11 Global Step: 192250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:18,466-Speed 9689.85 samples/sec Loss 5.3414 LearningRate 0.0180 Epoch: 11 Global Step: 192260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:19,539-Speed 9553.16 
samples/sec Loss 5.3279 LearningRate 0.0180 Epoch: 11 Global Step: 192270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:20,619-Speed 9488.92 samples/sec Loss 5.3146 LearningRate 0.0180 Epoch: 11 Global Step: 192280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:21,681-Speed 9640.79 samples/sec Loss 5.4189 LearningRate 0.0180 Epoch: 11 Global Step: 192290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:22,790-Speed 9249.01 samples/sec Loss 5.3884 LearningRate 0.0180 Epoch: 11 Global Step: 192300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:23,856-Speed 9607.02 samples/sec Loss 5.3051 LearningRate 0.0180 Epoch: 11 Global Step: 192310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:24,954-Speed 9329.27 samples/sec Loss 5.3472 LearningRate 0.0180 Epoch: 11 Global Step: 192320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:26,029-Speed 9529.70 samples/sec Loss 5.4177 LearningRate 0.0180 Epoch: 11 Global Step: 192330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:18:27,112-Speed 9461.77 samples/sec Loss 5.4341 LearningRate 0.0180 Epoch: 11 Global Step: 192340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:28,169-Speed 9693.91 samples/sec Loss 5.2477 LearningRate 0.0180 Epoch: 11 Global Step: 192350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:29,263-Speed 9366.65 samples/sec Loss 5.2716 LearningRate 0.0180 Epoch: 11 Global Step: 192360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:30,325-Speed 9646.24 samples/sec Loss 5.3689 LearningRate 0.0180 Epoch: 11 Global Step: 192370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:31,423-Speed 9334.04 samples/sec Loss 5.3502 LearningRate 0.0180 Epoch: 11 Global Step: 192380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:32,533-Speed 9236.39 samples/sec Loss 5.3847 
LearningRate 0.0179 Epoch: 11 Global Step: 192390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:33,665-Speed 9054.45 samples/sec Loss 5.3093 LearningRate 0.0179 Epoch: 11 Global Step: 192400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:34,759-Speed 9362.96 samples/sec Loss 5.4060 LearningRate 0.0179 Epoch: 11 Global Step: 192410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:35,807-Speed 9771.08 samples/sec Loss 5.3398 LearningRate 0.0179 Epoch: 11 Global Step: 192420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:36,902-Speed 9362.12 samples/sec Loss 5.2411 LearningRate 0.0179 Epoch: 11 Global Step: 192430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:38,000-Speed 9327.26 samples/sec Loss 5.3433 LearningRate 0.0179 Epoch: 11 Global Step: 192440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:18:39,105-Speed 9271.68 samples/sec Loss 5.2754 LearningRate 0.0179 Epoch: 11 Global Step: 192450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:40,169-Speed 9638.94 samples/sec Loss 5.3091 LearningRate 0.0179 Epoch: 11 Global Step: 192460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:41,268-Speed 9322.48 samples/sec Loss 5.4040 LearningRate 0.0179 Epoch: 11 Global Step: 192470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:42,365-Speed 9334.51 samples/sec Loss 5.3491 LearningRate 0.0179 Epoch: 11 Global Step: 192480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:43,463-Speed 9336.77 samples/sec Loss 5.2475 LearningRate 0.0179 Epoch: 11 Global Step: 192490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:44,548-Speed 9441.04 samples/sec Loss 5.3496 LearningRate 0.0179 Epoch: 11 Global Step: 192500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:45,611-Speed 9636.79 samples/sec Loss 5.3466 LearningRate 0.0179 Epoch: 11 
Global Step: 192510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:46,728-Speed 9175.41 samples/sec Loss 5.3935 LearningRate 0.0179 Epoch: 11 Global Step: 192520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:47,856-Speed 9078.83 samples/sec Loss 5.4080 LearningRate 0.0179 Epoch: 11 Global Step: 192530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:48,958-Speed 9302.94 samples/sec Loss 5.2791 LearningRate 0.0179 Epoch: 11 Global Step: 192540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:50,045-Speed 9424.49 samples/sec Loss 5.3891 LearningRate 0.0179 Epoch: 11 Global Step: 192550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:18:51,104-Speed 9674.64 samples/sec Loss 5.2669 LearningRate 0.0179 Epoch: 11 Global Step: 192560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:52,177-Speed 9549.74 samples/sec Loss 5.3302 LearningRate 0.0179 Epoch: 11 Global Step: 192570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:53,295-Speed 9167.21 samples/sec Loss 5.2674 LearningRate 0.0179 Epoch: 11 Global Step: 192580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:54,358-Speed 9638.54 samples/sec Loss 5.3898 LearningRate 0.0179 Epoch: 11 Global Step: 192590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:55,458-Speed 9319.90 samples/sec Loss 5.4121 LearningRate 0.0179 Epoch: 11 Global Step: 192600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:56,541-Speed 9460.97 samples/sec Loss 5.3457 LearningRate 0.0179 Epoch: 11 Global Step: 192610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:57,611-Speed 9574.23 samples/sec Loss 5.2491 LearningRate 0.0179 Epoch: 11 Global Step: 192620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:58,676-Speed 9619.08 samples/sec Loss 5.3550 LearningRate 0.0179 Epoch: 11 Global Step: 192630 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 19:18:59,759-Speed 9461.73 samples/sec Loss 5.3009 LearningRate 0.0179 Epoch: 11 Global Step: 192640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:00,832-Speed 9550.61 samples/sec Loss 5.2979 LearningRate 0.0179 Epoch: 11 Global Step: 192650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:01,906-Speed 9540.83 samples/sec Loss 5.3693 LearningRate 0.0179 Epoch: 11 Global Step: 192660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:03,056-Speed 8907.83 samples/sec Loss 5.3801 LearningRate 0.0179 Epoch: 11 Global Step: 192670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:04,221-Speed 8797.44 samples/sec Loss 5.2786 LearningRate 0.0179 Epoch: 11 Global Step: 192680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:05,291-Speed 9580.76 samples/sec Loss 5.3172 LearningRate 0.0179 Epoch: 11 Global Step: 192690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:06,330-Speed 9863.87 samples/sec Loss 5.3615 LearningRate 0.0179 Epoch: 11 Global Step: 192700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:07,400-Speed 9572.51 samples/sec Loss 5.3296 LearningRate 0.0179 Epoch: 11 Global Step: 192710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:08,487-Speed 9426.38 samples/sec Loss 5.3271 LearningRate 0.0179 Epoch: 11 Global Step: 192720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:09,549-Speed 9646.97 samples/sec Loss 5.3706 LearningRate 0.0179 Epoch: 11 Global Step: 192730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:10,629-Speed 9491.46 samples/sec Loss 5.3100 LearningRate 0.0179 Epoch: 11 Global Step: 192740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:11,712-Speed 9458.59 samples/sec Loss 5.3721 LearningRate 0.0179 Epoch: 11 Global Step: 192750 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:19:12,793-Speed 9478.70 samples/sec Loss 5.3657 LearningRate 0.0179 Epoch: 11 Global Step: 192760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:13,902-Speed 9236.25 samples/sec Loss 5.2774 LearningRate 0.0179 Epoch: 11 Global Step: 192770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:14,972-Speed 9575.07 samples/sec Loss 5.4543 LearningRate 0.0179 Epoch: 11 Global Step: 192780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:16,082-Speed 9232.55 samples/sec Loss 5.2903 LearningRate 0.0178 Epoch: 11 Global Step: 192790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:17,172-Speed 9405.33 samples/sec Loss 5.2450 LearningRate 0.0178 Epoch: 11 Global Step: 192800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:18,276-Speed 9274.34 samples/sec Loss 5.2898 LearningRate 0.0178 Epoch: 11 Global Step: 192810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:19,366-Speed 9399.38 samples/sec Loss 5.3429 LearningRate 0.0178 Epoch: 11 Global Step: 192820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:20,465-Speed 9326.86 samples/sec Loss 5.4385 LearningRate 0.0178 Epoch: 11 Global Step: 192830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:21,548-Speed 9462.77 samples/sec Loss 5.4323 LearningRate 0.0178 Epoch: 11 Global Step: 192840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:22,639-Speed 9415.19 samples/sec Loss 5.2882 LearningRate 0.0178 Epoch: 11 Global Step: 192850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:23,733-Speed 9364.26 samples/sec Loss 5.3096 LearningRate 0.0178 Epoch: 11 Global Step: 192860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:19:24,804-Speed 9567.29 samples/sec Loss 5.4348 LearningRate 0.0178 Epoch: 11 Global Step: 192870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:25,900-Speed 
9342.12 samples/sec Loss 5.4375 LearningRate 0.0178 Epoch: 11 Global Step: 192880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:26,964-Speed 9630.38 samples/sec Loss 5.3322 LearningRate 0.0178 Epoch: 11 Global Step: 192890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:28,070-Speed 9261.22 samples/sec Loss 5.3895 LearningRate 0.0178 Epoch: 11 Global Step: 192900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:29,167-Speed 9347.49 samples/sec Loss 5.4314 LearningRate 0.0178 Epoch: 11 Global Step: 192910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:30,239-Speed 9554.99 samples/sec Loss 5.2926 LearningRate 0.0178 Epoch: 11 Global Step: 192920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:31,315-Speed 9520.11 samples/sec Loss 5.2957 LearningRate 0.0178 Epoch: 11 Global Step: 192930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:32,400-Speed 9444.39 samples/sec Loss 5.4375 LearningRate 0.0178 Epoch: 11 Global Step: 192940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:33,502-Speed 9302.65 samples/sec Loss 5.3463 LearningRate 0.0178 Epoch: 11 Global Step: 192950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:34,588-Speed 9434.93 samples/sec Loss 5.3773 LearningRate 0.0178 Epoch: 11 Global Step: 192960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:35,645-Speed 9695.09 samples/sec Loss 5.2735 LearningRate 0.0178 Epoch: 11 Global Step: 192970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:36,753-Speed 9245.90 samples/sec Loss 5.2984 LearningRate 0.0178 Epoch: 11 Global Step: 192980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:37,862-Speed 9235.32 samples/sec Loss 5.3067 LearningRate 0.0178 Epoch: 11 Global Step: 192990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:38,948-Speed 9436.78 samples/sec Loss 5.3352 
LearningRate 0.0178 Epoch: 11 Global Step: 193000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:40,017-Speed 9591.01 samples/sec Loss 5.3855 LearningRate 0.0178 Epoch: 11 Global Step: 193010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:41,070-Speed 9729.44 samples/sec Loss 5.3068 LearningRate 0.0178 Epoch: 11 Global Step: 193020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:42,156-Speed 9440.18 samples/sec Loss 5.3350 LearningRate 0.0178 Epoch: 11 Global Step: 193030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:43,231-Speed 9533.34 samples/sec Loss 5.3217 LearningRate 0.0178 Epoch: 11 Global Step: 193040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:44,310-Speed 9492.43 samples/sec Loss 5.2713 LearningRate 0.0178 Epoch: 11 Global Step: 193050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:45,385-Speed 9529.96 samples/sec Loss 5.3596 LearningRate 0.0178 Epoch: 11 Global Step: 193060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:46,470-Speed 9440.14 samples/sec Loss 5.3755 LearningRate 0.0178 Epoch: 11 Global Step: 193070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:19:47,561-Speed 9394.80 samples/sec Loss 5.2476 LearningRate 0.0178 Epoch: 11 Global Step: 193080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:48,606-Speed 9799.91 samples/sec Loss 5.3291 LearningRate 0.0178 Epoch: 11 Global Step: 193090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:49,687-Speed 9478.31 samples/sec Loss 5.2749 LearningRate 0.0178 Epoch: 11 Global Step: 193100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:50,772-Speed 9446.02 samples/sec Loss 5.3939 LearningRate 0.0178 Epoch: 11 Global Step: 193110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:51,821-Speed 9763.12 samples/sec Loss 5.3953 LearningRate 0.0178 Epoch: 11 
Global Step: 193120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:52,883-Speed 9660.30 samples/sec Loss 5.3935 LearningRate 0.0178 Epoch: 11 Global Step: 193130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:53,952-Speed 9578.15 samples/sec Loss 5.3320 LearningRate 0.0178 Epoch: 11 Global Step: 193140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:55,023-Speed 9568.83 samples/sec Loss 5.3811 LearningRate 0.0178 Epoch: 11 Global Step: 193150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:56,094-Speed 9564.55 samples/sec Loss 5.4635 LearningRate 0.0178 Epoch: 11 Global Step: 193160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:57,178-Speed 9454.88 samples/sec Loss 5.2640 LearningRate 0.0178 Epoch: 11 Global Step: 193170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:19:58,277-Speed 9324.46 samples/sec Loss 5.3458 LearningRate 0.0177 Epoch: 11 Global Step: 193180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:19:59,352-Speed 9533.85 samples/sec Loss 5.2842 LearningRate 0.0177 Epoch: 11 Global Step: 193190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:20:00,420-Speed 9590.09 samples/sec Loss 5.4196 LearningRate 0.0177 Epoch: 11 Global Step: 193200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:01,518-Speed 9337.24 samples/sec Loss 5.3622 LearningRate 0.0177 Epoch: 11 Global Step: 193210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:02,676-Speed 8842.94 samples/sec Loss 5.2710 LearningRate 0.0177 Epoch: 11 Global Step: 193220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:03,750-Speed 9545.97 samples/sec Loss 5.2965 LearningRate 0.0177 Epoch: 11 Global Step: 193230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:04,838-Speed 9420.63 samples/sec Loss 5.4114 LearningRate 0.0177 Epoch: 11 Global Step: 193240 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:05,908-Speed 9569.53 samples/sec Loss 5.2705 LearningRate 0.0177 Epoch: 11 Global Step: 193250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:06,981-Speed 9547.30 samples/sec Loss 5.3459 LearningRate 0.0177 Epoch: 11 Global Step: 193260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:08,071-Speed 9399.32 samples/sec Loss 5.3098 LearningRate 0.0177 Epoch: 11 Global Step: 193270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:09,137-Speed 9618.09 samples/sec Loss 5.3160 LearningRate 0.0177 Epoch: 11 Global Step: 193280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:10,249-Speed 9216.29 samples/sec Loss 5.2802 LearningRate 0.0177 Epoch: 11 Global Step: 193290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:11,332-Speed 9453.89 samples/sec Loss 5.2830 LearningRate 0.0177 Epoch: 11 Global Step: 193300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:12,434-Speed 9304.11 samples/sec Loss 5.3595 LearningRate 0.0177 Epoch: 11 Global Step: 193310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:13,511-Speed 9506.94 samples/sec Loss 5.3567 LearningRate 0.0177 Epoch: 11 Global Step: 193320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:14,628-Speed 9173.35 samples/sec Loss 5.3169 LearningRate 0.0177 Epoch: 11 Global Step: 193330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:15,714-Speed 9434.70 samples/sec Loss 5.3274 LearningRate 0.0177 Epoch: 11 Global Step: 193340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:16,835-Speed 9144.13 samples/sec Loss 5.3251 LearningRate 0.0177 Epoch: 11 Global Step: 193350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:17,921-Speed 9433.46 samples/sec Loss 5.3868 LearningRate 0.0177 Epoch: 11 Global Step: 193360 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:20:18,995-Speed 9544.93 samples/sec Loss 5.2788 LearningRate 0.0177 Epoch: 11 Global Step: 193370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:20,062-Speed 9593.67 samples/sec Loss 5.2986 LearningRate 0.0177 Epoch: 11 Global Step: 193380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:21,131-Speed 9587.06 samples/sec Loss 5.3395 LearningRate 0.0177 Epoch: 11 Global Step: 193390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:22,245-Speed 9202.31 samples/sec Loss 5.2933 LearningRate 0.0177 Epoch: 11 Global Step: 193400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:23,350-Speed 9271.04 samples/sec Loss 5.2925 LearningRate 0.0177 Epoch: 11 Global Step: 193410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:24,423-Speed 9555.03 samples/sec Loss 5.3003 LearningRate 0.0177 Epoch: 11 Global Step: 193420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:25,527-Speed 9276.06 samples/sec Loss 5.3444 LearningRate 0.0177 Epoch: 11 Global Step: 193430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:26,598-Speed 9564.76 samples/sec Loss 5.3252 LearningRate 0.0177 Epoch: 11 Global Step: 193440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:27,659-Speed 9661.70 samples/sec Loss 5.3631 LearningRate 0.0177 Epoch: 11 Global Step: 193450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:28,781-Speed 9130.98 samples/sec Loss 5.3049 LearningRate 0.0177 Epoch: 11 Global Step: 193460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:29,868-Speed 9422.61 samples/sec Loss 5.3501 LearningRate 0.0177 Epoch: 11 Global Step: 193470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:30,971-Speed 9287.64 samples/sec Loss 5.3815 LearningRate 0.0177 Epoch: 11 Global Step: 193480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:32,094-Speed 
9129.19 samples/sec Loss 5.3314 LearningRate 0.0177 Epoch: 11 Global Step: 193490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:33,239-Speed 8952.49 samples/sec Loss 5.3252 LearningRate 0.0177 Epoch: 11 Global Step: 193500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:20:34,303-Speed 9628.31 samples/sec Loss 5.2744 LearningRate 0.0177 Epoch: 11 Global Step: 193510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:35,404-Speed 9305.83 samples/sec Loss 5.3026 LearningRate 0.0177 Epoch: 11 Global Step: 193520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:36,489-Speed 9443.46 samples/sec Loss 5.4309 LearningRate 0.0177 Epoch: 11 Global Step: 193530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:37,592-Speed 9291.80 samples/sec Loss 5.3200 LearningRate 0.0177 Epoch: 11 Global Step: 193540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:38,679-Speed 9426.00 samples/sec Loss 5.2606 LearningRate 0.0177 Epoch: 11 Global Step: 193550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:39,756-Speed 9518.39 samples/sec Loss 5.3391 LearningRate 0.0177 Epoch: 11 Global Step: 193560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:40,829-Speed 9551.73 samples/sec Loss 5.5056 LearningRate 0.0177 Epoch: 11 Global Step: 193570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:41,932-Speed 9284.32 samples/sec Loss 5.3241 LearningRate 0.0176 Epoch: 11 Global Step: 193580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:43,021-Speed 9408.91 samples/sec Loss 5.3676 LearningRate 0.0176 Epoch: 11 Global Step: 193590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:44,171-Speed 8908.20 samples/sec Loss 5.3777 LearningRate 0.0176 Epoch: 11 Global Step: 193600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:45,258-Speed 9430.72 samples/sec Loss 5.3890 
LearningRate 0.0176 Epoch: 11 Global Step: 193610 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:20:46,321-Speed 9640.51 samples/sec Loss 5.2703 LearningRate 0.0176 Epoch: 11 Global Step: 193620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:47,411-Speed 9404.08 samples/sec Loss 5.3708 LearningRate 0.0176 Epoch: 11 Global Step: 193630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:48,542-Speed 9055.44 samples/sec Loss 5.2796 LearningRate 0.0176 Epoch: 11 Global Step: 193640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:49,574-Speed 9934.96 samples/sec Loss 5.3136 LearningRate 0.0176 Epoch: 11 Global Step: 193650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:50,668-Speed 9363.46 samples/sec Loss 5.3854 LearningRate 0.0176 Epoch: 11 Global Step: 193660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:51,755-Speed 9425.97 samples/sec Loss 5.3826 LearningRate 0.0176 Epoch: 11 Global Step: 193670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:52,818-Speed 9636.83 samples/sec Loss 5.3558 LearningRate 0.0176 Epoch: 11 Global Step: 193680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:53,906-Speed 9418.22 samples/sec Loss 5.3232 LearningRate 0.0176 Epoch: 11 Global Step: 193690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:54,991-Speed 9448.72 samples/sec Loss 5.3178 LearningRate 0.0176 Epoch: 11 Global Step: 193700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:56,065-Speed 9544.64 samples/sec Loss 5.3955 LearningRate 0.0176 Epoch: 11 Global Step: 193710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:57,147-Speed 9470.74 samples/sec Loss 5.2914 LearningRate 0.0176 Epoch: 11 Global Step: 193720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:20:58,246-Speed 9325.38 samples/sec Loss 5.3142 LearningRate 0.0176 Epoch: 11 
Global Step: 193730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:20:59,319-Speed 9550.80 samples/sec Loss 5.4162 LearningRate 0.0176 Epoch: 11 Global Step: 193740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:00,423-Speed 9282.77 samples/sec Loss 5.2799 LearningRate 0.0176 Epoch: 11 Global Step: 193750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:01,538-Speed 9186.88 samples/sec Loss 5.3214 LearningRate 0.0176 Epoch: 11 Global Step: 193760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:02,629-Speed 9385.65 samples/sec Loss 5.2485 LearningRate 0.0176 Epoch: 11 Global Step: 193770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:03,733-Speed 9287.65 samples/sec Loss 5.3417 LearningRate 0.0176 Epoch: 11 Global Step: 193780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:04,791-Speed 9686.74 samples/sec Loss 5.3564 LearningRate 0.0176 Epoch: 11 Global Step: 193790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:05,858-Speed 9595.29 samples/sec Loss 5.3326 LearningRate 0.0176 Epoch: 11 Global Step: 193800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:06,912-Speed 9726.20 samples/sec Loss 5.3913 LearningRate 0.0176 Epoch: 11 Global Step: 193810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:08,002-Speed 9395.16 samples/sec Loss 5.2558 LearningRate 0.0176 Epoch: 11 Global Step: 193820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:09,068-Speed 9617.33 samples/sec Loss 5.3064 LearningRate 0.0176 Epoch: 11 Global Step: 193830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:10,134-Speed 9613.36 samples/sec Loss 5.2773 LearningRate 0.0176 Epoch: 11 Global Step: 193840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:11,196-Speed 9649.35 samples/sec Loss 5.3522 LearningRate 0.0176 Epoch: 11 Global Step: 193850 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:12,320-Speed 9111.26 samples/sec Loss 5.2298 LearningRate 0.0176 Epoch: 11 Global Step: 193860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:13,412-Speed 9383.49 samples/sec Loss 5.2720 LearningRate 0.0176 Epoch: 11 Global Step: 193870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:14,506-Speed 9366.40 samples/sec Loss 5.2534 LearningRate 0.0176 Epoch: 11 Global Step: 193880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:15,576-Speed 9582.30 samples/sec Loss 5.4330 LearningRate 0.0176 Epoch: 11 Global Step: 193890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:16,648-Speed 9554.50 samples/sec Loss 5.2794 LearningRate 0.0176 Epoch: 11 Global Step: 193900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:17,737-Speed 9405.21 samples/sec Loss 5.2557 LearningRate 0.0176 Epoch: 11 Global Step: 193910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:18,803-Speed 9612.91 samples/sec Loss 5.3919 LearningRate 0.0176 Epoch: 11 Global Step: 193920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:19,918-Speed 9190.86 samples/sec Loss 5.3759 LearningRate 0.0176 Epoch: 11 Global Step: 193930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:20,965-Speed 9783.24 samples/sec Loss 5.3605 LearningRate 0.0176 Epoch: 11 Global Step: 193940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:22,046-Speed 9482.97 samples/sec Loss 5.3298 LearningRate 0.0176 Epoch: 11 Global Step: 193950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:23,132-Speed 9435.93 samples/sec Loss 5.3590 LearningRate 0.0176 Epoch: 11 Global Step: 193960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:24,217-Speed 9440.15 samples/sec Loss 5.2786 LearningRate 0.0176 Epoch: 11 Global Step: 193970 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:21:25,270-Speed 9733.10 samples/sec Loss 5.4155 LearningRate 0.0175 Epoch: 11 Global Step: 193980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:26,373-Speed 9286.44 samples/sec Loss 5.4080 LearningRate 0.0175 Epoch: 11 Global Step: 193990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:27,477-Speed 9285.59 samples/sec Loss 5.3456 LearningRate 0.0175 Epoch: 11 Global Step: 194000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:21:49,420-[lfw][194000]XNorm: 8.972369 Training: 2022-04-11 19:21:49,421-[lfw][194000]Accuracy-Flip: 0.99650+-0.00283 Training: 2022-04-11 19:21:49,421-[lfw][194000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:22:14,791-[cfp_fp][194000]XNorm: 7.656675 Training: 2022-04-11 19:22:14,792-[cfp_fp][194000]Accuracy-Flip: 0.96457+-0.00514 Training: 2022-04-11 19:22:14,792-[cfp_fp][194000]Accuracy-Highest: 0.96714 Training: 2022-04-11 19:22:36,647-[agedb_30][194000]XNorm: 8.662819 Training: 2022-04-11 19:22:36,648-[agedb_30][194000]Accuracy-Flip: 0.96917+-0.00768 Training: 2022-04-11 19:22:36,648-[agedb_30][194000]Accuracy-Highest: 0.96917 Training: 2022-04-11 19:22:37,725-Speed 145.77 samples/sec Loss 5.3366 LearningRate 0.0175 Epoch: 11 Global Step: 194010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:38,812-Speed 9430.35 samples/sec Loss 5.2909 LearningRate 0.0175 Epoch: 11 Global Step: 194020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:39,887-Speed 9527.31 samples/sec Loss 5.2799 LearningRate 0.0175 Epoch: 11 Global Step: 194030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:40,973-Speed 9437.29 samples/sec Loss 5.3817 LearningRate 0.0175 Epoch: 11 Global Step: 194040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:42,016-Speed 9824.82 samples/sec Loss 5.3158 LearningRate 0.0175 Epoch: 11 Global Step: 194050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 19:22:43,058-Speed 9826.66 samples/sec Loss 5.3416 LearningRate 0.0175 Epoch: 11 Global Step: 194060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:44,156-Speed 9334.93 samples/sec Loss 5.3021 LearningRate 0.0175 Epoch: 11 Global Step: 194070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:45,241-Speed 9440.82 samples/sec Loss 5.4085 LearningRate 0.0175 Epoch: 11 Global Step: 194080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:46,327-Speed 9437.12 samples/sec Loss 5.1977 LearningRate 0.0175 Epoch: 11 Global Step: 194090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:47,398-Speed 9564.35 samples/sec Loss 5.3532 LearningRate 0.0175 Epoch: 11 Global Step: 194100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:48,483-Speed 9447.42 samples/sec Loss 5.3133 LearningRate 0.0175 Epoch: 11 Global Step: 194110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:49,515-Speed 9928.34 samples/sec Loss 5.2474 LearningRate 0.0175 Epoch: 11 Global Step: 194120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:50,597-Speed 9466.70 samples/sec Loss 5.2909 LearningRate 0.0175 Epoch: 11 Global Step: 194130 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:22:51,669-Speed 9564.69 samples/sec Loss 5.3413 LearningRate 0.0175 Epoch: 11 Global Step: 194140 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:22:52,729-Speed 9660.98 samples/sec Loss 5.2414 LearningRate 0.0175 Epoch: 11 Global Step: 194150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:53,815-Speed 9436.94 samples/sec Loss 5.2773 LearningRate 0.0175 Epoch: 11 Global Step: 194160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:54,921-Speed 9260.97 samples/sec Loss 5.3599 LearningRate 0.0175 Epoch: 11 Global Step: 194170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:56,013-Speed 
9389.03 samples/sec Loss 5.3775 LearningRate 0.0175 Epoch: 11 Global Step: 194180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:57,112-Speed 9320.34 samples/sec Loss 5.3692 LearningRate 0.0175 Epoch: 11 Global Step: 194190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:58,204-Speed 9381.93 samples/sec Loss 5.4333 LearningRate 0.0175 Epoch: 11 Global Step: 194200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:22:59,306-Speed 9298.93 samples/sec Loss 5.3229 LearningRate 0.0175 Epoch: 11 Global Step: 194210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:00,397-Speed 9387.17 samples/sec Loss 5.3339 LearningRate 0.0175 Epoch: 11 Global Step: 194220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:01,475-Speed 9511.90 samples/sec Loss 5.2864 LearningRate 0.0175 Epoch: 11 Global Step: 194230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:02,557-Speed 9472.73 samples/sec Loss 5.3168 LearningRate 0.0175 Epoch: 11 Global Step: 194240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:03,663-Speed 9265.98 samples/sec Loss 5.2960 LearningRate 0.0175 Epoch: 11 Global Step: 194250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:04,743-Speed 9491.69 samples/sec Loss 5.3560 LearningRate 0.0175 Epoch: 11 Global Step: 194260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:05,865-Speed 9130.83 samples/sec Loss 5.2549 LearningRate 0.0175 Epoch: 11 Global Step: 194270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:06,944-Speed 9490.07 samples/sec Loss 5.2016 LearningRate 0.0175 Epoch: 11 Global Step: 194280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:08,031-Speed 9430.44 samples/sec Loss 5.2935 LearningRate 0.0175 Epoch: 11 Global Step: 194290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:09,192-Speed 8824.99 samples/sec Loss 5.4205 
LearningRate 0.0175 Epoch: 11 Global Step: 194300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:10,255-Speed 9640.70 samples/sec Loss 5.2492 LearningRate 0.0175 Epoch: 11 Global Step: 194310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:11,360-Speed 9271.12 samples/sec Loss 5.4284 LearningRate 0.0175 Epoch: 11 Global Step: 194320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:12,468-Speed 9251.23 samples/sec Loss 5.3505 LearningRate 0.0175 Epoch: 11 Global Step: 194330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:13,554-Speed 9427.71 samples/sec Loss 5.3615 LearningRate 0.0175 Epoch: 11 Global Step: 194340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:14,643-Speed 9409.77 samples/sec Loss 5.3953 LearningRate 0.0175 Epoch: 11 Global Step: 194350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:23:15,716-Speed 9551.87 samples/sec Loss 5.3451 LearningRate 0.0175 Epoch: 11 Global Step: 194360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:23:16,802-Speed 9428.55 samples/sec Loss 5.2623 LearningRate 0.0175 Epoch: 11 Global Step: 194370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:23:17,957-Speed 8872.97 samples/sec Loss 5.3044 LearningRate 0.0174 Epoch: 11 Global Step: 194380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:19,038-Speed 9498.30 samples/sec Loss 5.3682 LearningRate 0.0174 Epoch: 11 Global Step: 194390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:20,095-Speed 9696.10 samples/sec Loss 5.3055 LearningRate 0.0174 Epoch: 11 Global Step: 194400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:21,153-Speed 9686.79 samples/sec Loss 5.3917 LearningRate 0.0174 Epoch: 11 Global Step: 194410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:22,219-Speed 9607.43 samples/sec Loss 5.4210 LearningRate 0.0174 Epoch: 11 
Global Step: 194420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:23,325-Speed 9267.21 samples/sec Loss 5.3116 LearningRate 0.0174 Epoch: 11 Global Step: 194430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:24,419-Speed 9366.04 samples/sec Loss 5.3041 LearningRate 0.0174 Epoch: 11 Global Step: 194440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:25,457-Speed 9872.73 samples/sec Loss 5.3951 LearningRate 0.0174 Epoch: 11 Global Step: 194450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:26,533-Speed 9519.90 samples/sec Loss 5.3668 LearningRate 0.0174 Epoch: 11 Global Step: 194460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:27,587-Speed 9726.12 samples/sec Loss 5.4180 LearningRate 0.0174 Epoch: 11 Global Step: 194470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:28,686-Speed 9319.98 samples/sec Loss 5.3128 LearningRate 0.0174 Epoch: 11 Global Step: 194480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:23:29,770-Speed 9454.60 samples/sec Loss 5.2584 LearningRate 0.0174 Epoch: 11 Global Step: 194490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:23:30,850-Speed 9479.85 samples/sec Loss 5.2719 LearningRate 0.0174 Epoch: 11 Global Step: 194500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:31,909-Speed 9680.18 samples/sec Loss 5.3540 LearningRate 0.0174 Epoch: 11 Global Step: 194510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:33,009-Speed 9310.32 samples/sec Loss 5.4131 LearningRate 0.0174 Epoch: 11 Global Step: 194520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:34,080-Speed 9570.59 samples/sec Loss 5.3441 LearningRate 0.0174 Epoch: 11 Global Step: 194530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:35,174-Speed 9367.23 samples/sec Loss 5.3601 LearningRate 0.0174 Epoch: 11 Global Step: 194540 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:36,259-Speed 9438.28 samples/sec Loss 5.4718 LearningRate 0.0174 Epoch: 11 Global Step: 194550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:37,365-Speed 9266.73 samples/sec Loss 5.2953 LearningRate 0.0174 Epoch: 11 Global Step: 194560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:38,448-Speed 9464.36 samples/sec Loss 5.3395 LearningRate 0.0174 Epoch: 11 Global Step: 194570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:39,549-Speed 9303.43 samples/sec Loss 5.4915 LearningRate 0.0174 Epoch: 11 Global Step: 194580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:40,609-Speed 9672.14 samples/sec Loss 5.3562 LearningRate 0.0174 Epoch: 11 Global Step: 194590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:41,679-Speed 9572.27 samples/sec Loss 5.3021 LearningRate 0.0174 Epoch: 11 Global Step: 194600 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:23:42,749-Speed 9574.92 samples/sec Loss 5.3188 LearningRate 0.0174 Epoch: 11 Global Step: 194610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:43,832-Speed 9464.85 samples/sec Loss 5.3548 LearningRate 0.0174 Epoch: 11 Global Step: 194620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:44,942-Speed 9228.00 samples/sec Loss 5.3393 LearningRate 0.0174 Epoch: 11 Global Step: 194630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:46,057-Speed 9186.73 samples/sec Loss 5.3449 LearningRate 0.0174 Epoch: 11 Global Step: 194640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:47,113-Speed 9704.48 samples/sec Loss 5.2528 LearningRate 0.0174 Epoch: 11 Global Step: 194650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:48,218-Speed 9270.86 samples/sec Loss 5.2933 LearningRate 0.0174 Epoch: 11 Global Step: 194660 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 19:23:49,302-Speed 9450.85 samples/sec Loss 5.3762 LearningRate 0.0174 Epoch: 11 Global Step: 194670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:50,388-Speed 9435.43 samples/sec Loss 5.3290 LearningRate 0.0174 Epoch: 11 Global Step: 194680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:51,461-Speed 9554.78 samples/sec Loss 5.3173 LearningRate 0.0174 Epoch: 11 Global Step: 194690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:52,543-Speed 9464.77 samples/sec Loss 5.3289 LearningRate 0.0174 Epoch: 11 Global Step: 194700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:53,659-Speed 9177.46 samples/sec Loss 5.2720 LearningRate 0.0174 Epoch: 11 Global Step: 194710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:54,709-Speed 9762.33 samples/sec Loss 5.3691 LearningRate 0.0174 Epoch: 11 Global Step: 194720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:55,816-Speed 9255.95 samples/sec Loss 5.4574 LearningRate 0.0174 Epoch: 11 Global Step: 194730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:23:56,891-Speed 9535.13 samples/sec Loss 5.3782 LearningRate 0.0174 Epoch: 11 Global Step: 194740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:23:58,039-Speed 8925.42 samples/sec Loss 5.3010 LearningRate 0.0174 Epoch: 11 Global Step: 194750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:23:59,166-Speed 9090.68 samples/sec Loss 5.3153 LearningRate 0.0174 Epoch: 11 Global Step: 194760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:24:00,236-Speed 9580.42 samples/sec Loss 5.3414 LearningRate 0.0174 Epoch: 11 Global Step: 194770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:24:01,297-Speed 9651.06 samples/sec Loss 5.4597 LearningRate 0.0173 Epoch: 11 Global Step: 194780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
19:24:02,403-Speed 9271.01 samples/sec Loss 5.3203 LearningRate 0.0173 Epoch: 11 Global Step: 194790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:24:03,501-Speed 9327.16 samples/sec Loss 5.3076 LearningRate 0.0173 Epoch: 11 Global Step: 194800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:24:04,589-Speed 9419.54 samples/sec Loss 5.3327 LearningRate 0.0173 Epoch: 11 Global Step: 194810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:24:05,668-Speed 9491.50 samples/sec Loss 5.3669 LearningRate 0.0173 Epoch: 11 Global Step: 194820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:24:06,767-Speed 9321.85 samples/sec Loss 5.2551 LearningRate 0.0173 Epoch: 11 Global Step: 194830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:24:07,846-Speed 9505.54 samples/sec Loss 5.2641 LearningRate 0.0173 Epoch: 11 Global Step: 194840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:08,920-Speed 9535.79 samples/sec Loss 5.3464 LearningRate 0.0173 Epoch: 11 Global Step: 194850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:10,031-Speed 9218.86 samples/sec Loss 5.3896 LearningRate 0.0173 Epoch: 11 Global Step: 194860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:11,091-Speed 9673.60 samples/sec Loss 5.3916 LearningRate 0.0173 Epoch: 11 Global Step: 194870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:12,184-Speed 9372.95 samples/sec Loss 5.3801 LearningRate 0.0173 Epoch: 11 Global Step: 194880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:13,253-Speed 9585.12 samples/sec Loss 5.3513 LearningRate 0.0173 Epoch: 11 Global Step: 194890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:14,352-Speed 9322.35 samples/sec Loss 5.3563 LearningRate 0.0173 Epoch: 11 Global Step: 194900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:15,455-Speed 9289.34 
samples/sec Loss 5.3558 LearningRate 0.0173 Epoch: 11 Global Step: 194910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:16,550-Speed 9350.41 samples/sec Loss 5.3631 LearningRate 0.0173 Epoch: 11 Global Step: 194920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:17,602-Speed 9748.30 samples/sec Loss 5.2984 LearningRate 0.0173 Epoch: 11 Global Step: 194930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:18,693-Speed 9386.20 samples/sec Loss 5.3478 LearningRate 0.0173 Epoch: 11 Global Step: 194940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:24:19,770-Speed 9519.15 samples/sec Loss 5.4340 LearningRate 0.0173 Epoch: 11 Global Step: 194950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:20,841-Speed 9569.90 samples/sec Loss 5.2584 LearningRate 0.0173 Epoch: 11 Global Step: 194960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:21,941-Speed 9311.96 samples/sec Loss 5.3238 LearningRate 0.0173 Epoch: 11 Global Step: 194970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:23,063-Speed 9128.57 samples/sec Loss 5.3370 LearningRate 0.0173 Epoch: 11 Global Step: 194980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:24,168-Speed 9279.93 samples/sec Loss 5.4028 LearningRate 0.0173 Epoch: 11 Global Step: 194990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:25,211-Speed 9816.41 samples/sec Loss 5.3849 LearningRate 0.0173 Epoch: 11 Global Step: 195000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:26,321-Speed 9235.40 samples/sec Loss 5.3740 LearningRate 0.0173 Epoch: 11 Global Step: 195010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:27,414-Speed 9373.86 samples/sec Loss 5.3844 LearningRate 0.0173 Epoch: 11 Global Step: 195020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:28,508-Speed 9362.59 samples/sec Loss 5.2479 
LearningRate 0.0173 Epoch: 11 Global Step: 195030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:29,576-Speed 9597.64 samples/sec Loss 5.3052 LearningRate 0.0173 Epoch: 11 Global Step: 195040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:30,637-Speed 9650.09 samples/sec Loss 5.3680 LearningRate 0.0173 Epoch: 11 Global Step: 195050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:31,680-Speed 9836.01 samples/sec Loss 5.3663 LearningRate 0.0173 Epoch: 11 Global Step: 195060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:32,733-Speed 9730.25 samples/sec Loss 5.3712 LearningRate 0.0173 Epoch: 11 Global Step: 195070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:33,780-Speed 9787.30 samples/sec Loss 5.3780 LearningRate 0.0173 Epoch: 11 Global Step: 195080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:34,863-Speed 9463.58 samples/sec Loss 5.3743 LearningRate 0.0173 Epoch: 11 Global Step: 195090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:35,952-Speed 9412.61 samples/sec Loss 5.3762 LearningRate 0.0173 Epoch: 11 Global Step: 195100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:37,030-Speed 9502.30 samples/sec Loss 5.3457 LearningRate 0.0173 Epoch: 11 Global Step: 195110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:38,138-Speed 9247.12 samples/sec Loss 5.2503 LearningRate 0.0173 Epoch: 11 Global Step: 195120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:39,250-Speed 9215.50 samples/sec Loss 5.3684 LearningRate 0.0173 Epoch: 11 Global Step: 195130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:40,320-Speed 9578.95 samples/sec Loss 5.2669 LearningRate 0.0173 Epoch: 11 Global Step: 195140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:41,374-Speed 9724.88 samples/sec Loss 5.3156 LearningRate 0.0173 Epoch: 11 
Global Step: 195150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:24:42,453-Speed 9497.28 samples/sec Loss 5.3872 LearningRate 0.0173 Epoch: 11 Global Step: 195160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:43,612-Speed 8833.70 samples/sec Loss 5.3191 LearningRate 0.0173 Epoch: 11 Global Step: 195170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:44,699-Speed 9430.84 samples/sec Loss 5.3477 LearningRate 0.0172 Epoch: 11 Global Step: 195180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:45,759-Speed 9664.26 samples/sec Loss 5.2643 LearningRate 0.0172 Epoch: 11 Global Step: 195190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:46,834-Speed 9533.35 samples/sec Loss 5.3278 LearningRate 0.0172 Epoch: 11 Global Step: 195200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:47,895-Speed 9656.07 samples/sec Loss 5.3301 LearningRate 0.0172 Epoch: 11 Global Step: 195210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:48,996-Speed 9305.99 samples/sec Loss 5.3499 LearningRate 0.0172 Epoch: 11 Global Step: 195220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:50,062-Speed 9616.44 samples/sec Loss 5.2826 LearningRate 0.0172 Epoch: 11 Global Step: 195230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:51,137-Speed 9530.43 samples/sec Loss 5.2924 LearningRate 0.0172 Epoch: 11 Global Step: 195240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:52,245-Speed 9242.52 samples/sec Loss 5.2975 LearningRate 0.0172 Epoch: 11 Global Step: 195250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:53,339-Speed 9364.11 samples/sec Loss 5.3443 LearningRate 0.0172 Epoch: 11 Global Step: 195260 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:24:54,401-Speed 9647.71 samples/sec Loss 5.3314 LearningRate 0.0172 Epoch: 11 Global Step: 195270 Fp16 Grad 
Scale: 262144 Required: 6 hours Training: 2022-04-11 19:24:55,493-Speed 9387.84 samples/sec Loss 5.4226 LearningRate 0.0172 Epoch: 11 Global Step: 195280 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:24:56,592-Speed 9331.06 samples/sec Loss 5.3971 LearningRate 0.0172 Epoch: 11 Global Step: 195290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:57,699-Speed 9251.15 samples/sec Loss 5.3474 LearningRate 0.0172 Epoch: 11 Global Step: 195300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:58,813-Speed 9199.13 samples/sec Loss 5.2936 LearningRate 0.0172 Epoch: 11 Global Step: 195310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:24:59,867-Speed 9722.66 samples/sec Loss 5.3027 LearningRate 0.0172 Epoch: 11 Global Step: 195320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:00,994-Speed 9088.41 samples/sec Loss 5.2225 LearningRate 0.0172 Epoch: 11 Global Step: 195330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:02,081-Speed 9424.25 samples/sec Loss 5.3172 LearningRate 0.0172 Epoch: 11 Global Step: 195340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:03,159-Speed 9506.02 samples/sec Loss 5.3194 LearningRate 0.0172 Epoch: 11 Global Step: 195350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:04,224-Speed 9621.05 samples/sec Loss 5.3415 LearningRate 0.0172 Epoch: 11 Global Step: 195360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:05,294-Speed 9574.96 samples/sec Loss 5.3580 LearningRate 0.0172 Epoch: 11 Global Step: 195370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:06,364-Speed 9575.23 samples/sec Loss 5.3305 LearningRate 0.0172 Epoch: 11 Global Step: 195380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:07,438-Speed 9543.31 samples/sec Loss 5.2692 LearningRate 0.0172 Epoch: 11 Global Step: 195390 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 19:25:08,541-Speed 9289.86 samples/sec Loss 5.3368 LearningRate 0.0172 Epoch: 11 Global Step: 195400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:09,639-Speed 9333.85 samples/sec Loss 5.3745 LearningRate 0.0172 Epoch: 11 Global Step: 195410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:10,694-Speed 9706.42 samples/sec Loss 5.3174 LearningRate 0.0172 Epoch: 11 Global Step: 195420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:11,835-Speed 8986.02 samples/sec Loss 5.3466 LearningRate 0.0172 Epoch: 11 Global Step: 195430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:12,957-Speed 9131.69 samples/sec Loss 5.3158 LearningRate 0.0172 Epoch: 11 Global Step: 195440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:14,081-Speed 9114.90 samples/sec Loss 5.2613 LearningRate 0.0172 Epoch: 11 Global Step: 195450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:15,148-Speed 9598.53 samples/sec Loss 5.3780 LearningRate 0.0172 Epoch: 11 Global Step: 195460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:16,208-Speed 9670.64 samples/sec Loss 5.3164 LearningRate 0.0172 Epoch: 11 Global Step: 195470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:17,292-Speed 9448.20 samples/sec Loss 5.3461 LearningRate 0.0172 Epoch: 11 Global Step: 195480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:18,365-Speed 9549.45 samples/sec Loss 5.3111 LearningRate 0.0172 Epoch: 11 Global Step: 195490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:19,471-Speed 9269.09 samples/sec Loss 5.2848 LearningRate 0.0172 Epoch: 11 Global Step: 195500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:20,591-Speed 9151.32 samples/sec Loss 5.3730 LearningRate 0.0172 Epoch: 11 Global Step: 195510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:21,713-Speed 
9132.12 samples/sec Loss 5.3923 LearningRate 0.0172 Epoch: 11 Global Step: 195520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:22,775-Speed 9641.03 samples/sec Loss 5.2727 LearningRate 0.0172 Epoch: 11 Global Step: 195530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:23,820-Speed 9810.04 samples/sec Loss 5.3437 LearningRate 0.0172 Epoch: 11 Global Step: 195540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:24,873-Speed 9722.87 samples/sec Loss 5.4497 LearningRate 0.0172 Epoch: 11 Global Step: 195550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:25,972-Speed 9332.46 samples/sec Loss 5.2875 LearningRate 0.0172 Epoch: 11 Global Step: 195560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:27,095-Speed 9121.87 samples/sec Loss 5.3076 LearningRate 0.0172 Epoch: 11 Global Step: 195570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:28,151-Speed 9704.36 samples/sec Loss 5.2888 LearningRate 0.0171 Epoch: 11 Global Step: 195580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:29,227-Speed 9519.28 samples/sec Loss 5.3048 LearningRate 0.0171 Epoch: 11 Global Step: 195590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:30,363-Speed 9017.36 samples/sec Loss 5.3974 LearningRate 0.0171 Epoch: 11 Global Step: 195600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:31,469-Speed 9266.60 samples/sec Loss 5.3623 LearningRate 0.0171 Epoch: 11 Global Step: 195610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:32,582-Speed 9198.95 samples/sec Loss 5.2778 LearningRate 0.0171 Epoch: 11 Global Step: 195620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:33,669-Speed 9432.54 samples/sec Loss 5.3629 LearningRate 0.0171 Epoch: 11 Global Step: 195630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:34,732-Speed 9637.37 samples/sec Loss 5.3714 
LearningRate 0.0171 Epoch: 11 Global Step: 195640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:35,872-Speed 8989.34 samples/sec Loss 5.2547 LearningRate 0.0171 Epoch: 11 Global Step: 195650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:36,967-Speed 9352.67 samples/sec Loss 5.3279 LearningRate 0.0171 Epoch: 11 Global Step: 195660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:38,033-Speed 9618.41 samples/sec Loss 5.3609 LearningRate 0.0171 Epoch: 11 Global Step: 195670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:39,137-Speed 9278.26 samples/sec Loss 5.3680 LearningRate 0.0171 Epoch: 11 Global Step: 195680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:40,197-Speed 9665.13 samples/sec Loss 5.3763 LearningRate 0.0171 Epoch: 11 Global Step: 195690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:41,325-Speed 9085.94 samples/sec Loss 5.1696 LearningRate 0.0171 Epoch: 11 Global Step: 195700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:25:42,451-Speed 9099.84 samples/sec Loss 5.3639 LearningRate 0.0171 Epoch: 11 Global Step: 195710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:43,557-Speed 9265.40 samples/sec Loss 5.2974 LearningRate 0.0171 Epoch: 11 Global Step: 195720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:44,634-Speed 9507.40 samples/sec Loss 5.2945 LearningRate 0.0171 Epoch: 11 Global Step: 195730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:45,669-Speed 9900.38 samples/sec Loss 5.3211 LearningRate 0.0171 Epoch: 11 Global Step: 195740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:46,797-Speed 9085.78 samples/sec Loss 5.3856 LearningRate 0.0171 Epoch: 11 Global Step: 195750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:47,904-Speed 9258.59 samples/sec Loss 5.3462 LearningRate 0.0171 Epoch: 11 Global 
Step: 195760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:48,981-Speed 9509.21 samples/sec Loss 5.2973 LearningRate 0.0171 Epoch: 11 Global Step: 195770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:50,078-Speed 9349.01 samples/sec Loss 5.3306 LearningRate 0.0171 Epoch: 11 Global Step: 195780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:51,182-Speed 9280.20 samples/sec Loss 5.3715 LearningRate 0.0171 Epoch: 11 Global Step: 195790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:52,224-Speed 9831.98 samples/sec Loss 5.2896 LearningRate 0.0171 Epoch: 11 Global Step: 195800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:53,348-Speed 9116.50 samples/sec Loss 5.2655 LearningRate 0.0171 Epoch: 11 Global Step: 195810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:25:54,457-Speed 9240.32 samples/sec Loss 5.2840 LearningRate 0.0171 Epoch: 11 Global Step: 195820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:55,539-Speed 9471.81 samples/sec Loss 5.2063 LearningRate 0.0171 Epoch: 11 Global Step: 195830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:56,673-Speed 9037.22 samples/sec Loss 5.2627 LearningRate 0.0171 Epoch: 11 Global Step: 195840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:57,797-Speed 9113.89 samples/sec Loss 5.3274 LearningRate 0.0171 Epoch: 11 Global Step: 195850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:58,864-Speed 9601.17 samples/sec Loss 5.2664 LearningRate 0.0171 Epoch: 11 Global Step: 195860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:25:59,945-Speed 9477.08 samples/sec Loss 5.3010 LearningRate 0.0171 Epoch: 11 Global Step: 195870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:26:01,023-Speed 9510.88 samples/sec Loss 5.2738 LearningRate 0.0171 Epoch: 11 Global Step: 195880 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 19:26:02,112-Speed 9408.26 samples/sec Loss 5.2007 LearningRate 0.0171 Epoch: 11 Global Step: 195890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:26:03,232-Speed 9140.70 samples/sec Loss 5.4236 LearningRate 0.0171 Epoch: 11 Global Step: 195900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:26:04,313-Speed 9482.63 samples/sec Loss 5.3752 LearningRate 0.0171 Epoch: 11 Global Step: 195910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:26:05,421-Speed 9248.36 samples/sec Loss 5.3361 LearningRate 0.0171 Epoch: 11 Global Step: 195920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:26:06,515-Speed 9369.07 samples/sec Loss 5.3549 LearningRate 0.0171 Epoch: 11 Global Step: 195930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:26:07,625-Speed 9222.79 samples/sec Loss 5.3338 LearningRate 0.0171 Epoch: 11 Global Step: 195940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:26:08,749-Speed 9122.93 samples/sec Loss 5.2891 LearningRate 0.0171 Epoch: 11 Global Step: 195950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:26:09,851-Speed 9298.38 samples/sec Loss 5.3865 LearningRate 0.0171 Epoch: 11 Global Step: 195960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:26:10,944-Speed 9374.03 samples/sec Loss 5.2699 LearningRate 0.0171 Epoch: 11 Global Step: 195970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:26:11,996-Speed 9743.43 samples/sec Loss 5.2425 LearningRate 0.0171 Epoch: 11 Global Step: 195980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:26:13,069-Speed 9548.35 samples/sec Loss 5.3089 LearningRate 0.0170 Epoch: 11 Global Step: 195990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:26:14,152-Speed 9464.23 samples/sec Loss 5.3746 LearningRate 0.0170 Epoch: 11 Global Step: 196000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 19:26:35,926-[lfw][196000]XNorm: 8.772796 Training: 2022-04-11 19:26:35,926-[lfw][196000]Accuracy-Flip: 0.99667+-0.00269 Training: 2022-04-11 19:26:35,926-[lfw][196000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:27:01,378-[cfp_fp][196000]XNorm: 7.544887 Training: 2022-04-11 19:27:01,379-[cfp_fp][196000]Accuracy-Flip: 0.96386+-0.00948 Training: 2022-04-11 19:27:01,379-[cfp_fp][196000]Accuracy-Highest: 0.96714 Training: 2022-04-11 19:27:23,137-[agedb_30][196000]XNorm: 8.494056 Training: 2022-04-11 19:27:23,138-[agedb_30][196000]Accuracy-Flip: 0.96983+-0.00886 Training: 2022-04-11 19:27:23,138-[agedb_30][196000]Accuracy-Highest: 0.96983 Training: 2022-04-11 19:27:24,192-Speed 146.20 samples/sec Loss 5.3655 LearningRate 0.0170 Epoch: 11 Global Step: 196010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:25,248-Speed 9699.20 samples/sec Loss 5.3619 LearningRate 0.0170 Epoch: 11 Global Step: 196020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:26,307-Speed 9674.83 samples/sec Loss 5.3413 LearningRate 0.0170 Epoch: 11 Global Step: 196030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:27,368-Speed 9659.68 samples/sec Loss 5.3550 LearningRate 0.0170 Epoch: 11 Global Step: 196040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:28,420-Speed 9737.29 samples/sec Loss 5.2804 LearningRate 0.0170 Epoch: 11 Global Step: 196050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:29,508-Speed 9413.47 samples/sec Loss 5.3889 LearningRate 0.0170 Epoch: 11 Global Step: 196060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:30,574-Speed 9623.79 samples/sec Loss 5.3183 LearningRate 0.0170 Epoch: 11 Global Step: 196070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:31,657-Speed 9454.05 samples/sec Loss 5.3361 LearningRate 0.0170 Epoch: 11 Global Step: 196080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
19:27:32,791-Speed 9035.12 samples/sec Loss 5.3121 LearningRate 0.0170 Epoch: 11 Global Step: 196090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:33,861-Speed 9574.60 samples/sec Loss 5.4372 LearningRate 0.0170 Epoch: 11 Global Step: 196100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:34,928-Speed 9603.93 samples/sec Loss 5.2827 LearningRate 0.0170 Epoch: 11 Global Step: 196110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:36,006-Speed 9501.82 samples/sec Loss 5.3238 LearningRate 0.0170 Epoch: 11 Global Step: 196120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:37,066-Speed 9672.51 samples/sec Loss 5.3351 LearningRate 0.0170 Epoch: 11 Global Step: 196130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:38,185-Speed 9151.99 samples/sec Loss 5.3048 LearningRate 0.0170 Epoch: 11 Global Step: 196140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:39,287-Speed 9295.57 samples/sec Loss 5.3375 LearningRate 0.0170 Epoch: 11 Global Step: 196150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:27:40,322-Speed 9907.23 samples/sec Loss 5.3127 LearningRate 0.0170 Epoch: 11 Global Step: 196160 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:27:41,400-Speed 9502.99 samples/sec Loss 5.3088 LearningRate 0.0170 Epoch: 11 Global Step: 196170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:42,478-Speed 9504.93 samples/sec Loss 5.3814 LearningRate 0.0170 Epoch: 11 Global Step: 196180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:43,553-Speed 9527.68 samples/sec Loss 5.3857 LearningRate 0.0170 Epoch: 11 Global Step: 196190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:44,625-Speed 9561.55 samples/sec Loss 5.3122 LearningRate 0.0170 Epoch: 11 Global Step: 196200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:45,684-Speed 9677.10 
samples/sec Loss 5.3707 LearningRate 0.0170 Epoch: 11 Global Step: 196210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:46,798-Speed 9206.64 samples/sec Loss 5.3744 LearningRate 0.0170 Epoch: 11 Global Step: 196220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:47,908-Speed 9226.98 samples/sec Loss 5.3104 LearningRate 0.0170 Epoch: 11 Global Step: 196230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:48,971-Speed 9645.19 samples/sec Loss 5.3128 LearningRate 0.0170 Epoch: 11 Global Step: 196240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:50,059-Speed 9409.68 samples/sec Loss 5.3143 LearningRate 0.0170 Epoch: 11 Global Step: 196250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:51,095-Speed 9889.77 samples/sec Loss 5.2999 LearningRate 0.0170 Epoch: 11 Global Step: 196260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:52,196-Speed 9307.65 samples/sec Loss 5.4012 LearningRate 0.0170 Epoch: 11 Global Step: 196270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:53,242-Speed 9798.39 samples/sec Loss 5.3540 LearningRate 0.0170 Epoch: 11 Global Step: 196280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:54,304-Speed 9642.39 samples/sec Loss 5.2949 LearningRate 0.0170 Epoch: 11 Global Step: 196290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:27:55,379-Speed 9537.82 samples/sec Loss 5.3269 LearningRate 0.0170 Epoch: 11 Global Step: 196300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:56,461-Speed 9468.02 samples/sec Loss 5.2768 LearningRate 0.0170 Epoch: 11 Global Step: 196310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:57,563-Speed 9296.90 samples/sec Loss 5.2429 LearningRate 0.0170 Epoch: 11 Global Step: 196320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:58,666-Speed 9284.24 samples/sec Loss 5.2804 LearningRate 
0.0170 Epoch: 11 Global Step: 196330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:27:59,738-Speed 9559.46 samples/sec Loss 5.2897 LearningRate 0.0170 Epoch: 11 Global Step: 196340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:00,816-Speed 9507.31 samples/sec Loss 5.2893 LearningRate 0.0170 Epoch: 11 Global Step: 196350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:01,867-Speed 9752.88 samples/sec Loss 5.3565 LearningRate 0.0170 Epoch: 11 Global Step: 196360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:02,998-Speed 9054.19 samples/sec Loss 5.3260 LearningRate 0.0170 Epoch: 11 Global Step: 196370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:04,121-Speed 9128.36 samples/sec Loss 5.3651 LearningRate 0.0170 Epoch: 11 Global Step: 196380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:05,211-Speed 9397.92 samples/sec Loss 5.3196 LearningRate 0.0169 Epoch: 11 Global Step: 196390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:06,282-Speed 9562.61 samples/sec Loss 5.2925 LearningRate 0.0169 Epoch: 11 Global Step: 196400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:07,362-Speed 9491.70 samples/sec Loss 5.3171 LearningRate 0.0169 Epoch: 11 Global Step: 196410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:08,456-Speed 9368.08 samples/sec Loss 5.2962 LearningRate 0.0169 Epoch: 11 Global Step: 196420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:09,526-Speed 9568.16 samples/sec Loss 5.3201 LearningRate 0.0169 Epoch: 11 Global Step: 196430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:10,608-Speed 9479.41 samples/sec Loss 5.4775 LearningRate 0.0169 Epoch: 11 Global Step: 196440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:11,698-Speed 9399.63 samples/sec Loss 5.4507 LearningRate 0.0169 Epoch: 11 Global Step: 196450 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:12,802-Speed 9277.45 samples/sec Loss 5.3606 LearningRate 0.0169 Epoch: 11 Global Step: 196460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:13,913-Speed 9222.20 samples/sec Loss 5.2806 LearningRate 0.0169 Epoch: 11 Global Step: 196470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:15,014-Speed 9304.16 samples/sec Loss 5.3003 LearningRate 0.0169 Epoch: 11 Global Step: 196480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:16,081-Speed 9607.79 samples/sec Loss 5.3060 LearningRate 0.0169 Epoch: 11 Global Step: 196490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:17,153-Speed 9556.34 samples/sec Loss 5.3598 LearningRate 0.0169 Epoch: 11 Global Step: 196500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:18,239-Speed 9432.41 samples/sec Loss 5.3485 LearningRate 0.0169 Epoch: 11 Global Step: 196510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:19,345-Speed 9269.67 samples/sec Loss 5.3771 LearningRate 0.0169 Epoch: 11 Global Step: 196520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:20,475-Speed 9064.61 samples/sec Loss 5.3262 LearningRate 0.0169 Epoch: 11 Global Step: 196530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:21,599-Speed 9117.81 samples/sec Loss 5.2890 LearningRate 0.0169 Epoch: 11 Global Step: 196540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:22,657-Speed 9681.55 samples/sec Loss 5.1568 LearningRate 0.0169 Epoch: 11 Global Step: 196550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:23,747-Speed 9403.18 samples/sec Loss 5.2944 LearningRate 0.0169 Epoch: 11 Global Step: 196560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:24,835-Speed 9417.85 samples/sec Loss 5.3629 LearningRate 0.0169 Epoch: 11 Global Step: 196570 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 19:28:25,899-Speed 9629.92 samples/sec Loss 5.3076 LearningRate 0.0169 Epoch: 11 Global Step: 196580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:26,954-Speed 9709.13 samples/sec Loss 5.1319 LearningRate 0.0169 Epoch: 11 Global Step: 196590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:28,019-Speed 9624.49 samples/sec Loss 5.2598 LearningRate 0.0169 Epoch: 11 Global Step: 196600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:29,108-Speed 9408.23 samples/sec Loss 5.3491 LearningRate 0.0169 Epoch: 11 Global Step: 196610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:30,197-Speed 9408.75 samples/sec Loss 5.4879 LearningRate 0.0169 Epoch: 11 Global Step: 196620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:31,270-Speed 9550.81 samples/sec Loss 5.3002 LearningRate 0.0169 Epoch: 11 Global Step: 196630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:32,357-Speed 9423.13 samples/sec Loss 5.2783 LearningRate 0.0169 Epoch: 11 Global Step: 196640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:33,448-Speed 9391.39 samples/sec Loss 5.3645 LearningRate 0.0169 Epoch: 11 Global Step: 196650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:34,533-Speed 9450.42 samples/sec Loss 5.3081 LearningRate 0.0169 Epoch: 11 Global Step: 196660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:35,564-Speed 9928.96 samples/sec Loss 5.3879 LearningRate 0.0169 Epoch: 11 Global Step: 196670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:36,674-Speed 9235.68 samples/sec Loss 5.3035 LearningRate 0.0169 Epoch: 11 Global Step: 196680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:37,729-Speed 9706.55 samples/sec Loss 5.3290 LearningRate 0.0169 Epoch: 11 Global Step: 196690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
19:28:38,796-Speed 9603.01 samples/sec Loss 5.3422 LearningRate 0.0169 Epoch: 11 Global Step: 196700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:39,889-Speed 9376.72 samples/sec Loss 5.3052 LearningRate 0.0169 Epoch: 11 Global Step: 196710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:40,944-Speed 9714.03 samples/sec Loss 5.3440 LearningRate 0.0169 Epoch: 11 Global Step: 196720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:41,998-Speed 9723.15 samples/sec Loss 5.3068 LearningRate 0.0169 Epoch: 11 Global Step: 196730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:43,056-Speed 9686.72 samples/sec Loss 5.3668 LearningRate 0.0169 Epoch: 11 Global Step: 196740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:44,175-Speed 9153.68 samples/sec Loss 5.2818 LearningRate 0.0169 Epoch: 11 Global Step: 196750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:45,243-Speed 9595.92 samples/sec Loss 5.2717 LearningRate 0.0169 Epoch: 11 Global Step: 196760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:46,362-Speed 9160.73 samples/sec Loss 5.3734 LearningRate 0.0169 Epoch: 11 Global Step: 196770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 19:28:47,477-Speed 9189.30 samples/sec Loss 5.3775 LearningRate 0.0169 Epoch: 11 Global Step: 196780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:48,568-Speed 9387.97 samples/sec Loss 5.2774 LearningRate 0.0169 Epoch: 11 Global Step: 196790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:49,627-Speed 9679.10 samples/sec Loss 5.3266 LearningRate 0.0168 Epoch: 11 Global Step: 196800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:50,718-Speed 9389.65 samples/sec Loss 5.2469 LearningRate 0.0168 Epoch: 11 Global Step: 196810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:51,761-Speed 9819.51 samples/sec 
Loss 5.4513 LearningRate 0.0168 Epoch: 11 Global Step: 196820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:52,838-Speed 9519.50 samples/sec Loss 5.3178 LearningRate 0.0168 Epoch: 11 Global Step: 196830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:53,900-Speed 9647.87 samples/sec Loss 5.4058 LearningRate 0.0168 Epoch: 11 Global Step: 196840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:55,010-Speed 9225.85 samples/sec Loss 5.3745 LearningRate 0.0168 Epoch: 11 Global Step: 196850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:56,062-Speed 9745.02 samples/sec Loss 5.3763 LearningRate 0.0168 Epoch: 11 Global Step: 196860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:57,163-Speed 9304.57 samples/sec Loss 5.2644 LearningRate 0.0168 Epoch: 11 Global Step: 196870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:28:58,230-Speed 9596.52 samples/sec Loss 5.3203 LearningRate 0.0168 Epoch: 11 Global Step: 196880 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:28:59,287-Speed 9694.39 samples/sec Loss 5.3856 LearningRate 0.0168 Epoch: 11 Global Step: 196890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:00,369-Speed 9479.13 samples/sec Loss 5.3720 LearningRate 0.0168 Epoch: 11 Global Step: 196900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:01,464-Speed 9357.88 samples/sec Loss 5.3122 LearningRate 0.0168 Epoch: 11 Global Step: 196910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:02,534-Speed 9577.00 samples/sec Loss 5.4353 LearningRate 0.0168 Epoch: 11 Global Step: 196920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:03,650-Speed 9181.50 samples/sec Loss 5.3161 LearningRate 0.0168 Epoch: 11 Global Step: 196930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:04,722-Speed 9555.64 samples/sec Loss 5.2910 LearningRate 0.0168 
Epoch: 11 Global Step: 196940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:05,806-Speed 9450.06 samples/sec Loss 5.2568 LearningRate 0.0168 Epoch: 11 Global Step: 196950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:06,897-Speed 9396.09 samples/sec Loss 5.3961 LearningRate 0.0168 Epoch: 11 Global Step: 196960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:08,038-Speed 8976.04 samples/sec Loss 5.2805 LearningRate 0.0168 Epoch: 11 Global Step: 196970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:09,154-Speed 9183.68 samples/sec Loss 5.3701 LearningRate 0.0168 Epoch: 11 Global Step: 196980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:10,244-Speed 9399.90 samples/sec Loss 5.3048 LearningRate 0.0168 Epoch: 11 Global Step: 196990 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:29:11,273-Speed 9955.96 samples/sec Loss 5.3817 LearningRate 0.0168 Epoch: 11 Global Step: 197000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:12,346-Speed 9556.23 samples/sec Loss 5.2454 LearningRate 0.0168 Epoch: 11 Global Step: 197010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:13,461-Speed 9184.38 samples/sec Loss 5.3166 LearningRate 0.0168 Epoch: 11 Global Step: 197020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:14,567-Speed 9268.29 samples/sec Loss 5.4434 LearningRate 0.0168 Epoch: 11 Global Step: 197030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:15,684-Speed 9166.68 samples/sec Loss 5.4057 LearningRate 0.0168 Epoch: 11 Global Step: 197040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:16,764-Speed 9490.74 samples/sec Loss 5.4572 LearningRate 0.0168 Epoch: 11 Global Step: 197050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:17,866-Speed 9302.06 samples/sec Loss 5.4157 LearningRate 0.0168 Epoch: 11 Global Step: 197060 
Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:18,960-Speed 9365.50 samples/sec Loss 5.3504 LearningRate 0.0168 Epoch: 11 Global Step: 197070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:20,051-Speed 9395.28 samples/sec Loss 5.4418 LearningRate 0.0168 Epoch: 11 Global Step: 197080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:21,108-Speed 9689.82 samples/sec Loss 5.3032 LearningRate 0.0168 Epoch: 11 Global Step: 197090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:22,173-Speed 9624.46 samples/sec Loss 5.4287 LearningRate 0.0168 Epoch: 11 Global Step: 197100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:29:23,291-Speed 9159.73 samples/sec Loss 5.2947 LearningRate 0.0168 Epoch: 11 Global Step: 197110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 19:29:24,340-Speed 9773.60 samples/sec Loss 5.2360 LearningRate 0.0168 Epoch: 11 Global Step: 197120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:25,414-Speed 9533.91 samples/sec Loss 5.3996 LearningRate 0.0168 Epoch: 11 Global Step: 197130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:26,524-Speed 9228.42 samples/sec Loss 5.3060 LearningRate 0.0168 Epoch: 11 Global Step: 197140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:27,605-Speed 9479.20 samples/sec Loss 5.2736 LearningRate 0.0168 Epoch: 11 Global Step: 197150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:28,711-Speed 9267.65 samples/sec Loss 5.3277 LearningRate 0.0168 Epoch: 11 Global Step: 197160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:29,796-Speed 9443.13 samples/sec Loss 5.3201 LearningRate 0.0168 Epoch: 11 Global Step: 197170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:30,885-Speed 9408.83 samples/sec Loss 5.3597 LearningRate 0.0168 Epoch: 11 Global Step: 197180 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 19:29:32,005-Speed 9152.08 samples/sec Loss 5.2877 LearningRate 0.0168 Epoch: 11 Global Step: 197190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:33,083-Speed 9505.41 samples/sec Loss 5.2320 LearningRate 0.0167 Epoch: 11 Global Step: 197200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:34,191-Speed 9245.30 samples/sec Loss 5.3813 LearningRate 0.0167 Epoch: 11 Global Step: 197210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:35,273-Speed 9471.88 samples/sec Loss 5.2845 LearningRate 0.0167 Epoch: 11 Global Step: 197220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:36,342-Speed 9581.07 samples/sec Loss 5.3377 LearningRate 0.0167 Epoch: 11 Global Step: 197230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:37,399-Speed 9692.86 samples/sec Loss 5.3426 LearningRate 0.0167 Epoch: 11 Global Step: 197240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:38,508-Speed 9245.32 samples/sec Loss 5.3548 LearningRate 0.0167 Epoch: 11 Global Step: 197250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:39,586-Speed 9501.68 samples/sec Loss 5.3110 LearningRate 0.0167 Epoch: 11 Global Step: 197260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:40,666-Speed 9488.79 samples/sec Loss 5.3115 LearningRate 0.0167 Epoch: 11 Global Step: 197270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:41,781-Speed 9189.32 samples/sec Loss 5.3093 LearningRate 0.0167 Epoch: 11 Global Step: 197280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:42,879-Speed 9335.69 samples/sec Loss 5.4150 LearningRate 0.0167 Epoch: 11 Global Step: 197290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 19:29:43,950-Speed 9563.31 samples/sec Loss 5.3122 LearningRate 0.0167 Epoch: 11 Global Step: 197300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:29:45,021-Speed 9569.46 samples/sec Loss 5.3414 LearningRate 0.0167 Epoch: 11 Global Step: 197310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:46,087-Speed 9605.29 samples/sec Loss 5.4466 LearningRate 0.0167 Epoch: 11 Global Step: 197320 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:29:47,153-Speed 9622.58 samples/sec Loss 5.3376 LearningRate 0.0167 Epoch: 11 Global Step: 197330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:48,261-Speed 9247.02 samples/sec Loss 5.4106 LearningRate 0.0167 Epoch: 11 Global Step: 197340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:49,341-Speed 9486.61 samples/sec Loss 5.2992 LearningRate 0.0167 Epoch: 11 Global Step: 197350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:50,413-Speed 9554.39 samples/sec Loss 5.2814 LearningRate 0.0167 Epoch: 11 Global Step: 197360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:51,478-Speed 9625.76 samples/sec Loss 5.3218 LearningRate 0.0167 Epoch: 11 Global Step: 197370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:52,630-Speed 8895.01 samples/sec Loss 5.4039 LearningRate 0.0167 Epoch: 11 Global Step: 197380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:53,730-Speed 9309.28 samples/sec Loss 5.3234 LearningRate 0.0167 Epoch: 11 Global Step: 197390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:54,790-Speed 9669.49 samples/sec Loss 5.3090 LearningRate 0.0167 Epoch: 11 Global Step: 197400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:55,874-Speed 9451.98 samples/sec Loss 5.2482 LearningRate 0.0167 Epoch: 11 Global Step: 197410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:56,958-Speed 9449.35 samples/sec Loss 5.3260 LearningRate 0.0167 Epoch: 11 Global Step: 197420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:29:58,077-Speed 
9159.13 samples/sec Loss 5.2886 LearningRate 0.0167 Epoch: 11 Global Step: 197430 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:29:59,158-Speed 9478.73 samples/sec Loss 5.3485 LearningRate 0.0167 Epoch: 11 Global Step: 197440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:00,243-Speed 9443.06 samples/sec Loss 5.3949 LearningRate 0.0167 Epoch: 11 Global Step: 197450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:01,340-Speed 9342.11 samples/sec Loss 5.4250 LearningRate 0.0167 Epoch: 11 Global Step: 197460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:02,437-Speed 9342.98 samples/sec Loss 5.2667 LearningRate 0.0167 Epoch: 11 Global Step: 197470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:03,579-Speed 8973.18 samples/sec Loss 5.4156 LearningRate 0.0167 Epoch: 11 Global Step: 197480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:04,655-Speed 9518.61 samples/sec Loss 5.3367 LearningRate 0.0167 Epoch: 11 Global Step: 197490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:05,747-Speed 9380.17 samples/sec Loss 5.2304 LearningRate 0.0167 Epoch: 11 Global Step: 197500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:06,857-Speed 9235.64 samples/sec Loss 5.2667 LearningRate 0.0167 Epoch: 11 Global Step: 197510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:07,959-Speed 9296.20 samples/sec Loss 5.2771 LearningRate 0.0167 Epoch: 11 Global Step: 197520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:09,044-Speed 9446.25 samples/sec Loss 5.3485 LearningRate 0.0167 Epoch: 11 Global Step: 197530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:10,124-Speed 9485.86 samples/sec Loss 5.4397 LearningRate 0.0167 Epoch: 11 Global Step: 197540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:11,178-Speed 9726.43 samples/sec Loss 5.3614 
LearningRate 0.0167 Epoch: 11 Global Step: 197550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:12,254-Speed 9523.89 samples/sec Loss 5.3733 LearningRate 0.0167 Epoch: 11 Global Step: 197560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:13,377-Speed 9119.10 samples/sec Loss 5.3651 LearningRate 0.0167 Epoch: 11 Global Step: 197570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:14,445-Speed 9597.69 samples/sec Loss 5.3986 LearningRate 0.0167 Epoch: 11 Global Step: 197580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:15,520-Speed 9528.58 samples/sec Loss 5.2811 LearningRate 0.0167 Epoch: 11 Global Step: 197590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:16,613-Speed 9374.48 samples/sec Loss 5.3177 LearningRate 0.0167 Epoch: 11 Global Step: 197600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:17,685-Speed 9566.80 samples/sec Loss 5.3524 LearningRate 0.0166 Epoch: 11 Global Step: 197610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:18,754-Speed 9581.32 samples/sec Loss 5.2687 LearningRate 0.0166 Epoch: 11 Global Step: 197620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:19,854-Speed 9314.25 samples/sec Loss 5.3352 LearningRate 0.0166 Epoch: 11 Global Step: 197630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:20,920-Speed 9613.33 samples/sec Loss 5.3773 LearningRate 0.0166 Epoch: 11 Global Step: 197640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:21,996-Speed 9517.44 samples/sec Loss 5.3256 LearningRate 0.0166 Epoch: 11 Global Step: 197650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:23,063-Speed 9608.13 samples/sec Loss 5.2650 LearningRate 0.0166 Epoch: 11 Global Step: 197660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:24,117-Speed 9713.93 samples/sec Loss 5.2173 LearningRate 0.0166 Epoch: 11 Global 
Step: 197670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:25,171-Speed 9727.38 samples/sec Loss 5.3614 LearningRate 0.0166 Epoch: 11 Global Step: 197680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:26,265-Speed 9367.54 samples/sec Loss 5.3458 LearningRate 0.0166 Epoch: 11 Global Step: 197690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:27,331-Speed 9610.37 samples/sec Loss 5.3332 LearningRate 0.0166 Epoch: 11 Global Step: 197700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:28,410-Speed 9500.04 samples/sec Loss 5.3237 LearningRate 0.0166 Epoch: 11 Global Step: 197710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:30:29,492-Speed 9464.08 samples/sec Loss 5.2989 LearningRate 0.0166 Epoch: 11 Global Step: 197720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:30,577-Speed 9455.40 samples/sec Loss 5.3104 LearningRate 0.0166 Epoch: 11 Global Step: 197730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:31,687-Speed 9227.46 samples/sec Loss 5.3911 LearningRate 0.0166 Epoch: 11 Global Step: 197740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:32,786-Speed 9324.38 samples/sec Loss 5.4197 LearningRate 0.0166 Epoch: 11 Global Step: 197750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:33,842-Speed 9698.11 samples/sec Loss 5.3205 LearningRate 0.0166 Epoch: 11 Global Step: 197760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:34,935-Speed 9379.06 samples/sec Loss 5.3255 LearningRate 0.0166 Epoch: 11 Global Step: 197770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:36,057-Speed 9130.35 samples/sec Loss 5.3410 LearningRate 0.0166 Epoch: 11 Global Step: 197780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:37,149-Speed 9380.17 samples/sec Loss 5.4634 LearningRate 0.0166 Epoch: 11 Global Step: 197790 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 19:30:38,210-Speed 9656.00 samples/sec Loss 5.4394 LearningRate 0.0166 Epoch: 11 Global Step: 197800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:39,279-Speed 9583.94 samples/sec Loss 5.3426 LearningRate 0.0166 Epoch: 11 Global Step: 197810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:40,367-Speed 9421.05 samples/sec Loss 5.3835 LearningRate 0.0166 Epoch: 11 Global Step: 197820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:41,448-Speed 9482.76 samples/sec Loss 5.2459 LearningRate 0.0166 Epoch: 11 Global Step: 197830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:42,486-Speed 9865.22 samples/sec Loss 5.2896 LearningRate 0.0166 Epoch: 11 Global Step: 197840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:43,578-Speed 9379.89 samples/sec Loss 5.2378 LearningRate 0.0166 Epoch: 11 Global Step: 197850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:44,665-Speed 9429.56 samples/sec Loss 5.2844 LearningRate 0.0166 Epoch: 11 Global Step: 197860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:45,771-Speed 9267.68 samples/sec Loss 5.3905 LearningRate 0.0166 Epoch: 11 Global Step: 197870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:46,887-Speed 9183.62 samples/sec Loss 5.3226 LearningRate 0.0166 Epoch: 11 Global Step: 197880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:47,946-Speed 9670.83 samples/sec Loss 5.3664 LearningRate 0.0166 Epoch: 11 Global Step: 197890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:49,025-Speed 9501.82 samples/sec Loss 5.4279 LearningRate 0.0166 Epoch: 11 Global Step: 197900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:50,108-Speed 9455.54 samples/sec Loss 5.3101 LearningRate 0.0166 Epoch: 11 Global Step: 197910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:30:51,179-Speed 9570.55 samples/sec Loss 5.3341 LearningRate 0.0166 Epoch: 11 Global Step: 197920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:52,227-Speed 9774.47 samples/sec Loss 5.3668 LearningRate 0.0166 Epoch: 11 Global Step: 197930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:53,292-Speed 9619.86 samples/sec Loss 5.3342 LearningRate 0.0166 Epoch: 11 Global Step: 197940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:54,353-Speed 9662.27 samples/sec Loss 5.3654 LearningRate 0.0166 Epoch: 11 Global Step: 197950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:55,465-Speed 9210.90 samples/sec Loss 5.2659 LearningRate 0.0166 Epoch: 11 Global Step: 197960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:56,519-Speed 9716.31 samples/sec Loss 5.2523 LearningRate 0.0166 Epoch: 11 Global Step: 197970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:57,601-Speed 9475.14 samples/sec Loss 5.3615 LearningRate 0.0166 Epoch: 11 Global Step: 197980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:58,666-Speed 9614.55 samples/sec Loss 5.3300 LearningRate 0.0166 Epoch: 11 Global Step: 197990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:30:59,786-Speed 9150.55 samples/sec Loss 5.4674 LearningRate 0.0166 Epoch: 11 Global Step: 198000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:31:21,782-[lfw][198000]XNorm: 8.728273 Training: 2022-04-11 19:31:21,783-[lfw][198000]Accuracy-Flip: 0.99667+-0.00247 Training: 2022-04-11 19:31:21,783-[lfw][198000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:31:47,178-[cfp_fp][198000]XNorm: 7.472294 Training: 2022-04-11 19:31:47,179-[cfp_fp][198000]Accuracy-Flip: 0.96300+-0.00970 Training: 2022-04-11 19:31:47,180-[cfp_fp][198000]Accuracy-Highest: 0.96714 Training: 2022-04-11 19:32:09,134-[agedb_30][198000]XNorm: 8.502214 Training: 2022-04-11 
19:32:09,135-[agedb_30][198000]Accuracy-Flip: 0.96367+-0.01183 Training: 2022-04-11 19:32:09,135-[agedb_30][198000]Accuracy-Highest: 0.96983 Training: 2022-04-11 19:32:10,207-Speed 145.41 samples/sec Loss 5.3627 LearningRate 0.0166 Epoch: 11 Global Step: 198010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:11,278-Speed 9571.89 samples/sec Loss 5.3808 LearningRate 0.0165 Epoch: 11 Global Step: 198020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:32:12,369-Speed 9392.91 samples/sec Loss 5.2270 LearningRate 0.0165 Epoch: 11 Global Step: 198030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:13,462-Speed 9372.51 samples/sec Loss 5.3342 LearningRate 0.0165 Epoch: 11 Global Step: 198040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:14,575-Speed 9200.31 samples/sec Loss 5.2733 LearningRate 0.0165 Epoch: 11 Global Step: 198050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:15,664-Speed 9410.96 samples/sec Loss 5.3619 LearningRate 0.0165 Epoch: 11 Global Step: 198060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:16,756-Speed 9385.54 samples/sec Loss 5.4010 LearningRate 0.0165 Epoch: 11 Global Step: 198070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:17,858-Speed 9299.92 samples/sec Loss 5.2539 LearningRate 0.0165 Epoch: 11 Global Step: 198080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:18,910-Speed 9738.43 samples/sec Loss 5.3014 LearningRate 0.0165 Epoch: 11 Global Step: 198090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:19,977-Speed 9607.07 samples/sec Loss 5.3191 LearningRate 0.0165 Epoch: 11 Global Step: 198100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:21,056-Speed 9496.86 samples/sec Loss 5.2597 LearningRate 0.0165 Epoch: 11 Global Step: 198110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:22,151-Speed 9353.75 
samples/sec Loss 5.2792 LearningRate 0.0165 Epoch: 11 Global Step: 198120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:23,209-Speed 9686.26 samples/sec Loss 5.3172 LearningRate 0.0165 Epoch: 11 Global Step: 198130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:24,354-Speed 8946.19 samples/sec Loss 5.4145 LearningRate 0.0165 Epoch: 11 Global Step: 198140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:25,437-Speed 9464.73 samples/sec Loss 5.3075 LearningRate 0.0165 Epoch: 11 Global Step: 198150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:26,536-Speed 9325.71 samples/sec Loss 5.3773 LearningRate 0.0165 Epoch: 11 Global Step: 198160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:27,604-Speed 9586.40 samples/sec Loss 5.3155 LearningRate 0.0165 Epoch: 11 Global Step: 198170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:28,710-Speed 9263.30 samples/sec Loss 5.2594 LearningRate 0.0165 Epoch: 11 Global Step: 198180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:29,802-Speed 9384.35 samples/sec Loss 5.3453 LearningRate 0.0165 Epoch: 11 Global Step: 198190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:30,902-Speed 9314.56 samples/sec Loss 5.2444 LearningRate 0.0165 Epoch: 11 Global Step: 198200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:31,980-Speed 9511.86 samples/sec Loss 5.3705 LearningRate 0.0165 Epoch: 11 Global Step: 198210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:33,041-Speed 9653.56 samples/sec Loss 5.2601 LearningRate 0.0165 Epoch: 11 Global Step: 198220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:34,151-Speed 9230.58 samples/sec Loss 5.2975 LearningRate 0.0165 Epoch: 11 Global Step: 198230 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:32:35,213-Speed 9645.78 samples/sec Loss 5.2393 
LearningRate 0.0165 Epoch: 11 Global Step: 198240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:36,252-Speed 9865.35 samples/sec Loss 5.4286 LearningRate 0.0165 Epoch: 11 Global Step: 198250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:37,307-Speed 9710.74 samples/sec Loss 5.3419 LearningRate 0.0165 Epoch: 11 Global Step: 198260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:38,405-Speed 9330.55 samples/sec Loss 5.2664 LearningRate 0.0165 Epoch: 11 Global Step: 198270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:39,525-Speed 9141.78 samples/sec Loss 5.2498 LearningRate 0.0165 Epoch: 11 Global Step: 198280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:40,640-Speed 9193.50 samples/sec Loss 5.2303 LearningRate 0.0165 Epoch: 11 Global Step: 198290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:41,743-Speed 9300.29 samples/sec Loss 5.2328 LearningRate 0.0165 Epoch: 11 Global Step: 198300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:42,811-Speed 9589.16 samples/sec Loss 5.3306 LearningRate 0.0165 Epoch: 11 Global Step: 198310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:43,900-Speed 9414.21 samples/sec Loss 5.3683 LearningRate 0.0165 Epoch: 11 Global Step: 198320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:44,995-Speed 9355.60 samples/sec Loss 5.2865 LearningRate 0.0165 Epoch: 11 Global Step: 198330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:46,078-Speed 9454.00 samples/sec Loss 5.3232 LearningRate 0.0165 Epoch: 11 Global Step: 198340 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:32:47,148-Speed 9580.59 samples/sec Loss 5.3664 LearningRate 0.0165 Epoch: 11 Global Step: 198350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:48,204-Speed 9707.16 samples/sec Loss 5.3282 LearningRate 0.0165 Epoch: 11 
Global Step: 198360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:49,278-Speed 9538.03 samples/sec Loss 5.3644 LearningRate 0.0165 Epoch: 11 Global Step: 198370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:50,345-Speed 9603.74 samples/sec Loss 5.1984 LearningRate 0.0165 Epoch: 11 Global Step: 198380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:51,411-Speed 9611.60 samples/sec Loss 5.3633 LearningRate 0.0165 Epoch: 11 Global Step: 198390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:52,490-Speed 9498.86 samples/sec Loss 5.3161 LearningRate 0.0165 Epoch: 11 Global Step: 198400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:53,541-Speed 9745.46 samples/sec Loss 5.2972 LearningRate 0.0165 Epoch: 11 Global Step: 198410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:54,604-Speed 9641.26 samples/sec Loss 5.1562 LearningRate 0.0165 Epoch: 11 Global Step: 198420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:55,661-Speed 9687.17 samples/sec Loss 5.3860 LearningRate 0.0164 Epoch: 11 Global Step: 198430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:56,715-Speed 9725.46 samples/sec Loss 5.3964 LearningRate 0.0164 Epoch: 11 Global Step: 198440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:57,777-Speed 9641.90 samples/sec Loss 5.1871 LearningRate 0.0164 Epoch: 11 Global Step: 198450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:58,867-Speed 9406.74 samples/sec Loss 5.3334 LearningRate 0.0164 Epoch: 11 Global Step: 198460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:32:59,919-Speed 9742.58 samples/sec Loss 5.1726 LearningRate 0.0164 Epoch: 11 Global Step: 198470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:00,989-Speed 9575.77 samples/sec Loss 5.3809 LearningRate 0.0164 Epoch: 11 Global Step: 198480 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:02,061-Speed 9555.33 samples/sec Loss 5.3412 LearningRate 0.0164 Epoch: 11 Global Step: 198490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:03,158-Speed 9339.49 samples/sec Loss 5.2851 LearningRate 0.0164 Epoch: 11 Global Step: 198500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:04,253-Speed 9358.69 samples/sec Loss 5.3778 LearningRate 0.0164 Epoch: 11 Global Step: 198510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:05,373-Speed 9147.28 samples/sec Loss 5.2820 LearningRate 0.0164 Epoch: 11 Global Step: 198520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:06,469-Speed 9352.38 samples/sec Loss 5.3205 LearningRate 0.0164 Epoch: 11 Global Step: 198530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:07,546-Speed 9515.43 samples/sec Loss 5.3080 LearningRate 0.0164 Epoch: 11 Global Step: 198540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:08,603-Speed 9687.59 samples/sec Loss 5.2708 LearningRate 0.0164 Epoch: 11 Global Step: 198550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:09,721-Speed 9169.47 samples/sec Loss 5.3159 LearningRate 0.0164 Epoch: 11 Global Step: 198560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:10,810-Speed 9405.25 samples/sec Loss 5.4019 LearningRate 0.0164 Epoch: 11 Global Step: 198570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:33:11,883-Speed 9555.08 samples/sec Loss 5.2668 LearningRate 0.0164 Epoch: 11 Global Step: 198580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:12,957-Speed 9533.44 samples/sec Loss 5.3346 LearningRate 0.0164 Epoch: 11 Global Step: 198590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:14,040-Speed 9461.82 samples/sec Loss 5.2358 LearningRate 0.0164 Epoch: 11 Global Step: 198600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:33:15,116-Speed 9525.83 samples/sec Loss 5.3701 LearningRate 0.0164 Epoch: 11 Global Step: 198610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:16,206-Speed 9399.93 samples/sec Loss 5.1620 LearningRate 0.0164 Epoch: 11 Global Step: 198620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:17,320-Speed 9197.50 samples/sec Loss 5.3142 LearningRate 0.0164 Epoch: 11 Global Step: 198630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:18,451-Speed 9062.55 samples/sec Loss 5.3006 LearningRate 0.0164 Epoch: 11 Global Step: 198640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:19,569-Speed 9159.67 samples/sec Loss 5.2598 LearningRate 0.0164 Epoch: 11 Global Step: 198650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:20,644-Speed 9534.96 samples/sec Loss 5.3434 LearningRate 0.0164 Epoch: 11 Global Step: 198660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:21,708-Speed 9634.62 samples/sec Loss 5.3752 LearningRate 0.0164 Epoch: 11 Global Step: 198670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:22,759-Speed 9745.63 samples/sec Loss 5.3339 LearningRate 0.0164 Epoch: 11 Global Step: 198680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:23,824-Speed 9615.25 samples/sec Loss 5.3169 LearningRate 0.0164 Epoch: 11 Global Step: 198690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:24,945-Speed 9141.36 samples/sec Loss 5.2978 LearningRate 0.0164 Epoch: 11 Global Step: 198700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:26,025-Speed 9491.40 samples/sec Loss 5.3001 LearningRate 0.0164 Epoch: 11 Global Step: 198710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:27,073-Speed 9770.97 samples/sec Loss 5.3486 LearningRate 0.0164 Epoch: 11 Global Step: 198720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:28,189-Speed 
9181.07 samples/sec Loss 5.3241 LearningRate 0.0164 Epoch: 11 Global Step: 198730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:29,330-Speed 8980.24 samples/sec Loss 5.2713 LearningRate 0.0164 Epoch: 11 Global Step: 198740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:30,391-Speed 9660.78 samples/sec Loss 5.3575 LearningRate 0.0164 Epoch: 11 Global Step: 198750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:31,441-Speed 9760.27 samples/sec Loss 5.3344 LearningRate 0.0164 Epoch: 11 Global Step: 198760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:32,516-Speed 9528.21 samples/sec Loss 5.2831 LearningRate 0.0164 Epoch: 11 Global Step: 198770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:33,547-Speed 9937.78 samples/sec Loss 5.2972 LearningRate 0.0164 Epoch: 11 Global Step: 198780 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:33:34,628-Speed 9479.42 samples/sec Loss 5.3326 LearningRate 0.0164 Epoch: 11 Global Step: 198790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:35,752-Speed 9111.64 samples/sec Loss 5.2956 LearningRate 0.0164 Epoch: 11 Global Step: 198800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:36,821-Speed 9590.73 samples/sec Loss 5.2911 LearningRate 0.0164 Epoch: 11 Global Step: 198810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:37,874-Speed 9731.88 samples/sec Loss 5.3130 LearningRate 0.0164 Epoch: 11 Global Step: 198820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:38,957-Speed 9461.15 samples/sec Loss 5.3500 LearningRate 0.0164 Epoch: 11 Global Step: 198830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:40,034-Speed 9517.41 samples/sec Loss 5.4738 LearningRate 0.0163 Epoch: 11 Global Step: 198840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:41,149-Speed 9199.56 samples/sec Loss 5.3582 
LearningRate 0.0163 Epoch: 11 Global Step: 198850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:42,241-Speed 9381.24 samples/sec Loss 5.3546 LearningRate 0.0163 Epoch: 11 Global Step: 198860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:43,322-Speed 9477.21 samples/sec Loss 5.2930 LearningRate 0.0163 Epoch: 11 Global Step: 198870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:44,423-Speed 9305.09 samples/sec Loss 5.3204 LearningRate 0.0163 Epoch: 11 Global Step: 198880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:45,497-Speed 9543.82 samples/sec Loss 5.3849 LearningRate 0.0163 Epoch: 11 Global Step: 198890 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:33:46,600-Speed 9287.01 samples/sec Loss 5.3358 LearningRate 0.0163 Epoch: 11 Global Step: 198900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:47,709-Speed 9243.66 samples/sec Loss 5.3412 LearningRate 0.0163 Epoch: 11 Global Step: 198910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:48,858-Speed 8910.51 samples/sec Loss 5.2647 LearningRate 0.0163 Epoch: 11 Global Step: 198920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:49,936-Speed 9506.62 samples/sec Loss 5.4347 LearningRate 0.0163 Epoch: 11 Global Step: 198930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:50,996-Speed 9664.30 samples/sec Loss 5.2619 LearningRate 0.0163 Epoch: 11 Global Step: 198940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:52,106-Speed 9235.46 samples/sec Loss 5.3031 LearningRate 0.0163 Epoch: 11 Global Step: 198950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:53,159-Speed 9722.85 samples/sec Loss 5.2732 LearningRate 0.0163 Epoch: 11 Global Step: 198960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:54,216-Speed 9698.79 samples/sec Loss 5.2942 LearningRate 0.0163 Epoch: 11 
Global Step: 198970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:55,305-Speed 9408.18 samples/sec Loss 5.3063 LearningRate 0.0163 Epoch: 11 Global Step: 198980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:56,370-Speed 9615.25 samples/sec Loss 5.2323 LearningRate 0.0163 Epoch: 11 Global Step: 198990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:57,426-Speed 9711.33 samples/sec Loss 5.2675 LearningRate 0.0163 Epoch: 11 Global Step: 199000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:58,469-Speed 9824.09 samples/sec Loss 5.3144 LearningRate 0.0163 Epoch: 11 Global Step: 199010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:33:59,599-Speed 9072.60 samples/sec Loss 5.3021 LearningRate 0.0163 Epoch: 11 Global Step: 199020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:00,660-Speed 9657.03 samples/sec Loss 5.3152 LearningRate 0.0163 Epoch: 11 Global Step: 199030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:01,768-Speed 9245.40 samples/sec Loss 5.2455 LearningRate 0.0163 Epoch: 11 Global Step: 199040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:02,863-Speed 9358.39 samples/sec Loss 5.1508 LearningRate 0.0163 Epoch: 11 Global Step: 199050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:03,996-Speed 9045.29 samples/sec Loss 5.3595 LearningRate 0.0163 Epoch: 11 Global Step: 199060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:05,065-Speed 9585.46 samples/sec Loss 5.2901 LearningRate 0.0163 Epoch: 11 Global Step: 199070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:06,158-Speed 9367.08 samples/sec Loss 5.2896 LearningRate 0.0163 Epoch: 11 Global Step: 199080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:07,221-Speed 9638.43 samples/sec Loss 5.3450 LearningRate 0.0163 Epoch: 11 Global Step: 199090 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:08,320-Speed 9323.61 samples/sec Loss 5.3568 LearningRate 0.0163 Epoch: 11 Global Step: 199100 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:34:09,392-Speed 9562.95 samples/sec Loss 5.2393 LearningRate 0.0163 Epoch: 11 Global Step: 199110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:10,453-Speed 9652.24 samples/sec Loss 5.3109 LearningRate 0.0163 Epoch: 11 Global Step: 199120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:11,526-Speed 9550.22 samples/sec Loss 5.3097 LearningRate 0.0163 Epoch: 11 Global Step: 199130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:12,626-Speed 9314.33 samples/sec Loss 5.2617 LearningRate 0.0163 Epoch: 11 Global Step: 199140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:13,767-Speed 8978.11 samples/sec Loss 5.2701 LearningRate 0.0163 Epoch: 11 Global Step: 199150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:14,887-Speed 9154.71 samples/sec Loss 5.2593 LearningRate 0.0163 Epoch: 11 Global Step: 199160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:15,966-Speed 9497.90 samples/sec Loss 5.3474 LearningRate 0.0163 Epoch: 11 Global Step: 199170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:17,034-Speed 9591.52 samples/sec Loss 5.2747 LearningRate 0.0163 Epoch: 11 Global Step: 199180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:18,113-Speed 9497.58 samples/sec Loss 5.3300 LearningRate 0.0163 Epoch: 11 Global Step: 199190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:19,223-Speed 9228.15 samples/sec Loss 5.3640 LearningRate 0.0163 Epoch: 11 Global Step: 199200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:20,311-Speed 9419.97 samples/sec Loss 5.2681 LearningRate 0.0163 Epoch: 11 Global Step: 199210 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:34:21,374-Speed 9640.68 samples/sec Loss 5.3392 LearningRate 0.0163 Epoch: 11 Global Step: 199220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:22,406-Speed 9926.29 samples/sec Loss 5.3017 LearningRate 0.0163 Epoch: 11 Global Step: 199230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:23,457-Speed 9750.93 samples/sec Loss 5.3302 LearningRate 0.0163 Epoch: 11 Global Step: 199240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:24,539-Speed 9473.99 samples/sec Loss 5.2815 LearningRate 0.0163 Epoch: 11 Global Step: 199250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:25,609-Speed 9568.44 samples/sec Loss 5.4222 LearningRate 0.0162 Epoch: 11 Global Step: 199260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:26,692-Speed 9463.92 samples/sec Loss 5.4165 LearningRate 0.0162 Epoch: 11 Global Step: 199270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:27,825-Speed 9040.46 samples/sec Loss 5.3729 LearningRate 0.0162 Epoch: 11 Global Step: 199280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:28,886-Speed 9659.71 samples/sec Loss 5.3859 LearningRate 0.0162 Epoch: 11 Global Step: 199290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:29,959-Speed 9547.09 samples/sec Loss 5.2788 LearningRate 0.0162 Epoch: 11 Global Step: 199300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:31,024-Speed 9619.58 samples/sec Loss 5.2905 LearningRate 0.0162 Epoch: 11 Global Step: 199310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:32,103-Speed 9492.33 samples/sec Loss 5.3146 LearningRate 0.0162 Epoch: 11 Global Step: 199320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:33,180-Speed 9515.60 samples/sec Loss 5.3226 LearningRate 0.0162 Epoch: 11 Global Step: 199330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:34,300-Speed 
9149.83 samples/sec Loss 5.2694 LearningRate 0.0162 Epoch: 11 Global Step: 199340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:35,385-Speed 9446.33 samples/sec Loss 5.3173 LearningRate 0.0162 Epoch: 11 Global Step: 199350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:36,489-Speed 9280.91 samples/sec Loss 5.3596 LearningRate 0.0162 Epoch: 11 Global Step: 199360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:37,572-Speed 9468.11 samples/sec Loss 5.3419 LearningRate 0.0162 Epoch: 11 Global Step: 199370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:38,638-Speed 9608.08 samples/sec Loss 5.2920 LearningRate 0.0162 Epoch: 11 Global Step: 199380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:39,704-Speed 9611.16 samples/sec Loss 5.3135 LearningRate 0.0162 Epoch: 11 Global Step: 199390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:40,778-Speed 9534.51 samples/sec Loss 5.3623 LearningRate 0.0162 Epoch: 11 Global Step: 199400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:41,863-Speed 9448.75 samples/sec Loss 5.2926 LearningRate 0.0162 Epoch: 11 Global Step: 199410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:42,977-Speed 9192.23 samples/sec Loss 5.2184 LearningRate 0.0162 Epoch: 11 Global Step: 199420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:44,036-Speed 9679.74 samples/sec Loss 5.3812 LearningRate 0.0162 Epoch: 11 Global Step: 199430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:45,112-Speed 9520.33 samples/sec Loss 5.2832 LearningRate 0.0162 Epoch: 11 Global Step: 199440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:46,236-Speed 9117.79 samples/sec Loss 5.2716 LearningRate 0.0162 Epoch: 11 Global Step: 199450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:47,272-Speed 9887.62 samples/sec Loss 5.3786 
LearningRate 0.0162 Epoch: 11 Global Step: 199460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:48,379-Speed 9253.67 samples/sec Loss 5.3260 LearningRate 0.0162 Epoch: 11 Global Step: 199470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:49,504-Speed 9113.12 samples/sec Loss 5.3559 LearningRate 0.0162 Epoch: 11 Global Step: 199480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:50,621-Speed 9165.16 samples/sec Loss 5.3361 LearningRate 0.0162 Epoch: 11 Global Step: 199490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:51,695-Speed 9546.45 samples/sec Loss 5.2827 LearningRate 0.0162 Epoch: 11 Global Step: 199500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:52,821-Speed 9099.87 samples/sec Loss 5.3541 LearningRate 0.0162 Epoch: 11 Global Step: 199510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:53,980-Speed 8847.87 samples/sec Loss 5.3100 LearningRate 0.0162 Epoch: 11 Global Step: 199520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:55,073-Speed 9369.05 samples/sec Loss 5.3055 LearningRate 0.0162 Epoch: 11 Global Step: 199530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:56,165-Speed 9383.74 samples/sec Loss 5.2743 LearningRate 0.0162 Epoch: 11 Global Step: 199540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:57,251-Speed 9439.34 samples/sec Loss 5.2409 LearningRate 0.0162 Epoch: 11 Global Step: 199550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:34:58,370-Speed 9149.60 samples/sec Loss 5.3519 LearningRate 0.0162 Epoch: 11 Global Step: 199560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:34:59,468-Speed 9334.95 samples/sec Loss 5.3174 LearningRate 0.0162 Epoch: 11 Global Step: 199570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:00,543-Speed 9529.96 samples/sec Loss 5.2137 LearningRate 0.0162 Epoch: 11 
Global Step: 199580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:01,611-Speed 9590.42 samples/sec Loss 5.2725 LearningRate 0.0162 Epoch: 11 Global Step: 199590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:02,714-Speed 9287.82 samples/sec Loss 5.3329 LearningRate 0.0162 Epoch: 11 Global Step: 199600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:03,800-Speed 9434.70 samples/sec Loss 5.3028 LearningRate 0.0162 Epoch: 11 Global Step: 199610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:04,874-Speed 9540.35 samples/sec Loss 5.2798 LearningRate 0.0162 Epoch: 11 Global Step: 199620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:05,967-Speed 9381.04 samples/sec Loss 5.3040 LearningRate 0.0162 Epoch: 11 Global Step: 199630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:07,070-Speed 9285.24 samples/sec Loss 5.3552 LearningRate 0.0162 Epoch: 11 Global Step: 199640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:08,179-Speed 9241.97 samples/sec Loss 5.4156 LearningRate 0.0162 Epoch: 11 Global Step: 199650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:09,283-Speed 9276.92 samples/sec Loss 5.2814 LearningRate 0.0162 Epoch: 11 Global Step: 199660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:10,383-Speed 9317.43 samples/sec Loss 5.3512 LearningRate 0.0161 Epoch: 11 Global Step: 199670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:11,516-Speed 9043.88 samples/sec Loss 5.3093 LearningRate 0.0161 Epoch: 11 Global Step: 199680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:12,623-Speed 9260.69 samples/sec Loss 5.2085 LearningRate 0.0161 Epoch: 11 Global Step: 199690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:13,683-Speed 9663.28 samples/sec Loss 5.3209 LearningRate 0.0161 Epoch: 11 Global Step: 199700 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 19:35:14,806-Speed 9124.70 samples/sec Loss 5.3375 LearningRate 0.0161 Epoch: 11 Global Step: 199710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:15,901-Speed 9359.90 samples/sec Loss 5.3182 LearningRate 0.0161 Epoch: 11 Global Step: 199720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:17,032-Speed 9058.39 samples/sec Loss 5.3858 LearningRate 0.0161 Epoch: 11 Global Step: 199730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:18,099-Speed 9601.62 samples/sec Loss 5.2814 LearningRate 0.0161 Epoch: 11 Global Step: 199740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:19,166-Speed 9606.74 samples/sec Loss 5.3933 LearningRate 0.0161 Epoch: 11 Global Step: 199750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:20,251-Speed 9442.68 samples/sec Loss 5.1923 LearningRate 0.0161 Epoch: 11 Global Step: 199760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:21,348-Speed 9341.83 samples/sec Loss 5.4207 LearningRate 0.0161 Epoch: 11 Global Step: 199770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:22,472-Speed 9113.44 samples/sec Loss 5.2858 LearningRate 0.0161 Epoch: 11 Global Step: 199780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:23,528-Speed 9706.94 samples/sec Loss 5.3074 LearningRate 0.0161 Epoch: 11 Global Step: 199790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:35:24,607-Speed 9496.81 samples/sec Loss 5.2302 LearningRate 0.0161 Epoch: 11 Global Step: 199800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:25,726-Speed 9154.62 samples/sec Loss 5.3190 LearningRate 0.0161 Epoch: 11 Global Step: 199810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:26,819-Speed 9367.72 samples/sec Loss 5.3685 LearningRate 0.0161 Epoch: 11 Global Step: 199820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:35:27,933-Speed 9197.65 samples/sec Loss 5.3212 LearningRate 0.0161 Epoch: 11 Global Step: 199830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:29,038-Speed 9279.72 samples/sec Loss 5.4038 LearningRate 0.0161 Epoch: 11 Global Step: 199840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:30,147-Speed 9231.41 samples/sec Loss 5.4289 LearningRate 0.0161 Epoch: 11 Global Step: 199850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:31,208-Speed 9663.74 samples/sec Loss 5.2619 LearningRate 0.0161 Epoch: 11 Global Step: 199860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:32,305-Speed 9339.76 samples/sec Loss 5.2421 LearningRate 0.0161 Epoch: 11 Global Step: 199870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:33,403-Speed 9337.90 samples/sec Loss 5.2463 LearningRate 0.0161 Epoch: 11 Global Step: 199880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:34,455-Speed 9742.15 samples/sec Loss 5.2011 LearningRate 0.0161 Epoch: 11 Global Step: 199890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:35,522-Speed 9596.02 samples/sec Loss 5.1276 LearningRate 0.0161 Epoch: 11 Global Step: 199900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:36,582-Speed 9670.38 samples/sec Loss 5.2622 LearningRate 0.0161 Epoch: 11 Global Step: 199910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:37,641-Speed 9673.91 samples/sec Loss 5.2622 LearningRate 0.0161 Epoch: 11 Global Step: 199920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:38,791-Speed 8909.84 samples/sec Loss 5.3144 LearningRate 0.0161 Epoch: 11 Global Step: 199930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:39,876-Speed 9445.16 samples/sec Loss 5.3433 LearningRate 0.0161 Epoch: 11 Global Step: 199940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:40,960-Speed 
9449.66 samples/sec Loss 5.2989 LearningRate 0.0161 Epoch: 11 Global Step: 199950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:42,044-Speed 9450.59 samples/sec Loss 5.3021 LearningRate 0.0161 Epoch: 11 Global Step: 199960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:43,189-Speed 8953.10 samples/sec Loss 5.3182 LearningRate 0.0161 Epoch: 11 Global Step: 199970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:44,293-Speed 9282.88 samples/sec Loss 5.3473 LearningRate 0.0161 Epoch: 11 Global Step: 199980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:45,399-Speed 9261.33 samples/sec Loss 5.3355 LearningRate 0.0161 Epoch: 11 Global Step: 199990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:35:46,436-Speed 9878.96 samples/sec Loss 5.3712 LearningRate 0.0161 Epoch: 11 Global Step: 200000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:36:08,610-[lfw][200000]XNorm: 8.798848 Training: 2022-04-11 19:36:08,610-[lfw][200000]Accuracy-Flip: 0.99650+-0.00283 Training: 2022-04-11 19:36:08,611-[lfw][200000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:36:34,132-[cfp_fp][200000]XNorm: 7.570795 Training: 2022-04-11 19:36:34,133-[cfp_fp][200000]Accuracy-Flip: 0.96771+-0.00858 Training: 2022-04-11 19:36:34,133-[cfp_fp][200000]Accuracy-Highest: 0.96771 Training: 2022-04-11 19:36:56,129-[agedb_30][200000]XNorm: 8.513522 Training: 2022-04-11 19:36:56,130-[agedb_30][200000]Accuracy-Flip: 0.96467+-0.00971 Training: 2022-04-11 19:36:56,131-[agedb_30][200000]Accuracy-Highest: 0.96983 Training: 2022-04-11 19:36:57,221-Speed 144.67 samples/sec Loss 5.3801 LearningRate 0.0161 Epoch: 11 Global Step: 200010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:36:58,352-Speed 9059.76 samples/sec Loss 5.3245 LearningRate 0.0161 Epoch: 11 Global Step: 200020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:36:59,432-Speed 9481.19 samples/sec 
Loss 5.3992 LearningRate 0.0161 Epoch: 11 Global Step: 200030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:00,517-Speed 9450.56 samples/sec Loss 5.3688 LearningRate 0.0161 Epoch: 11 Global Step: 200040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:01,594-Speed 9506.48 samples/sec Loss 5.2778 LearningRate 0.0161 Epoch: 11 Global Step: 200050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:02,681-Speed 9432.07 samples/sec Loss 5.2911 LearningRate 0.0161 Epoch: 11 Global Step: 200060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:03,790-Speed 9235.00 samples/sec Loss 5.1641 LearningRate 0.0161 Epoch: 11 Global Step: 200070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:04,886-Speed 9351.96 samples/sec Loss 5.2919 LearningRate 0.0161 Epoch: 11 Global Step: 200080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:05,963-Speed 9510.69 samples/sec Loss 5.3231 LearningRate 0.0160 Epoch: 11 Global Step: 200090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:07,061-Speed 9332.37 samples/sec Loss 5.3277 LearningRate 0.0160 Epoch: 11 Global Step: 200100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:08,152-Speed 9391.95 samples/sec Loss 5.3080 LearningRate 0.0160 Epoch: 11 Global Step: 200110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:09,231-Speed 9494.56 samples/sec Loss 5.2699 LearningRate 0.0160 Epoch: 11 Global Step: 200120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:10,317-Speed 9433.99 samples/sec Loss 5.3509 LearningRate 0.0160 Epoch: 11 Global Step: 200130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:11,424-Speed 9257.73 samples/sec Loss 5.2632 LearningRate 0.0160 Epoch: 11 Global Step: 200140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:12,493-Speed 9586.27 samples/sec Loss 5.2691 LearningRate 0.0160 Epoch: 
11 Global Step: 200150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:37:13,611-Speed 9164.55 samples/sec Loss 5.3178 LearningRate 0.0160 Epoch: 11 Global Step: 200160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:14,696-Speed 9437.34 samples/sec Loss 5.4581 LearningRate 0.0160 Epoch: 11 Global Step: 200170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:15,773-Speed 9514.74 samples/sec Loss 5.3763 LearningRate 0.0160 Epoch: 11 Global Step: 200180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:16,898-Speed 9110.43 samples/sec Loss 5.3643 LearningRate 0.0160 Epoch: 11 Global Step: 200190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:17,995-Speed 9336.81 samples/sec Loss 5.2771 LearningRate 0.0160 Epoch: 11 Global Step: 200200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:19,082-Speed 9426.61 samples/sec Loss 5.4147 LearningRate 0.0160 Epoch: 11 Global Step: 200210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:20,127-Speed 9806.11 samples/sec Loss 5.2634 LearningRate 0.0160 Epoch: 11 Global Step: 200220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:21,235-Speed 9247.09 samples/sec Loss 5.1984 LearningRate 0.0160 Epoch: 11 Global Step: 200230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:22,343-Speed 9254.77 samples/sec Loss 5.1925 LearningRate 0.0160 Epoch: 11 Global Step: 200240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:23,452-Speed 9239.52 samples/sec Loss 5.1695 LearningRate 0.0160 Epoch: 11 Global Step: 200250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:24,537-Speed 9440.86 samples/sec Loss 5.4040 LearningRate 0.0160 Epoch: 11 Global Step: 200260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:25,664-Speed 9091.87 samples/sec Loss 5.2301 LearningRate 0.0160 Epoch: 11 Global Step: 200270 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:26,749-Speed 9444.92 samples/sec Loss 5.3398 LearningRate 0.0160 Epoch: 11 Global Step: 200280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:28,079-Speed 7697.90 samples/sec Loss 5.3355 LearningRate 0.0160 Epoch: 11 Global Step: 200290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:55,540-Speed 372.91 samples/sec Loss 4.7374 LearningRate 0.0160 Epoch: 12 Global Step: 200300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:57,379-Speed 5572.66 samples/sec Loss 4.5564 LearningRate 0.0160 Epoch: 12 Global Step: 200310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:58,619-Speed 8265.20 samples/sec Loss 4.5416 LearningRate 0.0160 Epoch: 12 Global Step: 200320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:37:59,930-Speed 7816.95 samples/sec Loss 4.5539 LearningRate 0.0160 Epoch: 12 Global Step: 200330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:01,353-Speed 7203.19 samples/sec Loss 4.5835 LearningRate 0.0160 Epoch: 12 Global Step: 200340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:02,797-Speed 7094.98 samples/sec Loss 4.6246 LearningRate 0.0160 Epoch: 12 Global Step: 200350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:03,859-Speed 9652.97 samples/sec Loss 4.5743 LearningRate 0.0160 Epoch: 12 Global Step: 200360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:04,950-Speed 9387.76 samples/sec Loss 4.5719 LearningRate 0.0160 Epoch: 12 Global Step: 200370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:06,015-Speed 9624.98 samples/sec Loss 4.5592 LearningRate 0.0160 Epoch: 12 Global Step: 200380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:07,256-Speed 8258.69 samples/sec Loss 4.5833 LearningRate 0.0160 Epoch: 12 Global Step: 200390 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:38:08,380-Speed 9116.11 samples/sec Loss 4.5994 LearningRate 0.0160 Epoch: 12 Global Step: 200400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:09,487-Speed 9255.49 samples/sec Loss 4.6441 LearningRate 0.0160 Epoch: 12 Global Step: 200410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:10,581-Speed 9369.77 samples/sec Loss 4.5565 LearningRate 0.0160 Epoch: 12 Global Step: 200420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:11,716-Speed 9031.62 samples/sec Loss 4.5087 LearningRate 0.0160 Epoch: 12 Global Step: 200430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:12,828-Speed 9218.64 samples/sec Loss 4.5497 LearningRate 0.0160 Epoch: 12 Global Step: 200440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:13,941-Speed 9201.21 samples/sec Loss 4.6402 LearningRate 0.0160 Epoch: 12 Global Step: 200450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:15,032-Speed 9397.30 samples/sec Loss 4.5633 LearningRate 0.0160 Epoch: 12 Global Step: 200460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:16,108-Speed 9532.82 samples/sec Loss 4.5854 LearningRate 0.0160 Epoch: 12 Global Step: 200470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:17,254-Speed 8938.59 samples/sec Loss 4.5104 LearningRate 0.0160 Epoch: 12 Global Step: 200480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:18,403-Speed 8918.78 samples/sec Loss 4.5589 LearningRate 0.0160 Epoch: 12 Global Step: 200490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:19,526-Speed 9125.93 samples/sec Loss 4.5186 LearningRate 0.0160 Epoch: 12 Global Step: 200500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:20,590-Speed 9629.94 samples/sec Loss 4.6101 LearningRate 0.0159 Epoch: 12 Global Step: 200510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
19:38:21,686-Speed 9347.31 samples/sec Loss 4.6141 LearningRate 0.0159 Epoch: 12 Global Step: 200520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:22,800-Speed 9202.50 samples/sec Loss 4.6651 LearningRate 0.0159 Epoch: 12 Global Step: 200530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:23,919-Speed 9156.90 samples/sec Loss 4.6247 LearningRate 0.0159 Epoch: 12 Global Step: 200540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:25,068-Speed 8913.74 samples/sec Loss 4.5887 LearningRate 0.0159 Epoch: 12 Global Step: 200550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:26,207-Speed 8999.48 samples/sec Loss 4.6099 LearningRate 0.0159 Epoch: 12 Global Step: 200560 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:38:27,359-Speed 8892.71 samples/sec Loss 4.5717 LearningRate 0.0159 Epoch: 12 Global Step: 200570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:28,441-Speed 9469.89 samples/sec Loss 4.6042 LearningRate 0.0159 Epoch: 12 Global Step: 200580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:29,525-Speed 9455.02 samples/sec Loss 4.6425 LearningRate 0.0159 Epoch: 12 Global Step: 200590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:30,598-Speed 9551.07 samples/sec Loss 4.6221 LearningRate 0.0159 Epoch: 12 Global Step: 200600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:31,761-Speed 8808.96 samples/sec Loss 4.7000 LearningRate 0.0159 Epoch: 12 Global Step: 200610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:32,862-Speed 9309.39 samples/sec Loss 4.5734 LearningRate 0.0159 Epoch: 12 Global Step: 200620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:33,930-Speed 9599.50 samples/sec Loss 4.5932 LearningRate 0.0159 Epoch: 12 Global Step: 200630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:35,045-Speed 9191.32 
samples/sec Loss 4.5536 LearningRate 0.0159 Epoch: 12 Global Step: 200640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:36,109-Speed 9630.84 samples/sec Loss 4.6190 LearningRate 0.0159 Epoch: 12 Global Step: 200650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:37,257-Speed 8921.91 samples/sec Loss 4.6208 LearningRate 0.0159 Epoch: 12 Global Step: 200660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:38,355-Speed 9333.12 samples/sec Loss 4.5183 LearningRate 0.0159 Epoch: 12 Global Step: 200670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:39,449-Speed 9366.30 samples/sec Loss 4.6455 LearningRate 0.0159 Epoch: 12 Global Step: 200680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:40,554-Speed 9275.31 samples/sec Loss 4.6957 LearningRate 0.0159 Epoch: 12 Global Step: 200690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:41,683-Speed 9077.77 samples/sec Loss 4.6222 LearningRate 0.0159 Epoch: 12 Global Step: 200700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:42,750-Speed 9599.24 samples/sec Loss 4.6090 LearningRate 0.0159 Epoch: 12 Global Step: 200710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:44,388-Speed 6254.48 samples/sec Loss 4.6129 LearningRate 0.0159 Epoch: 12 Global Step: 200720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:45,635-Speed 8213.09 samples/sec Loss 4.7218 LearningRate 0.0159 Epoch: 12 Global Step: 200730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:47,074-Speed 7121.18 samples/sec Loss 4.6718 LearningRate 0.0159 Epoch: 12 Global Step: 200740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:48,139-Speed 9613.59 samples/sec Loss 4.6204 LearningRate 0.0159 Epoch: 12 Global Step: 200750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:49,256-Speed 9176.42 samples/sec Loss 4.7092 
LearningRate 0.0159 Epoch: 12 Global Step: 200760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:50,496-Speed 8265.95 samples/sec Loss 4.6302 LearningRate 0.0159 Epoch: 12 Global Step: 200770 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:38:51,576-Speed 9495.79 samples/sec Loss 4.5472 LearningRate 0.0159 Epoch: 12 Global Step: 200780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:52,699-Speed 9120.23 samples/sec Loss 4.6720 LearningRate 0.0159 Epoch: 12 Global Step: 200790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:53,772-Speed 9551.64 samples/sec Loss 4.6089 LearningRate 0.0159 Epoch: 12 Global Step: 200800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:54,834-Speed 9646.04 samples/sec Loss 4.5638 LearningRate 0.0159 Epoch: 12 Global Step: 200810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:55,919-Speed 9449.39 samples/sec Loss 4.6095 LearningRate 0.0159 Epoch: 12 Global Step: 200820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:57,002-Speed 9461.12 samples/sec Loss 4.7544 LearningRate 0.0159 Epoch: 12 Global Step: 200830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:58,102-Speed 9308.50 samples/sec Loss 4.5940 LearningRate 0.0159 Epoch: 12 Global Step: 200840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:38:59,223-Speed 9146.41 samples/sec Loss 4.6085 LearningRate 0.0159 Epoch: 12 Global Step: 200850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:00,318-Speed 9357.60 samples/sec Loss 4.6463 LearningRate 0.0159 Epoch: 12 Global Step: 200860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:01,418-Speed 9314.49 samples/sec Loss 4.5787 LearningRate 0.0159 Epoch: 12 Global Step: 200870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:02,486-Speed 9590.66 samples/sec Loss 4.5889 LearningRate 0.0159 Epoch: 12 
Global Step: 200880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:03,586-Speed 9315.98 samples/sec Loss 4.6748 LearningRate 0.0159 Epoch: 12 Global Step: 200890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:04,692-Speed 9264.94 samples/sec Loss 4.7241 LearningRate 0.0159 Epoch: 12 Global Step: 200900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:05,820-Speed 9079.33 samples/sec Loss 4.7110 LearningRate 0.0159 Epoch: 12 Global Step: 200910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:06,900-Speed 9488.52 samples/sec Loss 4.6708 LearningRate 0.0158 Epoch: 12 Global Step: 200920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:08,173-Speed 8053.77 samples/sec Loss 4.6737 LearningRate 0.0158 Epoch: 12 Global Step: 200930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:09,353-Speed 8684.29 samples/sec Loss 4.6085 LearningRate 0.0158 Epoch: 12 Global Step: 200940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:10,453-Speed 9317.05 samples/sec Loss 4.6889 LearningRate 0.0158 Epoch: 12 Global Step: 200950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:11,548-Speed 9358.53 samples/sec Loss 4.6546 LearningRate 0.0158 Epoch: 12 Global Step: 200960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:12,617-Speed 9581.11 samples/sec Loss 4.6997 LearningRate 0.0158 Epoch: 12 Global Step: 200970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:13,698-Speed 9481.21 samples/sec Loss 4.6269 LearningRate 0.0158 Epoch: 12 Global Step: 200980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:14,794-Speed 9350.86 samples/sec Loss 4.6516 LearningRate 0.0158 Epoch: 12 Global Step: 200990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:15,882-Speed 9410.85 samples/sec Loss 4.6037 LearningRate 0.0158 Epoch: 12 Global Step: 201000 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:17,016-Speed 9038.83 samples/sec Loss 4.6309 LearningRate 0.0158 Epoch: 12 Global Step: 201010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:18,139-Speed 9126.51 samples/sec Loss 4.6821 LearningRate 0.0158 Epoch: 12 Global Step: 201020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:19,247-Speed 9239.26 samples/sec Loss 4.6666 LearningRate 0.0158 Epoch: 12 Global Step: 201030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:20,323-Speed 9528.56 samples/sec Loss 4.6333 LearningRate 0.0158 Epoch: 12 Global Step: 201040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:21,409-Speed 9434.75 samples/sec Loss 4.6683 LearningRate 0.0158 Epoch: 12 Global Step: 201050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:22,498-Speed 9409.41 samples/sec Loss 4.6383 LearningRate 0.0158 Epoch: 12 Global Step: 201060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:23,590-Speed 9380.16 samples/sec Loss 4.5913 LearningRate 0.0158 Epoch: 12 Global Step: 201070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:24,684-Speed 9366.24 samples/sec Loss 4.6217 LearningRate 0.0158 Epoch: 12 Global Step: 201080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:25,778-Speed 9361.94 samples/sec Loss 4.6320 LearningRate 0.0158 Epoch: 12 Global Step: 201090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:26,960-Speed 8670.07 samples/sec Loss 4.7373 LearningRate 0.0158 Epoch: 12 Global Step: 201100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:28,085-Speed 9107.81 samples/sec Loss 4.6236 LearningRate 0.0158 Epoch: 12 Global Step: 201110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:29,234-Speed 8921.21 samples/sec Loss 4.6000 LearningRate 0.0158 Epoch: 12 Global Step: 201120 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 19:39:30,330-Speed 9348.81 samples/sec Loss 4.7455 LearningRate 0.0158 Epoch: 12 Global Step: 201130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:31,426-Speed 9351.88 samples/sec Loss 4.6155 LearningRate 0.0158 Epoch: 12 Global Step: 201140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:32,515-Speed 9409.18 samples/sec Loss 4.7721 LearningRate 0.0158 Epoch: 12 Global Step: 201150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:33,603-Speed 9422.20 samples/sec Loss 4.6712 LearningRate 0.0158 Epoch: 12 Global Step: 201160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:34,723-Speed 9150.03 samples/sec Loss 4.7304 LearningRate 0.0158 Epoch: 12 Global Step: 201170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:35,831-Speed 9245.43 samples/sec Loss 4.7479 LearningRate 0.0158 Epoch: 12 Global Step: 201180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:36,971-Speed 8993.21 samples/sec Loss 4.8029 LearningRate 0.0158 Epoch: 12 Global Step: 201190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:38,062-Speed 9395.93 samples/sec Loss 4.7155 LearningRate 0.0158 Epoch: 12 Global Step: 201200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:39,216-Speed 8875.12 samples/sec Loss 4.6468 LearningRate 0.0158 Epoch: 12 Global Step: 201210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:40,330-Speed 9199.12 samples/sec Loss 4.7041 LearningRate 0.0158 Epoch: 12 Global Step: 201220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:41,433-Speed 9295.62 samples/sec Loss 4.7263 LearningRate 0.0158 Epoch: 12 Global Step: 201230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:42,550-Speed 9176.69 samples/sec Loss 4.7032 LearningRate 0.0158 Epoch: 12 Global Step: 201240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
19:39:43,654-Speed 9277.30 samples/sec Loss 4.6617 LearningRate 0.0158 Epoch: 12 Global Step: 201250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:44,728-Speed 9543.01 samples/sec Loss 4.7505 LearningRate 0.0158 Epoch: 12 Global Step: 201260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:45,783-Speed 9713.79 samples/sec Loss 4.7215 LearningRate 0.0158 Epoch: 12 Global Step: 201270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:46,892-Speed 9237.61 samples/sec Loss 4.6893 LearningRate 0.0158 Epoch: 12 Global Step: 201280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:47,954-Speed 9652.43 samples/sec Loss 4.6691 LearningRate 0.0158 Epoch: 12 Global Step: 201290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:39:49,057-Speed 9282.64 samples/sec Loss 4.6966 LearningRate 0.0158 Epoch: 12 Global Step: 201300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:50,206-Speed 8929.19 samples/sec Loss 4.6344 LearningRate 0.0158 Epoch: 12 Global Step: 201310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:51,295-Speed 9408.32 samples/sec Loss 4.7172 LearningRate 0.0158 Epoch: 12 Global Step: 201320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:52,421-Speed 9098.23 samples/sec Loss 4.8109 LearningRate 0.0158 Epoch: 12 Global Step: 201330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:53,520-Speed 9322.85 samples/sec Loss 4.6645 LearningRate 0.0157 Epoch: 12 Global Step: 201340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:54,594-Speed 9545.75 samples/sec Loss 4.6932 LearningRate 0.0157 Epoch: 12 Global Step: 201350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:55,714-Speed 9142.68 samples/sec Loss 4.7488 LearningRate 0.0157 Epoch: 12 Global Step: 201360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:56,838-Speed 9119.24 
samples/sec Loss 4.7987 LearningRate 0.0157 Epoch: 12 Global Step: 201370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:57,951-Speed 9201.65 samples/sec Loss 4.6559 LearningRate 0.0157 Epoch: 12 Global Step: 201380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:39:59,093-Speed 8975.00 samples/sec Loss 4.7138 LearningRate 0.0157 Epoch: 12 Global Step: 201390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:00,189-Speed 9348.64 samples/sec Loss 4.7178 LearningRate 0.0157 Epoch: 12 Global Step: 201400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:01,258-Speed 9588.17 samples/sec Loss 4.7511 LearningRate 0.0157 Epoch: 12 Global Step: 201410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:02,332-Speed 9535.77 samples/sec Loss 4.7195 LearningRate 0.0157 Epoch: 12 Global Step: 201420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:03,423-Speed 9390.17 samples/sec Loss 4.6947 LearningRate 0.0157 Epoch: 12 Global Step: 201430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:04,498-Speed 9536.05 samples/sec Loss 4.7451 LearningRate 0.0157 Epoch: 12 Global Step: 201440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:05,584-Speed 9432.30 samples/sec Loss 4.7321 LearningRate 0.0157 Epoch: 12 Global Step: 201450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:06,706-Speed 9127.76 samples/sec Loss 4.7380 LearningRate 0.0157 Epoch: 12 Global Step: 201460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:07,765-Speed 9686.07 samples/sec Loss 4.7006 LearningRate 0.0157 Epoch: 12 Global Step: 201470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:08,888-Speed 9125.57 samples/sec Loss 4.6844 LearningRate 0.0157 Epoch: 12 Global Step: 201480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:10,010-Speed 9132.90 samples/sec Loss 4.6774 LearningRate 
0.0157 Epoch: 12 Global Step: 201490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:11,058-Speed 9783.72 samples/sec Loss 4.7153 LearningRate 0.0157 Epoch: 12 Global Step: 201500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:12,118-Speed 9667.53 samples/sec Loss 4.7678 LearningRate 0.0157 Epoch: 12 Global Step: 201510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:40:13,201-Speed 9452.80 samples/sec Loss 4.7484 LearningRate 0.0157 Epoch: 12 Global Step: 201520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:14,307-Speed 9266.53 samples/sec Loss 4.7237 LearningRate 0.0157 Epoch: 12 Global Step: 201530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:15,568-Speed 8127.71 samples/sec Loss 4.6994 LearningRate 0.0157 Epoch: 12 Global Step: 201540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:16,679-Speed 9222.61 samples/sec Loss 4.8221 LearningRate 0.0157 Epoch: 12 Global Step: 201550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:17,802-Speed 9118.42 samples/sec Loss 4.6015 LearningRate 0.0157 Epoch: 12 Global Step: 201560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:18,929-Speed 9095.17 samples/sec Loss 4.8317 LearningRate 0.0157 Epoch: 12 Global Step: 201570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:20,045-Speed 9188.64 samples/sec Loss 4.6217 LearningRate 0.0157 Epoch: 12 Global Step: 201580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:21,220-Speed 8715.87 samples/sec Loss 4.7397 LearningRate 0.0157 Epoch: 12 Global Step: 201590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:22,303-Speed 9464.05 samples/sec Loss 4.6670 LearningRate 0.0157 Epoch: 12 Global Step: 201600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:23,678-Speed 7452.03 samples/sec Loss 4.7307 LearningRate 0.0157 Epoch: 12 Global Step: 
201610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:24,729-Speed 9755.59 samples/sec Loss 4.8012 LearningRate 0.0157 Epoch: 12 Global Step: 201620 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:40:25,803-Speed 9538.20 samples/sec Loss 4.7188 LearningRate 0.0157 Epoch: 12 Global Step: 201630 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:40:26,870-Speed 9601.82 samples/sec Loss 4.7714 LearningRate 0.0157 Epoch: 12 Global Step: 201640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:28,057-Speed 8635.24 samples/sec Loss 4.7055 LearningRate 0.0157 Epoch: 12 Global Step: 201650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:29,187-Speed 9062.32 samples/sec Loss 4.6806 LearningRate 0.0157 Epoch: 12 Global Step: 201660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:30,315-Speed 9089.25 samples/sec Loss 4.7446 LearningRate 0.0157 Epoch: 12 Global Step: 201670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:31,438-Speed 9127.87 samples/sec Loss 4.7051 LearningRate 0.0157 Epoch: 12 Global Step: 201680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:32,589-Speed 8895.23 samples/sec Loss 4.6981 LearningRate 0.0157 Epoch: 12 Global Step: 201690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:33,742-Speed 8889.63 samples/sec Loss 4.7220 LearningRate 0.0157 Epoch: 12 Global Step: 201700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:34,872-Speed 9070.91 samples/sec Loss 4.7558 LearningRate 0.0157 Epoch: 12 Global Step: 201710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:35,974-Speed 9292.04 samples/sec Loss 4.7939 LearningRate 0.0157 Epoch: 12 Global Step: 201720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:37,068-Speed 9367.42 samples/sec Loss 4.9498 LearningRate 0.0157 Epoch: 12 Global Step: 201730 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 19:40:38,150-Speed 9478.06 samples/sec Loss 4.8290 LearningRate 0.0157 Epoch: 12 Global Step: 201740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:39,226-Speed 9523.15 samples/sec Loss 4.7227 LearningRate 0.0157 Epoch: 12 Global Step: 201750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:40,324-Speed 9336.30 samples/sec Loss 4.8509 LearningRate 0.0157 Epoch: 12 Global Step: 201760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:41,417-Speed 9366.94 samples/sec Loss 4.6612 LearningRate 0.0156 Epoch: 12 Global Step: 201770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:42,482-Speed 9627.46 samples/sec Loss 4.7413 LearningRate 0.0156 Epoch: 12 Global Step: 201780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:43,604-Speed 9136.93 samples/sec Loss 4.6815 LearningRate 0.0156 Epoch: 12 Global Step: 201790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:44,689-Speed 9442.15 samples/sec Loss 4.7541 LearningRate 0.0156 Epoch: 12 Global Step: 201800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:45,777-Speed 9419.58 samples/sec Loss 4.7560 LearningRate 0.0156 Epoch: 12 Global Step: 201810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:46,910-Speed 9048.91 samples/sec Loss 4.7273 LearningRate 0.0156 Epoch: 12 Global Step: 201820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:47,984-Speed 9539.51 samples/sec Loss 4.7866 LearningRate 0.0156 Epoch: 12 Global Step: 201830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:49,119-Speed 9027.54 samples/sec Loss 4.7080 LearningRate 0.0156 Epoch: 12 Global Step: 201840 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:40:50,185-Speed 9614.78 samples/sec Loss 4.7650 LearningRate 0.0156 Epoch: 12 Global Step: 201850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:40:51,270-Speed 9445.18 samples/sec Loss 4.6597 LearningRate 0.0156 Epoch: 12 Global Step: 201860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:52,441-Speed 8744.20 samples/sec Loss 4.7923 LearningRate 0.0156 Epoch: 12 Global Step: 201870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:53,543-Speed 9294.94 samples/sec Loss 4.7906 LearningRate 0.0156 Epoch: 12 Global Step: 201880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:54,625-Speed 9471.18 samples/sec Loss 4.7885 LearningRate 0.0156 Epoch: 12 Global Step: 201890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:55,715-Speed 9406.09 samples/sec Loss 4.7960 LearningRate 0.0156 Epoch: 12 Global Step: 201900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:56,859-Speed 8952.39 samples/sec Loss 4.7686 LearningRate 0.0156 Epoch: 12 Global Step: 201910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:58,054-Speed 8579.66 samples/sec Loss 4.7912 LearningRate 0.0156 Epoch: 12 Global Step: 201920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:40:59,149-Speed 9357.87 samples/sec Loss 4.8263 LearningRate 0.0156 Epoch: 12 Global Step: 201930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:41:00,242-Speed 9373.04 samples/sec Loss 4.7970 LearningRate 0.0156 Epoch: 12 Global Step: 201940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:41:01,410-Speed 8780.71 samples/sec Loss 4.7284 LearningRate 0.0156 Epoch: 12 Global Step: 201950 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:41:02,486-Speed 9517.59 samples/sec Loss 4.6900 LearningRate 0.0156 Epoch: 12 Global Step: 201960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:41:03,571-Speed 9443.16 samples/sec Loss 4.7523 LearningRate 0.0156 Epoch: 12 Global Step: 201970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:41:04,690-Speed 
9266.86 samples/sec Loss 4.7835 LearningRate 0.0156 Epoch: 12 Global Step: 201980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:41:05,753-Speed 9648.85 samples/sec Loss 4.8489 LearningRate 0.0156 Epoch: 12 Global Step: 201990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:41:06,812-Speed 9672.94 samples/sec Loss 4.8108 LearningRate 0.0156 Epoch: 12 Global Step: 202000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:41:28,659-[lfw][202000]XNorm: 8.694646 Training: 2022-04-11 19:41:28,660-[lfw][202000]Accuracy-Flip: 0.99617+-0.00325 Training: 2022-04-11 19:41:28,660-[lfw][202000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:41:53,851-[cfp_fp][202000]XNorm: 7.466227 Training: 2022-04-11 19:41:53,852-[cfp_fp][202000]Accuracy-Flip: 0.96171+-0.00921 Training: 2022-04-11 19:41:53,852-[cfp_fp][202000]Accuracy-Highest: 0.96771 Training: 2022-04-11 19:42:15,575-[agedb_30][202000]XNorm: 8.526358 Training: 2022-04-11 19:42:15,576-[agedb_30][202000]Accuracy-Flip: 0.96667+-0.01057 Training: 2022-04-11 19:42:15,576-[agedb_30][202000]Accuracy-Highest: 0.96983 Training: 2022-04-11 19:42:16,702-Speed 146.52 samples/sec Loss 4.8032 LearningRate 0.0156 Epoch: 12 Global Step: 202010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:17,771-Speed 9588.82 samples/sec Loss 4.7937 LearningRate 0.0156 Epoch: 12 Global Step: 202020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:18,883-Speed 9213.74 samples/sec Loss 4.7664 LearningRate 0.0156 Epoch: 12 Global Step: 202030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:19,970-Speed 9421.15 samples/sec Loss 4.7829 LearningRate 0.0156 Epoch: 12 Global Step: 202040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:21,072-Speed 9306.40 samples/sec Loss 4.8075 LearningRate 0.0156 Epoch: 12 Global Step: 202050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:22,151-Speed 9494.63 samples/sec 
Loss 4.7980 LearningRate 0.0156 Epoch: 12 Global Step: 202060 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:42:23,196-Speed 9805.32 samples/sec Loss 4.8458 LearningRate 0.0156 Epoch: 12 Global Step: 202070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:24,258-Speed 9648.92 samples/sec Loss 4.7667 LearningRate 0.0156 Epoch: 12 Global Step: 202080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:25,381-Speed 9124.62 samples/sec Loss 4.7678 LearningRate 0.0156 Epoch: 12 Global Step: 202090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:26,468-Speed 9432.55 samples/sec Loss 4.7590 LearningRate 0.0156 Epoch: 12 Global Step: 202100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:27,526-Speed 9676.79 samples/sec Loss 4.8679 LearningRate 0.0156 Epoch: 12 Global Step: 202110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:28,599-Speed 9551.43 samples/sec Loss 4.8647 LearningRate 0.0156 Epoch: 12 Global Step: 202120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:29,696-Speed 9432.48 samples/sec Loss 4.7708 LearningRate 0.0156 Epoch: 12 Global Step: 202130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:30,807-Speed 9222.49 samples/sec Loss 4.8198 LearningRate 0.0156 Epoch: 12 Global Step: 202140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:31,880-Speed 9547.16 samples/sec Loss 4.7478 LearningRate 0.0156 Epoch: 12 Global Step: 202150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:32,994-Speed 9200.39 samples/sec Loss 4.8309 LearningRate 0.0156 Epoch: 12 Global Step: 202160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:34,044-Speed 9762.04 samples/sec Loss 4.8555 LearningRate 0.0156 Epoch: 12 Global Step: 202170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:35,112-Speed 9591.15 samples/sec Loss 4.9010 LearningRate 0.0156 
Epoch: 12 Global Step: 202180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:36,192-Speed 9491.33 samples/sec Loss 4.7993 LearningRate 0.0155 Epoch: 12 Global Step: 202190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:37,298-Speed 9260.41 samples/sec Loss 4.8646 LearningRate 0.0155 Epoch: 12 Global Step: 202200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:38,389-Speed 9399.76 samples/sec Loss 4.7794 LearningRate 0.0155 Epoch: 12 Global Step: 202210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:39,469-Speed 9485.18 samples/sec Loss 4.7964 LearningRate 0.0155 Epoch: 12 Global Step: 202220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:40,524-Speed 9809.96 samples/sec Loss 4.8162 LearningRate 0.0155 Epoch: 12 Global Step: 202230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:41,660-Speed 9021.88 samples/sec Loss 4.7864 LearningRate 0.0155 Epoch: 12 Global Step: 202240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:42,789-Speed 9084.17 samples/sec Loss 4.8079 LearningRate 0.0155 Epoch: 12 Global Step: 202250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:43,874-Speed 9442.12 samples/sec Loss 4.8354 LearningRate 0.0155 Epoch: 12 Global Step: 202260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:44,998-Speed 9117.90 samples/sec Loss 4.8028 LearningRate 0.0155 Epoch: 12 Global Step: 202270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:46,104-Speed 9268.07 samples/sec Loss 4.8128 LearningRate 0.0155 Epoch: 12 Global Step: 202280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:47,186-Speed 9463.70 samples/sec Loss 4.7273 LearningRate 0.0155 Epoch: 12 Global Step: 202290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:48,322-Speed 9019.27 samples/sec Loss 4.7645 LearningRate 0.0155 Epoch: 12 Global Step: 202300 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 19:42:49,454-Speed 9049.55 samples/sec Loss 4.8754 LearningRate 0.0155 Epoch: 12 Global Step: 202310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:50,575-Speed 9142.23 samples/sec Loss 4.8590 LearningRate 0.0155 Epoch: 12 Global Step: 202320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:51,629-Speed 9726.92 samples/sec Loss 4.8013 LearningRate 0.0155 Epoch: 12 Global Step: 202330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:52,795-Speed 8787.32 samples/sec Loss 4.7356 LearningRate 0.0155 Epoch: 12 Global Step: 202340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:53,896-Speed 9302.94 samples/sec Loss 4.7071 LearningRate 0.0155 Epoch: 12 Global Step: 202350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:55,123-Speed 8421.31 samples/sec Loss 4.7868 LearningRate 0.0155 Epoch: 12 Global Step: 202360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:56,206-Speed 9466.07 samples/sec Loss 4.8425 LearningRate 0.0155 Epoch: 12 Global Step: 202370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:57,306-Speed 9311.34 samples/sec Loss 4.9268 LearningRate 0.0155 Epoch: 12 Global Step: 202380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:58,365-Speed 9673.64 samples/sec Loss 4.7438 LearningRate 0.0155 Epoch: 12 Global Step: 202390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:42:59,452-Speed 9431.63 samples/sec Loss 4.8873 LearningRate 0.0155 Epoch: 12 Global Step: 202400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:00,570-Speed 9161.49 samples/sec Loss 4.8678 LearningRate 0.0155 Epoch: 12 Global Step: 202410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:43:01,648-Speed 9511.49 samples/sec Loss 4.8007 LearningRate 0.0155 Epoch: 12 Global Step: 202420 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:43:02,738-Speed 9399.92 samples/sec Loss 4.7941 LearningRate 0.0155 Epoch: 12 Global Step: 202430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:03,904-Speed 8781.65 samples/sec Loss 4.7029 LearningRate 0.0155 Epoch: 12 Global Step: 202440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:05,039-Speed 9032.49 samples/sec Loss 4.9281 LearningRate 0.0155 Epoch: 12 Global Step: 202450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:06,130-Speed 9400.64 samples/sec Loss 4.7992 LearningRate 0.0155 Epoch: 12 Global Step: 202460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:07,228-Speed 9331.48 samples/sec Loss 4.7963 LearningRate 0.0155 Epoch: 12 Global Step: 202470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:08,340-Speed 9212.81 samples/sec Loss 4.7945 LearningRate 0.0155 Epoch: 12 Global Step: 202480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:09,443-Speed 9291.99 samples/sec Loss 4.7324 LearningRate 0.0155 Epoch: 12 Global Step: 202490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:10,527-Speed 9452.49 samples/sec Loss 4.8590 LearningRate 0.0155 Epoch: 12 Global Step: 202500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:11,624-Speed 9344.14 samples/sec Loss 4.6881 LearningRate 0.0155 Epoch: 12 Global Step: 202510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:12,735-Speed 9217.14 samples/sec Loss 4.8251 LearningRate 0.0155 Epoch: 12 Global Step: 202520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:13,825-Speed 9405.07 samples/sec Loss 4.8514 LearningRate 0.0155 Epoch: 12 Global Step: 202530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:14,898-Speed 9540.77 samples/sec Loss 4.8638 LearningRate 0.0155 Epoch: 12 Global Step: 202540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
19:43:15,979-Speed 9484.42 samples/sec Loss 4.7645 LearningRate 0.0155 Epoch: 12 Global Step: 202550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:17,064-Speed 9446.81 samples/sec Loss 4.8886 LearningRate 0.0155 Epoch: 12 Global Step: 202560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:18,151-Speed 9426.03 samples/sec Loss 4.8209 LearningRate 0.0155 Epoch: 12 Global Step: 202570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:19,298-Speed 8932.73 samples/sec Loss 4.9314 LearningRate 0.0155 Epoch: 12 Global Step: 202580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:20,404-Speed 9264.33 samples/sec Loss 4.8517 LearningRate 0.0155 Epoch: 12 Global Step: 202590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:21,478-Speed 9548.09 samples/sec Loss 4.8624 LearningRate 0.0155 Epoch: 12 Global Step: 202600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:22,528-Speed 9752.00 samples/sec Loss 4.8266 LearningRate 0.0154 Epoch: 12 Global Step: 202610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:23,664-Speed 9022.52 samples/sec Loss 4.8596 LearningRate 0.0154 Epoch: 12 Global Step: 202620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:24,784-Speed 9147.43 samples/sec Loss 4.7793 LearningRate 0.0154 Epoch: 12 Global Step: 202630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:25,885-Speed 9310.64 samples/sec Loss 4.7575 LearningRate 0.0154 Epoch: 12 Global Step: 202640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:26,996-Speed 9220.64 samples/sec Loss 4.7749 LearningRate 0.0154 Epoch: 12 Global Step: 202650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:28,096-Speed 9308.89 samples/sec Loss 4.8390 LearningRate 0.0154 Epoch: 12 Global Step: 202660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:29,227-Speed 9065.49 samples/sec 
Loss 4.8379 LearningRate 0.0154 Epoch: 12 Global Step: 202670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:30,285-Speed 9688.69 samples/sec Loss 4.8921 LearningRate 0.0154 Epoch: 12 Global Step: 202680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:31,355-Speed 9571.28 samples/sec Loss 4.9915 LearningRate 0.0154 Epoch: 12 Global Step: 202690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:32,462-Speed 9251.80 samples/sec Loss 4.8696 LearningRate 0.0154 Epoch: 12 Global Step: 202700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:33,571-Speed 9246.03 samples/sec Loss 4.8563 LearningRate 0.0154 Epoch: 12 Global Step: 202710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:34,648-Speed 9515.43 samples/sec Loss 4.9142 LearningRate 0.0154 Epoch: 12 Global Step: 202720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:35,700-Speed 9735.12 samples/sec Loss 4.7741 LearningRate 0.0154 Epoch: 12 Global Step: 202730 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:43:36,854-Speed 8881.46 samples/sec Loss 4.8518 LearningRate 0.0154 Epoch: 12 Global Step: 202740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:37,948-Speed 9367.29 samples/sec Loss 4.8394 LearningRate 0.0154 Epoch: 12 Global Step: 202750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:39,040-Speed 9379.25 samples/sec Loss 4.8320 LearningRate 0.0154 Epoch: 12 Global Step: 202760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:40,153-Speed 9209.18 samples/sec Loss 4.8192 LearningRate 0.0154 Epoch: 12 Global Step: 202770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:41,276-Speed 9121.93 samples/sec Loss 4.8442 LearningRate 0.0154 Epoch: 12 Global Step: 202780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:42,409-Speed 9044.37 samples/sec Loss 4.8589 LearningRate 0.0154 
Epoch: 12 Global Step: 202790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:43,507-Speed 9335.44 samples/sec Loss 4.9015 LearningRate 0.0154 Epoch: 12 Global Step: 202800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:44,617-Speed 9228.69 samples/sec Loss 4.7847 LearningRate 0.0154 Epoch: 12 Global Step: 202810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:45,692-Speed 9532.56 samples/sec Loss 4.8952 LearningRate 0.0154 Epoch: 12 Global Step: 202820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:46,808-Speed 9186.37 samples/sec Loss 4.7451 LearningRate 0.0154 Epoch: 12 Global Step: 202830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:47,885-Speed 9511.01 samples/sec Loss 4.8482 LearningRate 0.0154 Epoch: 12 Global Step: 202840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:48,983-Speed 9337.88 samples/sec Loss 4.8322 LearningRate 0.0154 Epoch: 12 Global Step: 202850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:50,091-Speed 9244.65 samples/sec Loss 4.8483 LearningRate 0.0154 Epoch: 12 Global Step: 202860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:51,167-Speed 9523.45 samples/sec Loss 4.8532 LearningRate 0.0154 Epoch: 12 Global Step: 202870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:52,274-Speed 9258.57 samples/sec Loss 4.7868 LearningRate 0.0154 Epoch: 12 Global Step: 202880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:53,388-Speed 9196.17 samples/sec Loss 4.8993 LearningRate 0.0154 Epoch: 12 Global Step: 202890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:43:54,449-Speed 9653.10 samples/sec Loss 4.8388 LearningRate 0.0154 Epoch: 12 Global Step: 202900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:55,585-Speed 9026.27 samples/sec Loss 4.9031 LearningRate 0.0154 Epoch: 12 Global Step: 202910 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:56,694-Speed 9234.77 samples/sec Loss 4.8695 LearningRate 0.0154 Epoch: 12 Global Step: 202920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:57,797-Speed 9288.93 samples/sec Loss 4.9587 LearningRate 0.0154 Epoch: 12 Global Step: 202930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:58,885-Speed 9420.31 samples/sec Loss 4.8972 LearningRate 0.0154 Epoch: 12 Global Step: 202940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:43:59,990-Speed 9273.70 samples/sec Loss 4.8749 LearningRate 0.0154 Epoch: 12 Global Step: 202950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:01,084-Speed 9373.63 samples/sec Loss 4.8107 LearningRate 0.0154 Epoch: 12 Global Step: 202960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:02,214-Speed 9061.99 samples/sec Loss 4.8767 LearningRate 0.0154 Epoch: 12 Global Step: 202970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:03,346-Speed 9055.32 samples/sec Loss 4.8780 LearningRate 0.0154 Epoch: 12 Global Step: 202980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:04,472-Speed 9097.98 samples/sec Loss 4.9200 LearningRate 0.0154 Epoch: 12 Global Step: 202990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:05,537-Speed 9616.14 samples/sec Loss 4.8523 LearningRate 0.0154 Epoch: 12 Global Step: 203000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:06,665-Speed 9084.07 samples/sec Loss 4.8679 LearningRate 0.0154 Epoch: 12 Global Step: 203010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:07,808-Speed 8968.32 samples/sec Loss 4.9163 LearningRate 0.0154 Epoch: 12 Global Step: 203020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:08,952-Speed 8950.67 samples/sec Loss 4.9404 LearningRate 0.0154 Epoch: 12 Global Step: 203030 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:44:10,104-Speed 8895.72 samples/sec Loss 4.8629 LearningRate 0.0153 Epoch: 12 Global Step: 203040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:11,195-Speed 9401.33 samples/sec Loss 4.7625 LearningRate 0.0153 Epoch: 12 Global Step: 203050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:12,278-Speed 9463.33 samples/sec Loss 4.8863 LearningRate 0.0153 Epoch: 12 Global Step: 203060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:13,400-Speed 9128.37 samples/sec Loss 4.8672 LearningRate 0.0153 Epoch: 12 Global Step: 203070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:14,490-Speed 9402.28 samples/sec Loss 4.8073 LearningRate 0.0153 Epoch: 12 Global Step: 203080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:15,619-Speed 9073.50 samples/sec Loss 4.8968 LearningRate 0.0153 Epoch: 12 Global Step: 203090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:16,707-Speed 9411.99 samples/sec Loss 4.8118 LearningRate 0.0153 Epoch: 12 Global Step: 203100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:17,797-Speed 9406.60 samples/sec Loss 4.8875 LearningRate 0.0153 Epoch: 12 Global Step: 203110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:18,850-Speed 9727.80 samples/sec Loss 4.8234 LearningRate 0.0153 Epoch: 12 Global Step: 203120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:19,949-Speed 9322.66 samples/sec Loss 4.8910 LearningRate 0.0153 Epoch: 12 Global Step: 203130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:21,069-Speed 9152.10 samples/sec Loss 4.8882 LearningRate 0.0153 Epoch: 12 Global Step: 203140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:22,234-Speed 8793.68 samples/sec Loss 4.7985 LearningRate 0.0153 Epoch: 12 Global Step: 203150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
19:44:23,309-Speed 9534.17 samples/sec Loss 4.8037 LearningRate 0.0153 Epoch: 12 Global Step: 203160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:24,363-Speed 9716.52 samples/sec Loss 4.8514 LearningRate 0.0153 Epoch: 12 Global Step: 203170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:25,448-Speed 9440.86 samples/sec Loss 4.8687 LearningRate 0.0153 Epoch: 12 Global Step: 203180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:26,533-Speed 9446.45 samples/sec Loss 4.9061 LearningRate 0.0153 Epoch: 12 Global Step: 203190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:27,603-Speed 9573.23 samples/sec Loss 5.0033 LearningRate 0.0153 Epoch: 12 Global Step: 203200 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:44:28,683-Speed 9489.13 samples/sec Loss 4.9000 LearningRate 0.0153 Epoch: 12 Global Step: 203210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:29,786-Speed 9293.73 samples/sec Loss 4.8567 LearningRate 0.0153 Epoch: 12 Global Step: 203220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:30,853-Speed 9608.90 samples/sec Loss 4.9474 LearningRate 0.0153 Epoch: 12 Global Step: 203230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:31,923-Speed 9572.03 samples/sec Loss 4.9169 LearningRate 0.0153 Epoch: 12 Global Step: 203240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:33,040-Speed 9174.75 samples/sec Loss 4.9164 LearningRate 0.0153 Epoch: 12 Global Step: 203250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:34,153-Speed 9205.70 samples/sec Loss 4.8371 LearningRate 0.0153 Epoch: 12 Global Step: 203260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:35,227-Speed 9546.84 samples/sec Loss 4.9584 LearningRate 0.0153 Epoch: 12 Global Step: 203270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:36,305-Speed 9507.89 
samples/sec Loss 4.8522 LearningRate 0.0153 Epoch: 12 Global Step: 203280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:37,375-Speed 9569.00 samples/sec Loss 4.9127 LearningRate 0.0153 Epoch: 12 Global Step: 203290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:38,445-Speed 9573.81 samples/sec Loss 4.9089 LearningRate 0.0153 Epoch: 12 Global Step: 203300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:39,547-Speed 9303.00 samples/sec Loss 4.8043 LearningRate 0.0153 Epoch: 12 Global Step: 203310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:40,636-Speed 9410.51 samples/sec Loss 4.9145 LearningRate 0.0153 Epoch: 12 Global Step: 203320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:41,738-Speed 9296.08 samples/sec Loss 4.8442 LearningRate 0.0153 Epoch: 12 Global Step: 203330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:42,846-Speed 9247.85 samples/sec Loss 4.8060 LearningRate 0.0153 Epoch: 12 Global Step: 203340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:43,969-Speed 9123.42 samples/sec Loss 4.9674 LearningRate 0.0153 Epoch: 12 Global Step: 203350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:45,083-Speed 9198.87 samples/sec Loss 4.8853 LearningRate 0.0153 Epoch: 12 Global Step: 203360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:46,217-Speed 9038.13 samples/sec Loss 4.8575 LearningRate 0.0153 Epoch: 12 Global Step: 203370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:47,318-Speed 9309.70 samples/sec Loss 4.9158 LearningRate 0.0153 Epoch: 12 Global Step: 203380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:48,441-Speed 9127.66 samples/sec Loss 4.8996 LearningRate 0.0153 Epoch: 12 Global Step: 203390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:49,541-Speed 9311.20 samples/sec Loss 4.9220 
LearningRate 0.0153 Epoch: 12 Global Step: 203400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:50,648-Speed 9258.74 samples/sec Loss 4.9666 LearningRate 0.0153 Epoch: 12 Global Step: 203410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:44:51,717-Speed 9583.89 samples/sec Loss 4.9228 LearningRate 0.0153 Epoch: 12 Global Step: 203420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:52,807-Speed 9398.58 samples/sec Loss 4.9709 LearningRate 0.0153 Epoch: 12 Global Step: 203430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:53,930-Speed 9121.44 samples/sec Loss 4.8648 LearningRate 0.0153 Epoch: 12 Global Step: 203440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:55,001-Speed 9572.40 samples/sec Loss 4.7941 LearningRate 0.0153 Epoch: 12 Global Step: 203450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:56,091-Speed 9400.18 samples/sec Loss 4.8552 LearningRate 0.0152 Epoch: 12 Global Step: 203460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:57,177-Speed 9426.60 samples/sec Loss 4.8806 LearningRate 0.0152 Epoch: 12 Global Step: 203470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:58,275-Speed 9338.16 samples/sec Loss 4.8433 LearningRate 0.0152 Epoch: 12 Global Step: 203480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:44:59,374-Speed 9315.06 samples/sec Loss 4.8643 LearningRate 0.0152 Epoch: 12 Global Step: 203490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:00,482-Speed 9254.25 samples/sec Loss 4.8484 LearningRate 0.0152 Epoch: 12 Global Step: 203500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:01,577-Speed 9356.62 samples/sec Loss 4.7817 LearningRate 0.0152 Epoch: 12 Global Step: 203510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:02,683-Speed 9267.91 samples/sec Loss 4.9685 LearningRate 0.0152 Epoch: 12 
Global Step: 203520 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:45:03,748-Speed 9620.41 samples/sec Loss 4.9130 LearningRate 0.0152 Epoch: 12 Global Step: 203530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:04,901-Speed 8883.33 samples/sec Loss 4.8854 LearningRate 0.0152 Epoch: 12 Global Step: 203540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:06,009-Speed 9246.43 samples/sec Loss 4.9692 LearningRate 0.0152 Epoch: 12 Global Step: 203550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:07,105-Speed 9349.29 samples/sec Loss 4.8539 LearningRate 0.0152 Epoch: 12 Global Step: 203560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:08,184-Speed 9494.54 samples/sec Loss 4.8990 LearningRate 0.0152 Epoch: 12 Global Step: 203570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:09,276-Speed 9384.17 samples/sec Loss 4.9022 LearningRate 0.0152 Epoch: 12 Global Step: 203580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:10,341-Speed 9616.81 samples/sec Loss 4.7821 LearningRate 0.0152 Epoch: 12 Global Step: 203590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:11,413-Speed 9566.62 samples/sec Loss 4.9348 LearningRate 0.0152 Epoch: 12 Global Step: 203600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:12,520-Speed 9249.72 samples/sec Loss 4.8660 LearningRate 0.0152 Epoch: 12 Global Step: 203610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:13,658-Speed 9006.28 samples/sec Loss 4.9392 LearningRate 0.0152 Epoch: 12 Global Step: 203620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:14,731-Speed 9544.61 samples/sec Loss 4.9962 LearningRate 0.0152 Epoch: 12 Global Step: 203630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:15,799-Speed 9597.36 samples/sec Loss 4.9211 LearningRate 0.0152 Epoch: 12 Global Step: 203640 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:16,867-Speed 9591.83 samples/sec Loss 4.8693 LearningRate 0.0152 Epoch: 12 Global Step: 203650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:17,999-Speed 9058.10 samples/sec Loss 4.8984 LearningRate 0.0152 Epoch: 12 Global Step: 203660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:19,128-Speed 9071.70 samples/sec Loss 4.8934 LearningRate 0.0152 Epoch: 12 Global Step: 203670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:20,259-Speed 9061.67 samples/sec Loss 4.9896 LearningRate 0.0152 Epoch: 12 Global Step: 203680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:21,357-Speed 9333.33 samples/sec Loss 4.8900 LearningRate 0.0152 Epoch: 12 Global Step: 203690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:22,473-Speed 9177.17 samples/sec Loss 4.9034 LearningRate 0.0152 Epoch: 12 Global Step: 203700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:23,599-Speed 9101.15 samples/sec Loss 4.9323 LearningRate 0.0152 Epoch: 12 Global Step: 203710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:24,712-Speed 9205.99 samples/sec Loss 4.8029 LearningRate 0.0152 Epoch: 12 Global Step: 203720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:25,804-Speed 9385.02 samples/sec Loss 4.9692 LearningRate 0.0152 Epoch: 12 Global Step: 203730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:26,861-Speed 9690.50 samples/sec Loss 4.8480 LearningRate 0.0152 Epoch: 12 Global Step: 203740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:27,943-Speed 9473.50 samples/sec Loss 4.9266 LearningRate 0.0152 Epoch: 12 Global Step: 203750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:29,060-Speed 9166.34 samples/sec Loss 4.8818 LearningRate 0.0152 Epoch: 12 Global Step: 203760 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:45:30,130-Speed 9587.31 samples/sec Loss 4.9374 LearningRate 0.0152 Epoch: 12 Global Step: 203770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:31,183-Speed 9724.81 samples/sec Loss 4.9497 LearningRate 0.0152 Epoch: 12 Global Step: 203780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:32,264-Speed 9480.07 samples/sec Loss 4.9231 LearningRate 0.0152 Epoch: 12 Global Step: 203790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:33,384-Speed 9147.03 samples/sec Loss 4.8339 LearningRate 0.0152 Epoch: 12 Global Step: 203800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:34,457-Speed 9553.56 samples/sec Loss 4.9022 LearningRate 0.0152 Epoch: 12 Global Step: 203810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:35,540-Speed 9454.91 samples/sec Loss 4.9225 LearningRate 0.0152 Epoch: 12 Global Step: 203820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:36,607-Speed 9604.12 samples/sec Loss 4.9581 LearningRate 0.0152 Epoch: 12 Global Step: 203830 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:45:37,723-Speed 9177.68 samples/sec Loss 4.9142 LearningRate 0.0152 Epoch: 12 Global Step: 203840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:38,804-Speed 9483.02 samples/sec Loss 4.9883 LearningRate 0.0152 Epoch: 12 Global Step: 203850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:39,879-Speed 9535.28 samples/sec Loss 4.9581 LearningRate 0.0152 Epoch: 12 Global Step: 203860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:40,973-Speed 9361.95 samples/sec Loss 4.8584 LearningRate 0.0152 Epoch: 12 Global Step: 203870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:42,047-Speed 9548.37 samples/sec Loss 4.9626 LearningRate 0.0152 Epoch: 12 Global Step: 203880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
19:45:43,146-Speed 9319.83 samples/sec Loss 4.9046 LearningRate 0.0151 Epoch: 12 Global Step: 203890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:44,229-Speed 9463.16 samples/sec Loss 4.9390 LearningRate 0.0151 Epoch: 12 Global Step: 203900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:45,300-Speed 9566.40 samples/sec Loss 4.9226 LearningRate 0.0151 Epoch: 12 Global Step: 203910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:46,397-Speed 9334.85 samples/sec Loss 4.8569 LearningRate 0.0151 Epoch: 12 Global Step: 203920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:47,533-Speed 9023.95 samples/sec Loss 4.8864 LearningRate 0.0151 Epoch: 12 Global Step: 203930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:48,605-Speed 9560.61 samples/sec Loss 4.8530 LearningRate 0.0151 Epoch: 12 Global Step: 203940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:49,660-Speed 9710.75 samples/sec Loss 4.9283 LearningRate 0.0151 Epoch: 12 Global Step: 203950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:50,733-Speed 9554.15 samples/sec Loss 4.9262 LearningRate 0.0151 Epoch: 12 Global Step: 203960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:51,808-Speed 9528.78 samples/sec Loss 4.8248 LearningRate 0.0151 Epoch: 12 Global Step: 203970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:52,883-Speed 9529.75 samples/sec Loss 4.8714 LearningRate 0.0151 Epoch: 12 Global Step: 203980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:54,003-Speed 9155.24 samples/sec Loss 4.9544 LearningRate 0.0151 Epoch: 12 Global Step: 203990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:45:55,125-Speed 9128.24 samples/sec Loss 4.9197 LearningRate 0.0151 Epoch: 12 Global Step: 204000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
19:46:17,180-[lfw][204000]XNorm: 8.552353 Training: 2022-04-11 19:46:17,181-[lfw][204000]Accuracy-Flip: 0.99533+-0.00314 Training: 2022-04-11 19:46:17,181-[lfw][204000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:46:42,645-[cfp_fp][204000]XNorm: 7.345010 Training: 2022-04-11 19:46:42,645-[cfp_fp][204000]Accuracy-Flip: 0.96443+-0.00853 Training: 2022-04-11 19:46:42,646-[cfp_fp][204000]Accuracy-Highest: 0.96771 Training: 2022-04-11 19:47:04,546-[agedb_30][204000]XNorm: 8.247466 Training: 2022-04-11 19:47:04,547-[agedb_30][204000]Accuracy-Flip: 0.96950+-0.00898 Training: 2022-04-11 19:47:04,547-[agedb_30][204000]Accuracy-Highest: 0.96983 Training: 2022-04-11 19:47:05,634-Speed 145.23 samples/sec Loss 4.8886 LearningRate 0.0151 Epoch: 12 Global Step: 204010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:06,732-Speed 9332.22 samples/sec Loss 4.9113 LearningRate 0.0151 Epoch: 12 Global Step: 204020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:07,867-Speed 9028.22 samples/sec Loss 4.9535 LearningRate 0.0151 Epoch: 12 Global Step: 204030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:08,945-Speed 9503.65 samples/sec Loss 4.9898 LearningRate 0.0151 Epoch: 12 Global Step: 204040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:10,025-Speed 9485.03 samples/sec Loss 5.0058 LearningRate 0.0151 Epoch: 12 Global Step: 204050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:11,093-Speed 9593.51 samples/sec Loss 5.0017 LearningRate 0.0151 Epoch: 12 Global Step: 204060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:12,176-Speed 9467.70 samples/sec Loss 4.8669 LearningRate 0.0151 Epoch: 12 Global Step: 204070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:13,227-Speed 9748.62 samples/sec Loss 4.8928 LearningRate 0.0151 Epoch: 12 Global Step: 204080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:14,338-Speed 9220.92 
samples/sec Loss 4.8974 LearningRate 0.0151 Epoch: 12 Global Step: 204090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:15,434-Speed 9350.14 samples/sec Loss 5.0190 LearningRate 0.0151 Epoch: 12 Global Step: 204100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:16,508-Speed 9542.48 samples/sec Loss 4.9638 LearningRate 0.0151 Epoch: 12 Global Step: 204110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:17,570-Speed 9653.09 samples/sec Loss 5.0591 LearningRate 0.0151 Epoch: 12 Global Step: 204120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:18,644-Speed 9539.02 samples/sec Loss 4.9767 LearningRate 0.0151 Epoch: 12 Global Step: 204130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:19,719-Speed 9536.97 samples/sec Loss 5.0221 LearningRate 0.0151 Epoch: 12 Global Step: 204140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:20,796-Speed 9515.81 samples/sec Loss 4.8839 LearningRate 0.0151 Epoch: 12 Global Step: 204150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:21,873-Speed 9510.72 samples/sec Loss 4.9770 LearningRate 0.0151 Epoch: 12 Global Step: 204160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:22,986-Speed 9210.97 samples/sec Loss 4.9873 LearningRate 0.0151 Epoch: 12 Global Step: 204170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:24,070-Speed 9451.16 samples/sec Loss 4.8085 LearningRate 0.0151 Epoch: 12 Global Step: 204180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:25,123-Speed 9730.00 samples/sec Loss 4.9608 LearningRate 0.0151 Epoch: 12 Global Step: 204190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:26,184-Speed 9650.38 samples/sec Loss 4.8283 LearningRate 0.0151 Epoch: 12 Global Step: 204200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:27,299-Speed 9195.30 samples/sec Loss 4.9511 
LearningRate 0.0151 Epoch: 12 Global Step: 204210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:28,362-Speed 9636.71 samples/sec Loss 4.9549 LearningRate 0.0151 Epoch: 12 Global Step: 204220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:29,483-Speed 9139.04 samples/sec Loss 4.9254 LearningRate 0.0151 Epoch: 12 Global Step: 204230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:30,570-Speed 9432.65 samples/sec Loss 4.8518 LearningRate 0.0151 Epoch: 12 Global Step: 204240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:31,675-Speed 9269.78 samples/sec Loss 4.9015 LearningRate 0.0151 Epoch: 12 Global Step: 204250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:32,802-Speed 9094.80 samples/sec Loss 4.9194 LearningRate 0.0151 Epoch: 12 Global Step: 204260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:33,854-Speed 9739.59 samples/sec Loss 4.9279 LearningRate 0.0151 Epoch: 12 Global Step: 204270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:34,967-Speed 9210.15 samples/sec Loss 4.9231 LearningRate 0.0151 Epoch: 12 Global Step: 204280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:36,035-Speed 9594.60 samples/sec Loss 5.0590 LearningRate 0.0151 Epoch: 12 Global Step: 204290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:37,111-Speed 9520.21 samples/sec Loss 5.1033 LearningRate 0.0151 Epoch: 12 Global Step: 204300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:38,214-Speed 9286.54 samples/sec Loss 4.9691 LearningRate 0.0151 Epoch: 12 Global Step: 204310 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:47:39,289-Speed 9537.50 samples/sec Loss 4.9871 LearningRate 0.0150 Epoch: 12 Global Step: 204320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:40,397-Speed 9241.03 samples/sec Loss 4.9889 LearningRate 0.0150 Epoch: 12 
Global Step: 204330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:41,496-Speed 9329.21 samples/sec Loss 4.9547 LearningRate 0.0150 Epoch: 12 Global Step: 204340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:42,580-Speed 9453.04 samples/sec Loss 4.9732 LearningRate 0.0150 Epoch: 12 Global Step: 204350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:43,632-Speed 9739.47 samples/sec Loss 4.8523 LearningRate 0.0150 Epoch: 12 Global Step: 204360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:44,779-Speed 8930.54 samples/sec Loss 4.9389 LearningRate 0.0150 Epoch: 12 Global Step: 204370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:45,820-Speed 9841.59 samples/sec Loss 4.7979 LearningRate 0.0150 Epoch: 12 Global Step: 204380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:46,906-Speed 9442.33 samples/sec Loss 4.9313 LearningRate 0.0150 Epoch: 12 Global Step: 204390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:47,993-Speed 9421.06 samples/sec Loss 4.9807 LearningRate 0.0150 Epoch: 12 Global Step: 204400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:49,062-Speed 9591.04 samples/sec Loss 4.8770 LearningRate 0.0150 Epoch: 12 Global Step: 204410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:50,119-Speed 9695.27 samples/sec Loss 4.8798 LearningRate 0.0150 Epoch: 12 Global Step: 204420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:51,176-Speed 9695.10 samples/sec Loss 4.8902 LearningRate 0.0150 Epoch: 12 Global Step: 204430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:52,259-Speed 9458.17 samples/sec Loss 5.0219 LearningRate 0.0150 Epoch: 12 Global Step: 204440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:47:53,406-Speed 8932.74 samples/sec Loss 4.9241 LearningRate 0.0150 Epoch: 12 Global Step: 204450 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 19:47:54,494-Speed 9423.84 samples/sec Loss 5.0144 LearningRate 0.0150 Epoch: 12 Global Step: 204460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:55,607-Speed 9200.78 samples/sec Loss 4.9081 LearningRate 0.0150 Epoch: 12 Global Step: 204470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:56,701-Speed 9367.97 samples/sec Loss 4.9782 LearningRate 0.0150 Epoch: 12 Global Step: 204480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:57,795-Speed 9363.89 samples/sec Loss 5.0102 LearningRate 0.0150 Epoch: 12 Global Step: 204490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:58,904-Speed 9244.11 samples/sec Loss 4.9539 LearningRate 0.0150 Epoch: 12 Global Step: 204500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:47:59,971-Speed 9603.62 samples/sec Loss 5.0146 LearningRate 0.0150 Epoch: 12 Global Step: 204510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:01,057-Speed 9435.60 samples/sec Loss 4.9430 LearningRate 0.0150 Epoch: 12 Global Step: 204520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:02,137-Speed 9488.53 samples/sec Loss 5.0838 LearningRate 0.0150 Epoch: 12 Global Step: 204530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:03,241-Speed 9282.96 samples/sec Loss 4.9412 LearningRate 0.0150 Epoch: 12 Global Step: 204540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:04,363-Speed 9131.53 samples/sec Loss 4.8827 LearningRate 0.0150 Epoch: 12 Global Step: 204550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:05,434-Speed 9571.93 samples/sec Loss 4.9978 LearningRate 0.0150 Epoch: 12 Global Step: 204560 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:48:06,524-Speed 9397.28 samples/sec Loss 4.8949 LearningRate 0.0150 Epoch: 12 Global Step: 204570 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:48:07,647-Speed 9125.01 samples/sec Loss 4.9385 LearningRate 0.0150 Epoch: 12 Global Step: 204580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:08,740-Speed 9372.27 samples/sec Loss 5.0092 LearningRate 0.0150 Epoch: 12 Global Step: 204590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:09,908-Speed 8778.96 samples/sec Loss 4.8922 LearningRate 0.0150 Epoch: 12 Global Step: 204600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:11,000-Speed 9380.39 samples/sec Loss 4.9376 LearningRate 0.0150 Epoch: 12 Global Step: 204610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:12,080-Speed 9484.51 samples/sec Loss 4.9507 LearningRate 0.0150 Epoch: 12 Global Step: 204620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:13,161-Speed 9483.85 samples/sec Loss 4.9241 LearningRate 0.0150 Epoch: 12 Global Step: 204630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:14,238-Speed 9515.78 samples/sec Loss 4.8861 LearningRate 0.0150 Epoch: 12 Global Step: 204640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:15,333-Speed 9352.31 samples/sec Loss 4.9133 LearningRate 0.0150 Epoch: 12 Global Step: 204650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:16,388-Speed 9720.59 samples/sec Loss 4.9268 LearningRate 0.0150 Epoch: 12 Global Step: 204660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:17,467-Speed 9490.58 samples/sec Loss 5.0745 LearningRate 0.0150 Epoch: 12 Global Step: 204670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:18,564-Speed 9338.04 samples/sec Loss 4.9976 LearningRate 0.0150 Epoch: 12 Global Step: 204680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:19,646-Speed 9476.11 samples/sec Loss 4.9464 LearningRate 0.0150 Epoch: 12 Global Step: 204690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
19:48:20,729-Speed 9457.50 samples/sec Loss 5.0118 LearningRate 0.0150 Epoch: 12 Global Step: 204700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:21,802-Speed 9552.13 samples/sec Loss 4.9775 LearningRate 0.0150 Epoch: 12 Global Step: 204710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:22,883-Speed 9477.49 samples/sec Loss 5.0085 LearningRate 0.0150 Epoch: 12 Global Step: 204720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:24,009-Speed 9101.04 samples/sec Loss 4.9109 LearningRate 0.0150 Epoch: 12 Global Step: 204730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:25,100-Speed 9391.49 samples/sec Loss 4.8804 LearningRate 0.0150 Epoch: 12 Global Step: 204740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:26,168-Speed 9598.89 samples/sec Loss 5.0233 LearningRate 0.0149 Epoch: 12 Global Step: 204750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:27,233-Speed 9619.68 samples/sec Loss 4.8672 LearningRate 0.0149 Epoch: 12 Global Step: 204760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:28,372-Speed 8993.99 samples/sec Loss 4.9115 LearningRate 0.0149 Epoch: 12 Global Step: 204770 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:48:29,439-Speed 9606.27 samples/sec Loss 4.9425 LearningRate 0.0149 Epoch: 12 Global Step: 204780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:30,509-Speed 9579.14 samples/sec Loss 4.9391 LearningRate 0.0149 Epoch: 12 Global Step: 204790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:31,611-Speed 9295.50 samples/sec Loss 4.8395 LearningRate 0.0149 Epoch: 12 Global Step: 204800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:32,701-Speed 9400.69 samples/sec Loss 4.9039 LearningRate 0.0149 Epoch: 12 Global Step: 204810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:33,795-Speed 9366.53 
samples/sec Loss 5.0250 LearningRate 0.0149 Epoch: 12 Global Step: 204820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:34,912-Speed 9174.36 samples/sec Loss 4.9871 LearningRate 0.0149 Epoch: 12 Global Step: 204830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:36,027-Speed 9193.00 samples/sec Loss 4.8719 LearningRate 0.0149 Epoch: 12 Global Step: 204840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:37,141-Speed 9200.14 samples/sec Loss 4.9570 LearningRate 0.0149 Epoch: 12 Global Step: 204850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:38,270-Speed 9077.19 samples/sec Loss 4.9642 LearningRate 0.0149 Epoch: 12 Global Step: 204860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:39,330-Speed 9663.86 samples/sec Loss 4.9508 LearningRate 0.0149 Epoch: 12 Global Step: 204870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:40,379-Speed 9764.40 samples/sec Loss 4.9120 LearningRate 0.0149 Epoch: 12 Global Step: 204880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:41,487-Speed 9249.81 samples/sec Loss 4.8994 LearningRate 0.0149 Epoch: 12 Global Step: 204890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:42,572-Speed 9438.41 samples/sec Loss 4.9266 LearningRate 0.0149 Epoch: 12 Global Step: 204900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:43,665-Speed 9382.38 samples/sec Loss 4.9552 LearningRate 0.0149 Epoch: 12 Global Step: 204910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:44,693-Speed 9967.79 samples/sec Loss 5.0073 LearningRate 0.0149 Epoch: 12 Global Step: 204920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:45,797-Speed 9280.22 samples/sec Loss 4.9346 LearningRate 0.0149 Epoch: 12 Global Step: 204930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:46,848-Speed 9749.32 samples/sec Loss 5.0120 LearningRate 
0.0149 Epoch: 12 Global Step: 204940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:48:47,919-Speed 9566.83 samples/sec Loss 4.9800 LearningRate 0.0149 Epoch: 12 Global Step: 204950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:49,003-Speed 9449.91 samples/sec Loss 4.9449 LearningRate 0.0149 Epoch: 12 Global Step: 204960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:50,107-Speed 9281.87 samples/sec Loss 4.9394 LearningRate 0.0149 Epoch: 12 Global Step: 204970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:51,207-Speed 9321.80 samples/sec Loss 4.9914 LearningRate 0.0149 Epoch: 12 Global Step: 204980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:52,285-Speed 9503.45 samples/sec Loss 4.9435 LearningRate 0.0149 Epoch: 12 Global Step: 204990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:53,385-Speed 9321.56 samples/sec Loss 4.9801 LearningRate 0.0149 Epoch: 12 Global Step: 205000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:54,501-Speed 9179.98 samples/sec Loss 4.9999 LearningRate 0.0149 Epoch: 12 Global Step: 205010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:55,621-Speed 9148.14 samples/sec Loss 4.8310 LearningRate 0.0149 Epoch: 12 Global Step: 205020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:56,748-Speed 9098.06 samples/sec Loss 4.9292 LearningRate 0.0149 Epoch: 12 Global Step: 205030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:57,839-Speed 9389.14 samples/sec Loss 5.0152 LearningRate 0.0149 Epoch: 12 Global Step: 205040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:48:59,018-Speed 8694.74 samples/sec Loss 4.9590 LearningRate 0.0149 Epoch: 12 Global Step: 205050 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:49:00,082-Speed 9632.97 samples/sec Loss 4.9916 LearningRate 0.0149 Epoch: 12 Global Step: 
205060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:01,117-Speed 9892.72 samples/sec Loss 4.9858 LearningRate 0.0149 Epoch: 12 Global Step: 205070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:02,187-Speed 9577.81 samples/sec Loss 5.0030 LearningRate 0.0149 Epoch: 12 Global Step: 205080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:03,295-Speed 9248.12 samples/sec Loss 5.0088 LearningRate 0.0149 Epoch: 12 Global Step: 205090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:04,355-Speed 9669.85 samples/sec Loss 4.9342 LearningRate 0.0149 Epoch: 12 Global Step: 205100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:05,503-Speed 8928.95 samples/sec Loss 4.9963 LearningRate 0.0149 Epoch: 12 Global Step: 205110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:06,619-Speed 9174.53 samples/sec Loss 4.9421 LearningRate 0.0149 Epoch: 12 Global Step: 205120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:07,714-Speed 9360.32 samples/sec Loss 4.9855 LearningRate 0.0149 Epoch: 12 Global Step: 205130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:08,822-Speed 9248.34 samples/sec Loss 4.9763 LearningRate 0.0149 Epoch: 12 Global Step: 205140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:09,908-Speed 9431.20 samples/sec Loss 4.9946 LearningRate 0.0149 Epoch: 12 Global Step: 205150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:10,988-Speed 9491.73 samples/sec Loss 5.0193 LearningRate 0.0149 Epoch: 12 Global Step: 205160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:12,090-Speed 9292.27 samples/sec Loss 4.9730 LearningRate 0.0149 Epoch: 12 Global Step: 205170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:13,182-Speed 9387.25 samples/sec Loss 5.0449 LearningRate 0.0149 Epoch: 12 Global Step: 205180 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 19:49:14,290-Speed 9245.90 samples/sec Loss 4.9402 LearningRate 0.0148 Epoch: 12 Global Step: 205190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:15,424-Speed 9039.28 samples/sec Loss 4.9132 LearningRate 0.0148 Epoch: 12 Global Step: 205200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:16,571-Speed 8935.26 samples/sec Loss 5.0110 LearningRate 0.0148 Epoch: 12 Global Step: 205210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:17,672-Speed 9303.41 samples/sec Loss 4.9721 LearningRate 0.0148 Epoch: 12 Global Step: 205220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:18,826-Speed 8880.54 samples/sec Loss 5.1063 LearningRate 0.0148 Epoch: 12 Global Step: 205230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:19,909-Speed 9462.45 samples/sec Loss 4.9952 LearningRate 0.0148 Epoch: 12 Global Step: 205240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:20,974-Speed 9617.72 samples/sec Loss 5.0110 LearningRate 0.0148 Epoch: 12 Global Step: 205250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:22,062-Speed 9424.55 samples/sec Loss 4.9605 LearningRate 0.0148 Epoch: 12 Global Step: 205260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:23,140-Speed 9505.49 samples/sec Loss 4.9230 LearningRate 0.0148 Epoch: 12 Global Step: 205270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:24,265-Speed 9106.77 samples/sec Loss 4.9907 LearningRate 0.0148 Epoch: 12 Global Step: 205280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:25,343-Speed 9504.88 samples/sec Loss 4.9679 LearningRate 0.0148 Epoch: 12 Global Step: 205290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:26,427-Speed 9449.81 samples/sec Loss 4.9564 LearningRate 0.0148 Epoch: 12 Global Step: 205300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 19:49:27,504-Speed 9515.48 samples/sec Loss 5.0260 LearningRate 0.0148 Epoch: 12 Global Step: 205310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:28,592-Speed 9419.09 samples/sec Loss 4.8772 LearningRate 0.0148 Epoch: 12 Global Step: 205320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:29,688-Speed 9347.30 samples/sec Loss 4.9327 LearningRate 0.0148 Epoch: 12 Global Step: 205330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:30,827-Speed 8994.00 samples/sec Loss 4.9744 LearningRate 0.0148 Epoch: 12 Global Step: 205340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:31,931-Speed 9280.00 samples/sec Loss 5.0494 LearningRate 0.0148 Epoch: 12 Global Step: 205350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:33,027-Speed 9348.40 samples/sec Loss 4.9086 LearningRate 0.0148 Epoch: 12 Global Step: 205360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:34,159-Speed 9052.52 samples/sec Loss 5.0792 LearningRate 0.0148 Epoch: 12 Global Step: 205370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:35,257-Speed 9333.01 samples/sec Loss 4.9438 LearningRate 0.0148 Epoch: 12 Global Step: 205380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:49:36,385-Speed 9081.69 samples/sec Loss 4.9494 LearningRate 0.0148 Epoch: 12 Global Step: 205390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:37,473-Speed 9420.32 samples/sec Loss 4.9149 LearningRate 0.0148 Epoch: 12 Global Step: 205400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:38,556-Speed 9460.88 samples/sec Loss 4.9249 LearningRate 0.0148 Epoch: 12 Global Step: 205410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:39,674-Speed 9168.40 samples/sec Loss 4.9164 LearningRate 0.0148 Epoch: 12 Global Step: 205420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:40,783-Speed 9242.06 
samples/sec Loss 5.0254 LearningRate 0.0148 Epoch: 12 Global Step: 205430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:41,934-Speed 8897.22 samples/sec Loss 5.0616 LearningRate 0.0148 Epoch: 12 Global Step: 205440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:43,025-Speed 9387.66 samples/sec Loss 5.0051 LearningRate 0.0148 Epoch: 12 Global Step: 205450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:44,108-Speed 9458.77 samples/sec Loss 4.9207 LearningRate 0.0148 Epoch: 12 Global Step: 205460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:45,208-Speed 9315.80 samples/sec Loss 4.9718 LearningRate 0.0148 Epoch: 12 Global Step: 205470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:46,347-Speed 9000.47 samples/sec Loss 4.8872 LearningRate 0.0148 Epoch: 12 Global Step: 205480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:47,464-Speed 9177.96 samples/sec Loss 4.9893 LearningRate 0.0148 Epoch: 12 Global Step: 205490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:48,581-Speed 9169.35 samples/sec Loss 4.9505 LearningRate 0.0148 Epoch: 12 Global Step: 205500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:49,682-Speed 9309.11 samples/sec Loss 5.0116 LearningRate 0.0148 Epoch: 12 Global Step: 205510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:50,793-Speed 9221.03 samples/sec Loss 4.9689 LearningRate 0.0148 Epoch: 12 Global Step: 205520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:51,851-Speed 9682.72 samples/sec Loss 4.8841 LearningRate 0.0148 Epoch: 12 Global Step: 205530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:52,964-Speed 9205.41 samples/sec Loss 4.9900 LearningRate 0.0148 Epoch: 12 Global Step: 205540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:54,076-Speed 9225.68 samples/sec Loss 5.0566 
LearningRate 0.0148 Epoch: 12 Global Step: 205550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:55,169-Speed 9372.30 samples/sec Loss 5.0673 LearningRate 0.0148 Epoch: 12 Global Step: 205560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:56,290-Speed 9137.68 samples/sec Loss 4.9909 LearningRate 0.0148 Epoch: 12 Global Step: 205570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:57,372-Speed 9469.65 samples/sec Loss 4.9428 LearningRate 0.0148 Epoch: 12 Global Step: 205580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:49:58,465-Speed 9369.80 samples/sec Loss 4.9404 LearningRate 0.0148 Epoch: 12 Global Step: 205590 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:49:59,639-Speed 8729.70 samples/sec Loss 5.0751 LearningRate 0.0148 Epoch: 12 Global Step: 205600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:00,724-Speed 9442.36 samples/sec Loss 4.9732 LearningRate 0.0148 Epoch: 12 Global Step: 205610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:01,893-Speed 8769.27 samples/sec Loss 5.0645 LearningRate 0.0147 Epoch: 12 Global Step: 205620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:02,979-Speed 9434.52 samples/sec Loss 5.0134 LearningRate 0.0147 Epoch: 12 Global Step: 205630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:04,040-Speed 9656.56 samples/sec Loss 5.0144 LearningRate 0.0147 Epoch: 12 Global Step: 205640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:05,137-Speed 9339.92 samples/sec Loss 5.0387 LearningRate 0.0147 Epoch: 12 Global Step: 205650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:06,232-Speed 9361.01 samples/sec Loss 4.9082 LearningRate 0.0147 Epoch: 12 Global Step: 205660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:07,366-Speed 9035.93 samples/sec Loss 5.0440 LearningRate 0.0147 Epoch: 12 
Global Step: 205670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:08,476-Speed 9233.68 samples/sec Loss 4.9902 LearningRate 0.0147 Epoch: 12 Global Step: 205680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:09,562-Speed 9437.79 samples/sec Loss 5.0578 LearningRate 0.0147 Epoch: 12 Global Step: 205690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:10,617-Speed 9711.21 samples/sec Loss 5.1294 LearningRate 0.0147 Epoch: 12 Global Step: 205700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:11,684-Speed 9600.80 samples/sec Loss 4.9646 LearningRate 0.0147 Epoch: 12 Global Step: 205710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:12,743-Speed 9676.63 samples/sec Loss 4.9002 LearningRate 0.0147 Epoch: 12 Global Step: 205720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:13,840-Speed 9341.85 samples/sec Loss 4.9747 LearningRate 0.0147 Epoch: 12 Global Step: 205730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:14,932-Speed 9383.78 samples/sec Loss 4.8832 LearningRate 0.0147 Epoch: 12 Global Step: 205740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:16,037-Speed 9272.76 samples/sec Loss 4.9852 LearningRate 0.0147 Epoch: 12 Global Step: 205750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:17,120-Speed 9463.33 samples/sec Loss 4.9510 LearningRate 0.0147 Epoch: 12 Global Step: 205760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:18,204-Speed 9458.57 samples/sec Loss 5.0327 LearningRate 0.0147 Epoch: 12 Global Step: 205770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:19,303-Speed 9321.29 samples/sec Loss 5.0302 LearningRate 0.0147 Epoch: 12 Global Step: 205780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:20,362-Speed 9671.49 samples/sec Loss 4.9434 LearningRate 0.0147 Epoch: 12 Global Step: 205790 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:21,427-Speed 9628.09 samples/sec Loss 4.8893 LearningRate 0.0147 Epoch: 12 Global Step: 205800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:22,493-Speed 9608.63 samples/sec Loss 5.1033 LearningRate 0.0147 Epoch: 12 Global Step: 205810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:23,601-Speed 9252.43 samples/sec Loss 5.0034 LearningRate 0.0147 Epoch: 12 Global Step: 205820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:24,677-Speed 9519.20 samples/sec Loss 5.0068 LearningRate 0.0147 Epoch: 12 Global Step: 205830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:25,778-Speed 9312.38 samples/sec Loss 5.0606 LearningRate 0.0147 Epoch: 12 Global Step: 205840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:26,878-Speed 9308.47 samples/sec Loss 4.9862 LearningRate 0.0147 Epoch: 12 Global Step: 205850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:27,934-Speed 9710.01 samples/sec Loss 5.0357 LearningRate 0.0147 Epoch: 12 Global Step: 205860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:28,983-Speed 9769.00 samples/sec Loss 4.9760 LearningRate 0.0147 Epoch: 12 Global Step: 205870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:30,102-Speed 9154.95 samples/sec Loss 4.9809 LearningRate 0.0147 Epoch: 12 Global Step: 205880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:50:31,189-Speed 9428.32 samples/sec Loss 4.9378 LearningRate 0.0147 Epoch: 12 Global Step: 205890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:32,273-Speed 9450.55 samples/sec Loss 5.0334 LearningRate 0.0147 Epoch: 12 Global Step: 205900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:33,375-Speed 9292.23 samples/sec Loss 4.9928 LearningRate 0.0147 Epoch: 12 Global Step: 205910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:50:34,471-Speed 9353.73 samples/sec Loss 5.0313 LearningRate 0.0147 Epoch: 12 Global Step: 205920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:35,596-Speed 9104.71 samples/sec Loss 4.9665 LearningRate 0.0147 Epoch: 12 Global Step: 205930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:36,661-Speed 9620.83 samples/sec Loss 5.0658 LearningRate 0.0147 Epoch: 12 Global Step: 205940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:37,772-Speed 9221.72 samples/sec Loss 5.0306 LearningRate 0.0147 Epoch: 12 Global Step: 205950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:38,867-Speed 9359.38 samples/sec Loss 4.9746 LearningRate 0.0147 Epoch: 12 Global Step: 205960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:39,937-Speed 9573.66 samples/sec Loss 5.0046 LearningRate 0.0147 Epoch: 12 Global Step: 205970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:41,002-Speed 9625.17 samples/sec Loss 4.8837 LearningRate 0.0147 Epoch: 12 Global Step: 205980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:50:42,140-Speed 9001.97 samples/sec Loss 5.0358 LearningRate 0.0147 Epoch: 12 Global Step: 205990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:50:43,218-Speed 9510.08 samples/sec Loss 4.8676 LearningRate 0.0147 Epoch: 12 Global Step: 206000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:51:05,313-[lfw][206000]XNorm: 8.476825 Training: 2022-04-11 19:51:05,313-[lfw][206000]Accuracy-Flip: 0.99600+-0.00309 Training: 2022-04-11 19:51:05,314-[lfw][206000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:51:30,535-[cfp_fp][206000]XNorm: 7.316313 Training: 2022-04-11 19:51:30,536-[cfp_fp][206000]Accuracy-Flip: 0.96657+-0.00911 Training: 2022-04-11 19:51:30,536-[cfp_fp][206000]Accuracy-Highest: 0.96771 Training: 2022-04-11 19:51:52,242-[agedb_30][206000]XNorm: 8.179552 Training: 2022-04-11 
19:51:52,243-[agedb_30][206000]Accuracy-Flip: 0.96733+-0.00981 Training: 2022-04-11 19:51:52,243-[agedb_30][206000]Accuracy-Highest: 0.96983 Training: 2022-04-11 19:51:53,298-Speed 146.12 samples/sec Loss 5.0208 LearningRate 0.0147 Epoch: 12 Global Step: 206010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:51:54,416-Speed 9160.47 samples/sec Loss 5.0577 LearningRate 0.0147 Epoch: 12 Global Step: 206020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:51:55,503-Speed 9425.76 samples/sec Loss 4.9557 LearningRate 0.0147 Epoch: 12 Global Step: 206030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:51:56,584-Speed 9481.57 samples/sec Loss 4.9153 LearningRate 0.0147 Epoch: 12 Global Step: 206040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:51:57,660-Speed 9520.68 samples/sec Loss 5.0015 LearningRate 0.0146 Epoch: 12 Global Step: 206050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:51:58,724-Speed 9633.05 samples/sec Loss 4.9436 LearningRate 0.0146 Epoch: 12 Global Step: 206060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:51:59,810-Speed 9437.64 samples/sec Loss 4.9457 LearningRate 0.0146 Epoch: 12 Global Step: 206070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:52:00,905-Speed 9360.36 samples/sec Loss 4.9370 LearningRate 0.0146 Epoch: 12 Global Step: 206080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:52:01,995-Speed 9396.67 samples/sec Loss 4.9674 LearningRate 0.0146 Epoch: 12 Global Step: 206090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:52:03,109-Speed 9195.84 samples/sec Loss 4.9715 LearningRate 0.0146 Epoch: 12 Global Step: 206100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:52:04,187-Speed 9504.92 samples/sec Loss 5.0040 LearningRate 0.0146 Epoch: 12 Global Step: 206110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:52:05,245-Speed 9680.24 samples/sec Loss 
5.0301 LearningRate 0.0146 Epoch: 12 Global Step: 206120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:52:06,334-Speed 9414.91 samples/sec Loss 4.9291 LearningRate 0.0146 Epoch: 12 Global Step: 206130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:07,441-Speed 9247.23 samples/sec Loss 4.9900 LearningRate 0.0146 Epoch: 12 Global Step: 206140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:08,501-Speed 9667.08 samples/sec Loss 5.1395 LearningRate 0.0146 Epoch: 12 Global Step: 206150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:09,576-Speed 9534.96 samples/sec Loss 5.0797 LearningRate 0.0146 Epoch: 12 Global Step: 206160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:10,659-Speed 9455.59 samples/sec Loss 4.9666 LearningRate 0.0146 Epoch: 12 Global Step: 206170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:11,790-Speed 9062.08 samples/sec Loss 4.9847 LearningRate 0.0146 Epoch: 12 Global Step: 206180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:12,852-Speed 9656.28 samples/sec Loss 4.9546 LearningRate 0.0146 Epoch: 12 Global Step: 206190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:13,901-Speed 9760.04 samples/sec Loss 5.0307 LearningRate 0.0146 Epoch: 12 Global Step: 206200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:14,984-Speed 9466.81 samples/sec Loss 5.0395 LearningRate 0.0146 Epoch: 12 Global Step: 206210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:16,055-Speed 9568.96 samples/sec Loss 4.9341 LearningRate 0.0146 Epoch: 12 Global Step: 206220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:17,129-Speed 9538.37 samples/sec Loss 4.9293 LearningRate 0.0146 Epoch: 12 Global Step: 206230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:18,237-Speed 9249.34 samples/sec Loss 4.9694 LearningRate 0.0146 
Epoch: 12 Global Step: 206240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:19,381-Speed 8960.70 samples/sec Loss 4.9825 LearningRate 0.0146 Epoch: 12 Global Step: 206250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:20,485-Speed 9276.03 samples/sec Loss 5.0367 LearningRate 0.0146 Epoch: 12 Global Step: 206260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:21,571-Speed 9440.90 samples/sec Loss 4.9680 LearningRate 0.0146 Epoch: 12 Global Step: 206270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:22,674-Speed 9289.70 samples/sec Loss 4.8944 LearningRate 0.0146 Epoch: 12 Global Step: 206280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:23,768-Speed 9365.41 samples/sec Loss 4.9213 LearningRate 0.0146 Epoch: 12 Global Step: 206290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:24,912-Speed 8947.99 samples/sec Loss 4.8586 LearningRate 0.0146 Epoch: 12 Global Step: 206300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:25,994-Speed 9473.30 samples/sec Loss 4.9874 LearningRate 0.0146 Epoch: 12 Global Step: 206310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:27,059-Speed 9632.03 samples/sec Loss 4.9693 LearningRate 0.0146 Epoch: 12 Global Step: 206320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:28,112-Speed 9724.66 samples/sec Loss 4.9817 LearningRate 0.0146 Epoch: 12 Global Step: 206330 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:52:29,202-Speed 9401.08 samples/sec Loss 5.0414 LearningRate 0.0146 Epoch: 12 Global Step: 206340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:30,297-Speed 9358.53 samples/sec Loss 5.0016 LearningRate 0.0146 Epoch: 12 Global Step: 206350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:31,352-Speed 9705.27 samples/sec Loss 4.9979 LearningRate 0.0146 Epoch: 12 Global Step: 206360 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:32,471-Speed 9159.60 samples/sec Loss 4.9680 LearningRate 0.0146 Epoch: 12 Global Step: 206370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:33,562-Speed 9396.35 samples/sec Loss 5.0394 LearningRate 0.0146 Epoch: 12 Global Step: 206380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:34,672-Speed 9229.69 samples/sec Loss 4.9266 LearningRate 0.0146 Epoch: 12 Global Step: 206390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:35,802-Speed 9065.32 samples/sec Loss 4.9325 LearningRate 0.0146 Epoch: 12 Global Step: 206400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:36,883-Speed 9480.87 samples/sec Loss 4.9292 LearningRate 0.0146 Epoch: 12 Global Step: 206410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:37,979-Speed 9349.53 samples/sec Loss 5.0141 LearningRate 0.0146 Epoch: 12 Global Step: 206420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:39,057-Speed 9500.50 samples/sec Loss 4.9277 LearningRate 0.0146 Epoch: 12 Global Step: 206430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:40,156-Speed 9324.88 samples/sec Loss 5.0107 LearningRate 0.0146 Epoch: 12 Global Step: 206440 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:52:41,241-Speed 9445.51 samples/sec Loss 4.9915 LearningRate 0.0146 Epoch: 12 Global Step: 206450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:42,323-Speed 9472.02 samples/sec Loss 4.9521 LearningRate 0.0146 Epoch: 12 Global Step: 206460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:43,372-Speed 9768.58 samples/sec Loss 4.9824 LearningRate 0.0146 Epoch: 12 Global Step: 206470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:44,487-Speed 9187.09 samples/sec Loss 4.8919 LearningRate 0.0146 Epoch: 12 Global Step: 206480 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 19:52:45,586-Speed 9327.24 samples/sec Loss 4.9694 LearningRate 0.0145 Epoch: 12 Global Step: 206490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:46,731-Speed 8946.31 samples/sec Loss 5.0492 LearningRate 0.0145 Epoch: 12 Global Step: 206500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:47,887-Speed 8866.88 samples/sec Loss 5.0031 LearningRate 0.0145 Epoch: 12 Global Step: 206510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:48,947-Speed 9665.46 samples/sec Loss 5.0010 LearningRate 0.0145 Epoch: 12 Global Step: 206520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:50,025-Speed 9506.40 samples/sec Loss 5.0838 LearningRate 0.0145 Epoch: 12 Global Step: 206530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:51,125-Speed 9311.61 samples/sec Loss 4.9573 LearningRate 0.0145 Epoch: 12 Global Step: 206540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:52,183-Speed 9690.91 samples/sec Loss 4.9749 LearningRate 0.0145 Epoch: 12 Global Step: 206550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:53,267-Speed 9450.06 samples/sec Loss 5.0497 LearningRate 0.0145 Epoch: 12 Global Step: 206560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:54,345-Speed 9505.15 samples/sec Loss 5.0219 LearningRate 0.0145 Epoch: 12 Global Step: 206570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:55,391-Speed 9800.39 samples/sec Loss 4.9501 LearningRate 0.0145 Epoch: 12 Global Step: 206580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:56,448-Speed 9697.14 samples/sec Loss 5.0890 LearningRate 0.0145 Epoch: 12 Global Step: 206590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:57,542-Speed 9357.53 samples/sec Loss 5.0747 LearningRate 0.0145 Epoch: 12 Global Step: 206600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:52:58,635-Speed 9376.89 samples/sec Loss 4.9430 LearningRate 0.0145 Epoch: 12 Global Step: 206610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:52:59,682-Speed 9787.59 samples/sec Loss 4.9499 LearningRate 0.0145 Epoch: 12 Global Step: 206620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:00,743-Speed 9654.47 samples/sec Loss 5.0534 LearningRate 0.0145 Epoch: 12 Global Step: 206630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:01,824-Speed 9482.00 samples/sec Loss 4.9999 LearningRate 0.0145 Epoch: 12 Global Step: 206640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:02,939-Speed 9188.53 samples/sec Loss 4.9722 LearningRate 0.0145 Epoch: 12 Global Step: 206650 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:53:04,003-Speed 9631.87 samples/sec Loss 5.1238 LearningRate 0.0145 Epoch: 12 Global Step: 206660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:05,059-Speed 9704.32 samples/sec Loss 5.0437 LearningRate 0.0145 Epoch: 12 Global Step: 206670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:06,171-Speed 9213.74 samples/sec Loss 4.9015 LearningRate 0.0145 Epoch: 12 Global Step: 206680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:07,220-Speed 9768.57 samples/sec Loss 5.0185 LearningRate 0.0145 Epoch: 12 Global Step: 206690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:08,334-Speed 9202.80 samples/sec Loss 5.0324 LearningRate 0.0145 Epoch: 12 Global Step: 206700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:09,413-Speed 9488.52 samples/sec Loss 5.0601 LearningRate 0.0145 Epoch: 12 Global Step: 206710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:10,518-Speed 9273.77 samples/sec Loss 5.0719 LearningRate 0.0145 Epoch: 12 Global Step: 206720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:11,611-Speed 
9374.79 samples/sec Loss 5.1183 LearningRate 0.0145 Epoch: 12 Global Step: 206730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:12,750-Speed 8997.87 samples/sec Loss 4.9824 LearningRate 0.0145 Epoch: 12 Global Step: 206740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:13,811-Speed 9654.25 samples/sec Loss 5.1023 LearningRate 0.0145 Epoch: 12 Global Step: 206750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:14,869-Speed 9691.17 samples/sec Loss 5.0097 LearningRate 0.0145 Epoch: 12 Global Step: 206760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:15,990-Speed 9138.59 samples/sec Loss 5.0502 LearningRate 0.0145 Epoch: 12 Global Step: 206770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:17,107-Speed 9177.37 samples/sec Loss 4.9774 LearningRate 0.0145 Epoch: 12 Global Step: 206780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:18,231-Speed 9114.45 samples/sec Loss 5.0411 LearningRate 0.0145 Epoch: 12 Global Step: 206790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:19,322-Speed 9394.34 samples/sec Loss 4.9636 LearningRate 0.0145 Epoch: 12 Global Step: 206800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:20,420-Speed 9331.40 samples/sec Loss 5.0228 LearningRate 0.0145 Epoch: 12 Global Step: 206810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:21,518-Speed 9327.24 samples/sec Loss 5.0312 LearningRate 0.0145 Epoch: 12 Global Step: 206820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:22,633-Speed 9188.21 samples/sec Loss 5.0386 LearningRate 0.0145 Epoch: 12 Global Step: 206830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:23,757-Speed 9117.13 samples/sec Loss 5.0323 LearningRate 0.0145 Epoch: 12 Global Step: 206840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:24,822-Speed 9621.55 samples/sec Loss 5.0272 
LearningRate 0.0145 Epoch: 12 Global Step: 206850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:25,916-Speed 9362.80 samples/sec Loss 4.9638 LearningRate 0.0145 Epoch: 12 Global Step: 206860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:27,020-Speed 9290.83 samples/sec Loss 5.0683 LearningRate 0.0145 Epoch: 12 Global Step: 206870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:28,171-Speed 8903.39 samples/sec Loss 5.0195 LearningRate 0.0145 Epoch: 12 Global Step: 206880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:29,268-Speed 9338.03 samples/sec Loss 4.9737 LearningRate 0.0145 Epoch: 12 Global Step: 206890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:30,342-Speed 9539.71 samples/sec Loss 5.0109 LearningRate 0.0145 Epoch: 12 Global Step: 206900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:31,426-Speed 9453.67 samples/sec Loss 5.0137 LearningRate 0.0145 Epoch: 12 Global Step: 206910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:32,552-Speed 9103.27 samples/sec Loss 5.0812 LearningRate 0.0145 Epoch: 12 Global Step: 206920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:33,627-Speed 9531.49 samples/sec Loss 4.9378 LearningRate 0.0144 Epoch: 12 Global Step: 206930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:34,713-Speed 9434.39 samples/sec Loss 5.0400 LearningRate 0.0144 Epoch: 12 Global Step: 206940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:35,799-Speed 9438.13 samples/sec Loss 5.0100 LearningRate 0.0144 Epoch: 12 Global Step: 206950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:36,866-Speed 9605.70 samples/sec Loss 4.9817 LearningRate 0.0144 Epoch: 12 Global Step: 206960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:37,991-Speed 9109.49 samples/sec Loss 5.0491 LearningRate 0.0144 Epoch: 12 
Global Step: 206970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:39,109-Speed 9170.72 samples/sec Loss 5.0109 LearningRate 0.0144 Epoch: 12 Global Step: 206980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:40,173-Speed 9624.69 samples/sec Loss 4.9755 LearningRate 0.0144 Epoch: 12 Global Step: 206990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:41,249-Speed 9524.45 samples/sec Loss 5.0665 LearningRate 0.0144 Epoch: 12 Global Step: 207000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:42,348-Speed 9325.70 samples/sec Loss 5.0309 LearningRate 0.0144 Epoch: 12 Global Step: 207010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:43,460-Speed 9216.71 samples/sec Loss 5.0327 LearningRate 0.0144 Epoch: 12 Global Step: 207020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:44,567-Speed 9253.38 samples/sec Loss 5.0859 LearningRate 0.0144 Epoch: 12 Global Step: 207030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:45,667-Speed 9320.38 samples/sec Loss 5.0037 LearningRate 0.0144 Epoch: 12 Global Step: 207040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:46,757-Speed 9402.51 samples/sec Loss 5.0257 LearningRate 0.0144 Epoch: 12 Global Step: 207050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:47,842-Speed 9444.08 samples/sec Loss 5.0553 LearningRate 0.0144 Epoch: 12 Global Step: 207060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:53:48,933-Speed 9389.37 samples/sec Loss 5.0542 LearningRate 0.0144 Epoch: 12 Global Step: 207070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:50,061-Speed 9084.62 samples/sec Loss 4.9956 LearningRate 0.0144 Epoch: 12 Global Step: 207080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:51,150-Speed 9409.90 samples/sec Loss 4.9652 LearningRate 0.0144 Epoch: 12 Global Step: 207090 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:52,229-Speed 9497.53 samples/sec Loss 5.0989 LearningRate 0.0144 Epoch: 12 Global Step: 207100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:53,338-Speed 9238.88 samples/sec Loss 4.9373 LearningRate 0.0144 Epoch: 12 Global Step: 207110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:54,448-Speed 9227.60 samples/sec Loss 5.0296 LearningRate 0.0144 Epoch: 12 Global Step: 207120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:55,559-Speed 9227.89 samples/sec Loss 5.0050 LearningRate 0.0144 Epoch: 12 Global Step: 207130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:56,641-Speed 9469.21 samples/sec Loss 4.9528 LearningRate 0.0144 Epoch: 12 Global Step: 207140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:57,767-Speed 9104.08 samples/sec Loss 5.0039 LearningRate 0.0144 Epoch: 12 Global Step: 207150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:53:58,917-Speed 8906.95 samples/sec Loss 4.9537 LearningRate 0.0144 Epoch: 12 Global Step: 207160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:00,000-Speed 9465.73 samples/sec Loss 5.0594 LearningRate 0.0144 Epoch: 12 Global Step: 207170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:01,068-Speed 9588.96 samples/sec Loss 5.0389 LearningRate 0.0144 Epoch: 12 Global Step: 207180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:02,150-Speed 9475.20 samples/sec Loss 4.9913 LearningRate 0.0144 Epoch: 12 Global Step: 207190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:03,301-Speed 8902.65 samples/sec Loss 4.9911 LearningRate 0.0144 Epoch: 12 Global Step: 207200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:04,395-Speed 9370.88 samples/sec Loss 5.0889 LearningRate 0.0144 Epoch: 12 Global Step: 207210 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:54:05,480-Speed 9442.11 samples/sec Loss 4.9413 LearningRate 0.0144 Epoch: 12 Global Step: 207220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:06,565-Speed 9443.01 samples/sec Loss 4.9083 LearningRate 0.0144 Epoch: 12 Global Step: 207230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:07,666-Speed 9304.71 samples/sec Loss 4.9927 LearningRate 0.0144 Epoch: 12 Global Step: 207240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:08,800-Speed 9038.18 samples/sec Loss 5.0059 LearningRate 0.0144 Epoch: 12 Global Step: 207250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:09,893-Speed 9373.96 samples/sec Loss 5.0294 LearningRate 0.0144 Epoch: 12 Global Step: 207260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:10,982-Speed 9410.94 samples/sec Loss 4.9820 LearningRate 0.0144 Epoch: 12 Global Step: 207270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:12,047-Speed 9611.64 samples/sec Loss 5.1115 LearningRate 0.0144 Epoch: 12 Global Step: 207280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:13,117-Speed 9582.16 samples/sec Loss 5.0328 LearningRate 0.0144 Epoch: 12 Global Step: 207290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:14,195-Speed 9507.71 samples/sec Loss 5.1159 LearningRate 0.0144 Epoch: 12 Global Step: 207300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:15,281-Speed 9437.18 samples/sec Loss 4.9755 LearningRate 0.0144 Epoch: 12 Global Step: 207310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:16,351-Speed 9573.69 samples/sec Loss 5.0629 LearningRate 0.0144 Epoch: 12 Global Step: 207320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:17,416-Speed 9622.47 samples/sec Loss 5.0174 LearningRate 0.0144 Epoch: 12 Global Step: 207330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:18,530-Speed 
9203.01 samples/sec Loss 4.9575 LearningRate 0.0144 Epoch: 12 Global Step: 207340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:19,627-Speed 9342.32 samples/sec Loss 5.0285 LearningRate 0.0144 Epoch: 12 Global Step: 207350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:20,710-Speed 9454.47 samples/sec Loss 4.9920 LearningRate 0.0144 Epoch: 12 Global Step: 207360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:21,785-Speed 9530.82 samples/sec Loss 5.0198 LearningRate 0.0143 Epoch: 12 Global Step: 207370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:22,871-Speed 9439.62 samples/sec Loss 5.0413 LearningRate 0.0143 Epoch: 12 Global Step: 207380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:23,941-Speed 9567.44 samples/sec Loss 4.9824 LearningRate 0.0143 Epoch: 12 Global Step: 207390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:25,059-Speed 9169.31 samples/sec Loss 5.0703 LearningRate 0.0143 Epoch: 12 Global Step: 207400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:26,153-Speed 9363.30 samples/sec Loss 4.9961 LearningRate 0.0143 Epoch: 12 Global Step: 207410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:27,256-Speed 9294.58 samples/sec Loss 5.0841 LearningRate 0.0143 Epoch: 12 Global Step: 207420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:28,343-Speed 9425.87 samples/sec Loss 4.9782 LearningRate 0.0143 Epoch: 12 Global Step: 207430 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:54:29,411-Speed 9595.13 samples/sec Loss 4.9671 LearningRate 0.0143 Epoch: 12 Global Step: 207440 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:54:30,541-Speed 9073.55 samples/sec Loss 5.0878 LearningRate 0.0143 Epoch: 12 Global Step: 207450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:31,642-Speed 9306.04 samples/sec Loss 4.9679 
LearningRate 0.0143 Epoch: 12 Global Step: 207460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:32,718-Speed 9515.30 samples/sec Loss 5.0567 LearningRate 0.0143 Epoch: 12 Global Step: 207470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:33,828-Speed 9234.77 samples/sec Loss 4.9482 LearningRate 0.0143 Epoch: 12 Global Step: 207480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:34,899-Speed 9572.40 samples/sec Loss 5.0806 LearningRate 0.0143 Epoch: 12 Global Step: 207490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:35,995-Speed 9346.74 samples/sec Loss 4.9413 LearningRate 0.0143 Epoch: 12 Global Step: 207500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:37,111-Speed 9179.24 samples/sec Loss 5.0358 LearningRate 0.0143 Epoch: 12 Global Step: 207510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:38,192-Speed 9483.19 samples/sec Loss 5.0440 LearningRate 0.0143 Epoch: 12 Global Step: 207520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:39,338-Speed 8932.69 samples/sec Loss 4.9725 LearningRate 0.0143 Epoch: 12 Global Step: 207530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:40,423-Speed 9443.25 samples/sec Loss 4.9366 LearningRate 0.0143 Epoch: 12 Global Step: 207540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:41,519-Speed 9348.23 samples/sec Loss 5.0693 LearningRate 0.0143 Epoch: 12 Global Step: 207550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:42,582-Speed 9645.04 samples/sec Loss 5.0067 LearningRate 0.0143 Epoch: 12 Global Step: 207560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:43,677-Speed 9355.98 samples/sec Loss 5.0305 LearningRate 0.0143 Epoch: 12 Global Step: 207570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:44,786-Speed 9234.73 samples/sec Loss 5.1375 LearningRate 0.0143 Epoch: 12 
Global Step: 207580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:45,873-Speed 9435.19 samples/sec Loss 5.0436 LearningRate 0.0143 Epoch: 12 Global Step: 207590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:46,958-Speed 9441.66 samples/sec Loss 5.0985 LearningRate 0.0143 Epoch: 12 Global Step: 207600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:48,036-Speed 9500.63 samples/sec Loss 5.0479 LearningRate 0.0143 Epoch: 12 Global Step: 207610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:49,126-Speed 9399.39 samples/sec Loss 4.9433 LearningRate 0.0143 Epoch: 12 Global Step: 207620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:50,258-Speed 9051.11 samples/sec Loss 5.0258 LearningRate 0.0143 Epoch: 12 Global Step: 207630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:51,341-Speed 9457.96 samples/sec Loss 5.0067 LearningRate 0.0143 Epoch: 12 Global Step: 207640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:52,493-Speed 8895.58 samples/sec Loss 5.1112 LearningRate 0.0143 Epoch: 12 Global Step: 207650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:53,584-Speed 9388.90 samples/sec Loss 5.0691 LearningRate 0.0143 Epoch: 12 Global Step: 207660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:54,694-Speed 9236.08 samples/sec Loss 5.0380 LearningRate 0.0143 Epoch: 12 Global Step: 207670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:55,753-Speed 9676.18 samples/sec Loss 5.0431 LearningRate 0.0143 Epoch: 12 Global Step: 207680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:54:56,836-Speed 9468.60 samples/sec Loss 5.0405 LearningRate 0.0143 Epoch: 12 Global Step: 207690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:54:57,939-Speed 9285.90 samples/sec Loss 5.0831 LearningRate 0.0143 Epoch: 12 Global Step: 207700 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 19:54:59,075-Speed 9022.05 samples/sec Loss 4.9634 LearningRate 0.0143 Epoch: 12 Global Step: 207710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:00,161-Speed 9434.75 samples/sec Loss 5.0397 LearningRate 0.0143 Epoch: 12 Global Step: 207720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:01,259-Speed 9329.43 samples/sec Loss 5.0148 LearningRate 0.0143 Epoch: 12 Global Step: 207730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:02,356-Speed 9344.03 samples/sec Loss 5.0538 LearningRate 0.0143 Epoch: 12 Global Step: 207740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:03,515-Speed 8842.22 samples/sec Loss 5.0302 LearningRate 0.0143 Epoch: 12 Global Step: 207750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:04,659-Speed 8955.69 samples/sec Loss 5.0478 LearningRate 0.0143 Epoch: 12 Global Step: 207760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:05,742-Speed 9465.10 samples/sec Loss 5.0095 LearningRate 0.0143 Epoch: 12 Global Step: 207770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:06,824-Speed 9471.71 samples/sec Loss 5.1087 LearningRate 0.0143 Epoch: 12 Global Step: 207780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:07,896-Speed 9554.53 samples/sec Loss 4.9614 LearningRate 0.0143 Epoch: 12 Global Step: 207790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:08,941-Speed 9809.16 samples/sec Loss 5.0543 LearningRate 0.0143 Epoch: 12 Global Step: 207800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:10,040-Speed 9325.90 samples/sec Loss 5.0676 LearningRate 0.0142 Epoch: 12 Global Step: 207810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:11,130-Speed 9398.59 samples/sec Loss 5.0700 LearningRate 0.0142 Epoch: 12 Global Step: 207820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:55:12,212-Speed 9468.78 samples/sec Loss 5.1254 LearningRate 0.0142 Epoch: 12 Global Step: 207830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:13,324-Speed 9211.75 samples/sec Loss 4.9673 LearningRate 0.0142 Epoch: 12 Global Step: 207840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:14,393-Speed 9589.66 samples/sec Loss 5.0155 LearningRate 0.0142 Epoch: 12 Global Step: 207850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:15,498-Speed 9271.92 samples/sec Loss 5.0074 LearningRate 0.0142 Epoch: 12 Global Step: 207860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:16,604-Speed 9262.80 samples/sec Loss 5.0369 LearningRate 0.0142 Epoch: 12 Global Step: 207870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:17,674-Speed 9578.05 samples/sec Loss 5.0350 LearningRate 0.0142 Epoch: 12 Global Step: 207880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:18,725-Speed 9748.76 samples/sec Loss 5.1246 LearningRate 0.0142 Epoch: 12 Global Step: 207890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:19,771-Speed 9800.92 samples/sec Loss 4.9660 LearningRate 0.0142 Epoch: 12 Global Step: 207900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:20,904-Speed 9039.05 samples/sec Loss 5.0270 LearningRate 0.0142 Epoch: 12 Global Step: 207910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:22,038-Speed 9040.38 samples/sec Loss 5.0843 LearningRate 0.0142 Epoch: 12 Global Step: 207920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:23,164-Speed 9092.97 samples/sec Loss 5.0326 LearningRate 0.0142 Epoch: 12 Global Step: 207930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:24,246-Speed 9476.36 samples/sec Loss 5.0538 LearningRate 0.0142 Epoch: 12 Global Step: 207940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:25,331-Speed 
9440.21 samples/sec Loss 5.0273 LearningRate 0.0142 Epoch: 12 Global Step: 207950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:26,413-Speed 9470.08 samples/sec Loss 5.0141 LearningRate 0.0142 Epoch: 12 Global Step: 207960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:27,538-Speed 9107.46 samples/sec Loss 5.0316 LearningRate 0.0142 Epoch: 12 Global Step: 207970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:55:28,666-Speed 9079.81 samples/sec Loss 4.9445 LearningRate 0.0142 Epoch: 12 Global Step: 207980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:29,720-Speed 9727.33 samples/sec Loss 5.0990 LearningRate 0.0142 Epoch: 12 Global Step: 207990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:30,835-Speed 9183.41 samples/sec Loss 5.0474 LearningRate 0.0142 Epoch: 12 Global Step: 208000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:55:53,356-[lfw][208000]XNorm: 8.578325 Training: 2022-04-11 19:55:53,356-[lfw][208000]Accuracy-Flip: 0.99600+-0.00291 Training: 2022-04-11 19:55:53,356-[lfw][208000]Accuracy-Highest: 0.99683 Training: 2022-04-11 19:56:19,053-[cfp_fp][208000]XNorm: 7.294262 Training: 2022-04-11 19:56:19,054-[cfp_fp][208000]Accuracy-Flip: 0.96571+-0.00932 Training: 2022-04-11 19:56:19,054-[cfp_fp][208000]Accuracy-Highest: 0.96771 Training: 2022-04-11 19:56:41,458-[agedb_30][208000]XNorm: 8.298497 Training: 2022-04-11 19:56:41,459-[agedb_30][208000]Accuracy-Flip: 0.96883+-0.00837 Training: 2022-04-11 19:56:41,459-[agedb_30][208000]Accuracy-Highest: 0.96983 Training: 2022-04-11 19:56:42,534-Speed 142.82 samples/sec Loss 5.0363 LearningRate 0.0142 Epoch: 12 Global Step: 208010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:56:43,609-Speed 9531.17 samples/sec Loss 5.0898 LearningRate 0.0142 Epoch: 12 Global Step: 208020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:56:44,664-Speed 9711.31 samples/sec Loss 
4.9662 LearningRate 0.0142 Epoch: 12 Global Step: 208030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:56:45,742-Speed 9501.41 samples/sec Loss 5.0186 LearningRate 0.0142 Epoch: 12 Global Step: 208040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:56:46,831-Speed 9411.58 samples/sec Loss 5.0039 LearningRate 0.0142 Epoch: 12 Global Step: 208050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:56:47,902-Speed 9567.72 samples/sec Loss 4.9874 LearningRate 0.0142 Epoch: 12 Global Step: 208060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:56:49,012-Speed 9236.44 samples/sec Loss 5.1053 LearningRate 0.0142 Epoch: 12 Global Step: 208070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:56:50,115-Speed 9294.02 samples/sec Loss 5.0408 LearningRate 0.0142 Epoch: 12 Global Step: 208080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:51,220-Speed 9267.09 samples/sec Loss 5.0833 LearningRate 0.0142 Epoch: 12 Global Step: 208090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:52,287-Speed 9606.72 samples/sec Loss 5.0013 LearningRate 0.0142 Epoch: 12 Global Step: 208100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:53,424-Speed 9004.77 samples/sec Loss 5.0092 LearningRate 0.0142 Epoch: 12 Global Step: 208110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:54,506-Speed 9471.34 samples/sec Loss 5.0405 LearningRate 0.0142 Epoch: 12 Global Step: 208120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:55,581-Speed 9540.77 samples/sec Loss 5.0527 LearningRate 0.0142 Epoch: 12 Global Step: 208130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:56,664-Speed 9454.01 samples/sec Loss 5.0897 LearningRate 0.0142 Epoch: 12 Global Step: 208140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:57,787-Speed 9131.63 samples/sec Loss 4.9546 LearningRate 0.0142 Epoch: 12 
Global Step: 208150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:58,886-Speed 9322.92 samples/sec Loss 4.9480 LearningRate 0.0142 Epoch: 12 Global Step: 208160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:56:59,997-Speed 9217.87 samples/sec Loss 4.9852 LearningRate 0.0142 Epoch: 12 Global Step: 208170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:01,088-Speed 9398.61 samples/sec Loss 4.9432 LearningRate 0.0142 Epoch: 12 Global Step: 208180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:02,160-Speed 9551.48 samples/sec Loss 5.0178 LearningRate 0.0142 Epoch: 12 Global Step: 208190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:03,394-Speed 8306.96 samples/sec Loss 5.0404 LearningRate 0.0142 Epoch: 12 Global Step: 208200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:04,501-Speed 9259.04 samples/sec Loss 4.9583 LearningRate 0.0142 Epoch: 12 Global Step: 208210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:05,650-Speed 8920.58 samples/sec Loss 5.0585 LearningRate 0.0142 Epoch: 12 Global Step: 208220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:06,734-Speed 9450.56 samples/sec Loss 5.0931 LearningRate 0.0142 Epoch: 12 Global Step: 208230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:07,831-Speed 9338.74 samples/sec Loss 5.0448 LearningRate 0.0142 Epoch: 12 Global Step: 208240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:08,936-Speed 9274.87 samples/sec Loss 5.0175 LearningRate 0.0141 Epoch: 12 Global Step: 208250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:10,030-Speed 9364.56 samples/sec Loss 5.0279 LearningRate 0.0141 Epoch: 12 Global Step: 208260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:11,136-Speed 9265.15 samples/sec Loss 5.0962 LearningRate 0.0141 Epoch: 12 Global Step: 208270 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:12,252-Speed 9177.17 samples/sec Loss 4.9563 LearningRate 0.0141 Epoch: 12 Global Step: 208280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:57:13,373-Speed 9142.21 samples/sec Loss 5.0334 LearningRate 0.0141 Epoch: 12 Global Step: 208290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:14,434-Speed 9660.21 samples/sec Loss 4.9690 LearningRate 0.0141 Epoch: 12 Global Step: 208300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:15,577-Speed 8962.32 samples/sec Loss 5.0623 LearningRate 0.0141 Epoch: 12 Global Step: 208310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:16,700-Speed 9119.83 samples/sec Loss 5.0925 LearningRate 0.0141 Epoch: 12 Global Step: 208320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:17,821-Speed 9141.17 samples/sec Loss 4.9862 LearningRate 0.0141 Epoch: 12 Global Step: 208330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:18,865-Speed 9816.58 samples/sec Loss 5.0934 LearningRate 0.0141 Epoch: 12 Global Step: 208340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:19,935-Speed 9576.73 samples/sec Loss 4.9897 LearningRate 0.0141 Epoch: 12 Global Step: 208350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:21,020-Speed 9441.36 samples/sec Loss 5.0336 LearningRate 0.0141 Epoch: 12 Global Step: 208360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:22,116-Speed 9354.14 samples/sec Loss 5.0009 LearningRate 0.0141 Epoch: 12 Global Step: 208370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:23,205-Speed 9402.16 samples/sec Loss 5.0382 LearningRate 0.0141 Epoch: 12 Global Step: 208380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:24,297-Speed 9385.22 samples/sec Loss 5.0979 LearningRate 0.0141 Epoch: 12 Global Step: 208390 Fp16 Grad Scale: 262144 Required: 5 hours 
Training: 2022-04-11 19:57:25,420-Speed 9125.57 samples/sec Loss 5.1080 LearningRate 0.0141 Epoch: 12 Global Step: 208400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:26,488-Speed 9597.66 samples/sec Loss 5.0426 LearningRate 0.0141 Epoch: 12 Global Step: 208410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:27,596-Speed 9251.25 samples/sec Loss 5.0643 LearningRate 0.0141 Epoch: 12 Global Step: 208420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:28,682-Speed 9431.43 samples/sec Loss 5.0444 LearningRate 0.0141 Epoch: 12 Global Step: 208430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:29,831-Speed 8916.91 samples/sec Loss 5.0429 LearningRate 0.0141 Epoch: 12 Global Step: 208440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:30,938-Speed 9254.33 samples/sec Loss 5.0112 LearningRate 0.0141 Epoch: 12 Global Step: 208450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:32,036-Speed 9340.74 samples/sec Loss 5.0217 LearningRate 0.0141 Epoch: 12 Global Step: 208460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:33,161-Speed 9103.27 samples/sec Loss 5.0925 LearningRate 0.0141 Epoch: 12 Global Step: 208470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:34,245-Speed 9455.92 samples/sec Loss 4.9839 LearningRate 0.0141 Epoch: 12 Global Step: 208480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:35,316-Speed 9562.96 samples/sec Loss 5.0086 LearningRate 0.0141 Epoch: 12 Global Step: 208490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:36,418-Speed 9303.96 samples/sec Loss 5.0410 LearningRate 0.0141 Epoch: 12 Global Step: 208500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:37,527-Speed 9237.55 samples/sec Loss 5.0950 LearningRate 0.0141 Epoch: 12 Global Step: 208510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
19:57:38,603-Speed 9519.24 samples/sec Loss 5.0152 LearningRate 0.0141 Epoch: 12 Global Step: 208520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:39,675-Speed 9564.59 samples/sec Loss 4.9391 LearningRate 0.0141 Epoch: 12 Global Step: 208530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:40,787-Speed 9210.66 samples/sec Loss 5.0661 LearningRate 0.0141 Epoch: 12 Global Step: 208540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:41,846-Speed 9678.16 samples/sec Loss 4.9741 LearningRate 0.0141 Epoch: 12 Global Step: 208550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:42,921-Speed 9533.75 samples/sec Loss 4.9254 LearningRate 0.0141 Epoch: 12 Global Step: 208560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:44,001-Speed 9487.25 samples/sec Loss 5.1292 LearningRate 0.0141 Epoch: 12 Global Step: 208570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:45,076-Speed 9526.53 samples/sec Loss 5.0378 LearningRate 0.0141 Epoch: 12 Global Step: 208580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:46,180-Speed 9287.26 samples/sec Loss 5.1196 LearningRate 0.0141 Epoch: 12 Global Step: 208590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:47,270-Speed 9401.13 samples/sec Loss 5.0005 LearningRate 0.0141 Epoch: 12 Global Step: 208600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:57:48,351-Speed 9477.16 samples/sec Loss 4.9982 LearningRate 0.0141 Epoch: 12 Global Step: 208610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:49,403-Speed 9743.36 samples/sec Loss 4.9328 LearningRate 0.0141 Epoch: 12 Global Step: 208620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:50,461-Speed 9688.42 samples/sec Loss 5.0184 LearningRate 0.0141 Epoch: 12 Global Step: 208630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:51,548-Speed 9420.97 
samples/sec Loss 4.9679 LearningRate 0.0141 Epoch: 12 Global Step: 208640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:52,606-Speed 9689.44 samples/sec Loss 5.0425 LearningRate 0.0141 Epoch: 12 Global Step: 208650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:53,670-Speed 9631.84 samples/sec Loss 5.0866 LearningRate 0.0141 Epoch: 12 Global Step: 208660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:54,766-Speed 9349.51 samples/sec Loss 5.0305 LearningRate 0.0141 Epoch: 12 Global Step: 208670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:55,851-Speed 9438.50 samples/sec Loss 4.9643 LearningRate 0.0141 Epoch: 12 Global Step: 208680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:56,953-Speed 9304.02 samples/sec Loss 5.1063 LearningRate 0.0141 Epoch: 12 Global Step: 208690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:58,044-Speed 9387.18 samples/sec Loss 5.0678 LearningRate 0.0140 Epoch: 12 Global Step: 208700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:57:59,112-Speed 9596.65 samples/sec Loss 5.0020 LearningRate 0.0140 Epoch: 12 Global Step: 208710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:00,191-Speed 9494.32 samples/sec Loss 5.0294 LearningRate 0.0140 Epoch: 12 Global Step: 208720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:01,301-Speed 9229.54 samples/sec Loss 5.0027 LearningRate 0.0140 Epoch: 12 Global Step: 208730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:02,384-Speed 9461.01 samples/sec Loss 5.1144 LearningRate 0.0140 Epoch: 12 Global Step: 208740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:03,488-Speed 9286.22 samples/sec Loss 4.9644 LearningRate 0.0140 Epoch: 12 Global Step: 208750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:04,570-Speed 9473.07 samples/sec Loss 4.9919 LearningRate 
0.0140 Epoch: 12 Global Step: 208760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:05,660-Speed 9397.60 samples/sec Loss 5.0126 LearningRate 0.0140 Epoch: 12 Global Step: 208770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:06,726-Speed 9613.50 samples/sec Loss 4.9564 LearningRate 0.0140 Epoch: 12 Global Step: 208780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:07,862-Speed 9013.80 samples/sec Loss 5.0263 LearningRate 0.0140 Epoch: 12 Global Step: 208790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:08,929-Speed 9606.04 samples/sec Loss 4.9834 LearningRate 0.0140 Epoch: 12 Global Step: 208800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:10,001-Speed 9558.63 samples/sec Loss 4.9515 LearningRate 0.0140 Epoch: 12 Global Step: 208810 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:58:11,127-Speed 9095.96 samples/sec Loss 5.0643 LearningRate 0.0140 Epoch: 12 Global Step: 208820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:12,221-Speed 9361.88 samples/sec Loss 5.0733 LearningRate 0.0140 Epoch: 12 Global Step: 208830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:13,282-Speed 9658.39 samples/sec Loss 5.0353 LearningRate 0.0140 Epoch: 12 Global Step: 208840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:14,348-Speed 9613.83 samples/sec Loss 5.0799 LearningRate 0.0140 Epoch: 12 Global Step: 208850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:15,429-Speed 9475.19 samples/sec Loss 5.0492 LearningRate 0.0140 Epoch: 12 Global Step: 208860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:16,458-Speed 9966.46 samples/sec Loss 5.0514 LearningRate 0.0140 Epoch: 12 Global Step: 208870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:17,573-Speed 9191.67 samples/sec Loss 5.0269 LearningRate 0.0140 Epoch: 12 Global Step: 
208880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:18,660-Speed 9418.99 samples/sec Loss 5.0364 LearningRate 0.0140 Epoch: 12 Global Step: 208890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:19,777-Speed 9174.36 samples/sec Loss 5.0122 LearningRate 0.0140 Epoch: 12 Global Step: 208900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:20,878-Speed 9308.71 samples/sec Loss 5.0953 LearningRate 0.0140 Epoch: 12 Global Step: 208910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:21,952-Speed 9540.41 samples/sec Loss 4.9398 LearningRate 0.0140 Epoch: 12 Global Step: 208920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:23,064-Speed 9215.10 samples/sec Loss 5.0659 LearningRate 0.0140 Epoch: 12 Global Step: 208930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:24,236-Speed 8739.61 samples/sec Loss 5.1401 LearningRate 0.0140 Epoch: 12 Global Step: 208940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:25,315-Speed 9497.56 samples/sec Loss 5.0120 LearningRate 0.0140 Epoch: 12 Global Step: 208950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:26,410-Speed 9359.52 samples/sec Loss 4.9716 LearningRate 0.0140 Epoch: 12 Global Step: 208960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:27,503-Speed 9372.04 samples/sec Loss 5.0747 LearningRate 0.0140 Epoch: 12 Global Step: 208970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:28,619-Speed 9186.34 samples/sec Loss 4.9484 LearningRate 0.0140 Epoch: 12 Global Step: 208980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:29,686-Speed 9601.04 samples/sec Loss 5.1222 LearningRate 0.0140 Epoch: 12 Global Step: 208990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:30,776-Speed 9402.06 samples/sec Loss 4.9832 LearningRate 0.0140 Epoch: 12 Global Step: 209000 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 19:58:31,815-Speed 9857.97 samples/sec Loss 4.9123 LearningRate 0.0140 Epoch: 12 Global Step: 209010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:32,923-Speed 9255.67 samples/sec Loss 5.0644 LearningRate 0.0140 Epoch: 12 Global Step: 209020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:58:34,010-Speed 9425.12 samples/sec Loss 5.0356 LearningRate 0.0140 Epoch: 12 Global Step: 209030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:35,067-Speed 9687.52 samples/sec Loss 5.0136 LearningRate 0.0140 Epoch: 12 Global Step: 209040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:36,173-Speed 9268.95 samples/sec Loss 5.0184 LearningRate 0.0140 Epoch: 12 Global Step: 209050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:37,260-Speed 9423.95 samples/sec Loss 5.0925 LearningRate 0.0140 Epoch: 12 Global Step: 209060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:38,324-Speed 9633.85 samples/sec Loss 5.0236 LearningRate 0.0140 Epoch: 12 Global Step: 209070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:39,436-Speed 9218.72 samples/sec Loss 4.9612 LearningRate 0.0140 Epoch: 12 Global Step: 209080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:40,495-Speed 9671.94 samples/sec Loss 4.9725 LearningRate 0.0140 Epoch: 12 Global Step: 209090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:41,559-Speed 9629.07 samples/sec Loss 4.9949 LearningRate 0.0140 Epoch: 12 Global Step: 209100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:42,667-Speed 9251.78 samples/sec Loss 5.0269 LearningRate 0.0140 Epoch: 12 Global Step: 209110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:43,765-Speed 9330.95 samples/sec Loss 5.0376 LearningRate 0.0140 Epoch: 12 Global Step: 209120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 19:58:44,814-Speed 9767.40 samples/sec Loss 4.9614 LearningRate 0.0140 Epoch: 12 Global Step: 209130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:45,932-Speed 9165.31 samples/sec Loss 5.0019 LearningRate 0.0139 Epoch: 12 Global Step: 209140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:47,045-Speed 9205.26 samples/sec Loss 5.0032 LearningRate 0.0139 Epoch: 12 Global Step: 209150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:48,114-Speed 9588.86 samples/sec Loss 5.1013 LearningRate 0.0139 Epoch: 12 Global Step: 209160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:49,280-Speed 8784.41 samples/sec Loss 5.0075 LearningRate 0.0139 Epoch: 12 Global Step: 209170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:50,355-Speed 9538.14 samples/sec Loss 4.9514 LearningRate 0.0139 Epoch: 12 Global Step: 209180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:51,434-Speed 9497.69 samples/sec Loss 5.0809 LearningRate 0.0139 Epoch: 12 Global Step: 209190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:52,572-Speed 8998.84 samples/sec Loss 5.0413 LearningRate 0.0139 Epoch: 12 Global Step: 209200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:53,739-Speed 8783.39 samples/sec Loss 5.0698 LearningRate 0.0139 Epoch: 12 Global Step: 209210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:54,807-Speed 9596.28 samples/sec Loss 5.0766 LearningRate 0.0139 Epoch: 12 Global Step: 209220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:55,903-Speed 9350.39 samples/sec Loss 5.0278 LearningRate 0.0139 Epoch: 12 Global Step: 209230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:56,975-Speed 9557.13 samples/sec Loss 5.0123 LearningRate 0.0139 Epoch: 12 Global Step: 209240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:58,068-Speed 
9374.21 samples/sec Loss 5.0111 LearningRate 0.0139 Epoch: 12 Global Step: 209250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:58:59,139-Speed 9568.60 samples/sec Loss 5.0197 LearningRate 0.0139 Epoch: 12 Global Step: 209260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:00,207-Speed 9592.64 samples/sec Loss 5.0431 LearningRate 0.0139 Epoch: 12 Global Step: 209270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:01,297-Speed 9400.83 samples/sec Loss 5.0943 LearningRate 0.0139 Epoch: 12 Global Step: 209280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:02,385-Speed 9413.06 samples/sec Loss 5.0120 LearningRate 0.0139 Epoch: 12 Global Step: 209290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:03,498-Speed 9209.23 samples/sec Loss 5.1079 LearningRate 0.0139 Epoch: 12 Global Step: 209300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:04,581-Speed 9463.18 samples/sec Loss 5.1039 LearningRate 0.0139 Epoch: 12 Global Step: 209310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:05,652-Speed 9567.41 samples/sec Loss 4.9798 LearningRate 0.0139 Epoch: 12 Global Step: 209320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:06,725-Speed 9549.43 samples/sec Loss 5.0115 LearningRate 0.0139 Epoch: 12 Global Step: 209330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:07,839-Speed 9192.96 samples/sec Loss 5.0482 LearningRate 0.0139 Epoch: 12 Global Step: 209340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:08,925-Speed 9436.40 samples/sec Loss 5.0337 LearningRate 0.0139 Epoch: 12 Global Step: 209350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:10,088-Speed 8812.74 samples/sec Loss 5.0557 LearningRate 0.0139 Epoch: 12 Global Step: 209360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:11,165-Speed 9519.40 samples/sec Loss 4.9770 
LearningRate 0.0139 Epoch: 12 Global Step: 209370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:12,250-Speed 9437.70 samples/sec Loss 4.9848 LearningRate 0.0139 Epoch: 12 Global Step: 209380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:13,397-Speed 8934.53 samples/sec Loss 5.0400 LearningRate 0.0139 Epoch: 12 Global Step: 209390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:14,483-Speed 9435.62 samples/sec Loss 5.1109 LearningRate 0.0139 Epoch: 12 Global Step: 209400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:15,596-Speed 9204.05 samples/sec Loss 5.0474 LearningRate 0.0139 Epoch: 12 Global Step: 209410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:16,704-Speed 9247.32 samples/sec Loss 5.0655 LearningRate 0.0139 Epoch: 12 Global Step: 209420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:17,809-Speed 9271.98 samples/sec Loss 5.0401 LearningRate 0.0139 Epoch: 12 Global Step: 209430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:18,893-Speed 9449.16 samples/sec Loss 5.0086 LearningRate 0.0139 Epoch: 12 Global Step: 209440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:19,991-Speed 9336.31 samples/sec Loss 5.1058 LearningRate 0.0139 Epoch: 12 Global Step: 209450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:21,106-Speed 9192.68 samples/sec Loss 4.9710 LearningRate 0.0139 Epoch: 12 Global Step: 209460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:22,167-Speed 9658.30 samples/sec Loss 4.9598 LearningRate 0.0139 Epoch: 12 Global Step: 209470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:23,295-Speed 9079.51 samples/sec Loss 5.0305 LearningRate 0.0139 Epoch: 12 Global Step: 209480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:24,393-Speed 9334.69 samples/sec Loss 5.0916 LearningRate 0.0139 Epoch: 12 Global 
Step: 209490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:25,507-Speed 9195.95 samples/sec Loss 5.0907 LearningRate 0.0139 Epoch: 12 Global Step: 209500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:26,587-Speed 9483.94 samples/sec Loss 4.9739 LearningRate 0.0139 Epoch: 12 Global Step: 209510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:27,725-Speed 9002.52 samples/sec Loss 5.0556 LearningRate 0.0139 Epoch: 12 Global Step: 209520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:28,836-Speed 9224.05 samples/sec Loss 4.9825 LearningRate 0.0139 Epoch: 12 Global Step: 209530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:29,976-Speed 8989.08 samples/sec Loss 5.0735 LearningRate 0.0139 Epoch: 12 Global Step: 209540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:31,066-Speed 9405.26 samples/sec Loss 5.0003 LearningRate 0.0139 Epoch: 12 Global Step: 209550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:32,207-Speed 8985.10 samples/sec Loss 5.0848 LearningRate 0.0139 Epoch: 12 Global Step: 209560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:33,318-Speed 9220.44 samples/sec Loss 5.0831 LearningRate 0.0139 Epoch: 12 Global Step: 209570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:34,374-Speed 9701.13 samples/sec Loss 5.1063 LearningRate 0.0139 Epoch: 12 Global Step: 209580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:35,457-Speed 9465.80 samples/sec Loss 5.0106 LearningRate 0.0138 Epoch: 12 Global Step: 209590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:36,507-Speed 9751.98 samples/sec Loss 5.0146 LearningRate 0.0138 Epoch: 12 Global Step: 209600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:37,600-Speed 9379.44 samples/sec Loss 5.1950 LearningRate 0.0138 Epoch: 12 Global Step: 209610 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 19:59:38,670-Speed 9580.03 samples/sec Loss 5.0550 LearningRate 0.0138 Epoch: 12 Global Step: 209620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:39,771-Speed 9305.73 samples/sec Loss 5.0320 LearningRate 0.0138 Epoch: 12 Global Step: 209630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:40,864-Speed 9378.40 samples/sec Loss 5.0361 LearningRate 0.0138 Epoch: 12 Global Step: 209640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:41,953-Speed 9404.13 samples/sec Loss 4.9733 LearningRate 0.0138 Epoch: 12 Global Step: 209650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:43,047-Speed 9364.27 samples/sec Loss 5.0006 LearningRate 0.0138 Epoch: 12 Global Step: 209660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:44,118-Speed 9567.23 samples/sec Loss 5.0224 LearningRate 0.0138 Epoch: 12 Global Step: 209670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:45,217-Speed 9331.42 samples/sec Loss 5.0834 LearningRate 0.0138 Epoch: 12 Global Step: 209680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:46,302-Speed 9436.69 samples/sec Loss 5.0293 LearningRate 0.0138 Epoch: 12 Global Step: 209690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:47,394-Speed 9381.96 samples/sec Loss 4.9463 LearningRate 0.0138 Epoch: 12 Global Step: 209700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:48,471-Speed 9512.94 samples/sec Loss 5.0350 LearningRate 0.0138 Epoch: 12 Global Step: 209710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:49,647-Speed 8715.16 samples/sec Loss 4.9772 LearningRate 0.0138 Epoch: 12 Global Step: 209720 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 19:59:50,724-Speed 9534.85 samples/sec Loss 4.9319 LearningRate 0.0138 Epoch: 12 Global Step: 209730 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 19:59:51,863-Speed 8995.24 samples/sec Loss 5.0306 LearningRate 0.0138 Epoch: 12 Global Step: 209740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:53,028-Speed 8788.35 samples/sec Loss 4.9384 LearningRate 0.0138 Epoch: 12 Global Step: 209750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:54,149-Speed 9139.33 samples/sec Loss 5.1124 LearningRate 0.0138 Epoch: 12 Global Step: 209760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:55,246-Speed 9342.47 samples/sec Loss 4.9920 LearningRate 0.0138 Epoch: 12 Global Step: 209770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:56,350-Speed 9287.19 samples/sec Loss 4.9982 LearningRate 0.0138 Epoch: 12 Global Step: 209780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:57,445-Speed 9358.06 samples/sec Loss 5.0437 LearningRate 0.0138 Epoch: 12 Global Step: 209790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 19:59:58,529-Speed 9447.98 samples/sec Loss 5.0683 LearningRate 0.0138 Epoch: 12 Global Step: 209800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 19:59:59,636-Speed 9256.25 samples/sec Loss 5.0009 LearningRate 0.0138 Epoch: 12 Global Step: 209810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:00,712-Speed 9520.82 samples/sec Loss 5.0009 LearningRate 0.0138 Epoch: 12 Global Step: 209820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:01,819-Speed 9256.48 samples/sec Loss 4.9886 LearningRate 0.0138 Epoch: 12 Global Step: 209830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:02,934-Speed 9194.08 samples/sec Loss 5.0567 LearningRate 0.0138 Epoch: 12 Global Step: 209840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:04,006-Speed 9555.66 samples/sec Loss 5.0533 LearningRate 0.0138 Epoch: 12 Global Step: 209850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
20:00:05,101-Speed 9362.03 samples/sec Loss 5.0353 LearningRate 0.0138 Epoch: 12 Global Step: 209860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:06,251-Speed 8906.04 samples/sec Loss 5.0902 LearningRate 0.0138 Epoch: 12 Global Step: 209870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:07,343-Speed 9383.93 samples/sec Loss 4.9892 LearningRate 0.0138 Epoch: 12 Global Step: 209880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:08,445-Speed 9296.24 samples/sec Loss 5.0282 LearningRate 0.0138 Epoch: 12 Global Step: 209890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:00:09,518-Speed 9553.79 samples/sec Loss 5.0347 LearningRate 0.0138 Epoch: 12 Global Step: 209900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:10,576-Speed 9685.21 samples/sec Loss 5.1065 LearningRate 0.0138 Epoch: 12 Global Step: 209910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:11,667-Speed 9391.08 samples/sec Loss 5.0313 LearningRate 0.0138 Epoch: 12 Global Step: 209920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:12,725-Speed 9679.51 samples/sec Loss 5.1142 LearningRate 0.0138 Epoch: 12 Global Step: 209930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:13,818-Speed 9376.10 samples/sec Loss 4.9715 LearningRate 0.0138 Epoch: 12 Global Step: 209940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:14,887-Speed 9582.41 samples/sec Loss 4.9994 LearningRate 0.0138 Epoch: 12 Global Step: 209950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:15,968-Speed 9479.39 samples/sec Loss 5.0316 LearningRate 0.0138 Epoch: 12 Global Step: 209960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:17,087-Speed 9150.74 samples/sec Loss 5.0756 LearningRate 0.0138 Epoch: 12 Global Step: 209970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:18,161-Speed 9546.60 
samples/sec Loss 5.0757 LearningRate 0.0138 Epoch: 12 Global Step: 209980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:19,264-Speed 9288.75 samples/sec Loss 5.0161 LearningRate 0.0138 Epoch: 12 Global Step: 209990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:20,399-Speed 9023.00 samples/sec Loss 5.0189 LearningRate 0.0138 Epoch: 12 Global Step: 210000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:00:42,359-[lfw][210000]XNorm: 8.414431 Training: 2022-04-11 20:00:42,360-[lfw][210000]Accuracy-Flip: 0.99550+-0.00325 Training: 2022-04-11 20:00:42,360-[lfw][210000]Accuracy-Highest: 0.99683 Training: 2022-04-11 20:01:07,857-[cfp_fp][210000]XNorm: 7.211093 Training: 2022-04-11 20:01:07,858-[cfp_fp][210000]Accuracy-Flip: 0.96514+-0.00872 Training: 2022-04-11 20:01:07,858-[cfp_fp][210000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:01:29,903-[agedb_30][210000]XNorm: 8.184819 Training: 2022-04-11 20:01:29,904-[agedb_30][210000]Accuracy-Flip: 0.96933+-0.00946 Training: 2022-04-11 20:01:29,904-[agedb_30][210000]Accuracy-Highest: 0.96983 Training: 2022-04-11 20:01:30,970-Speed 145.10 samples/sec Loss 5.0565 LearningRate 0.0138 Epoch: 12 Global Step: 210010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:32,053-Speed 9464.83 samples/sec Loss 5.1059 LearningRate 0.0138 Epoch: 12 Global Step: 210020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:33,172-Speed 9154.74 samples/sec Loss 5.0650 LearningRate 0.0138 Epoch: 12 Global Step: 210030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:34,260-Speed 9413.21 samples/sec Loss 4.9930 LearningRate 0.0137 Epoch: 12 Global Step: 210040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:35,362-Speed 9296.43 samples/sec Loss 5.0610 LearningRate 0.0137 Epoch: 12 Global Step: 210050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:36,452-Speed 9404.35 samples/sec Loss 
5.0318 LearningRate 0.0137 Epoch: 12 Global Step: 210060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:37,492-Speed 9843.77 samples/sec Loss 5.0023 LearningRate 0.0137 Epoch: 12 Global Step: 210070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:38,620-Speed 9082.40 samples/sec Loss 5.0394 LearningRate 0.0137 Epoch: 12 Global Step: 210080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:39,708-Speed 9416.55 samples/sec Loss 5.1297 LearningRate 0.0137 Epoch: 12 Global Step: 210090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:40,801-Speed 9375.94 samples/sec Loss 4.9661 LearningRate 0.0137 Epoch: 12 Global Step: 210100 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:01:41,874-Speed 9550.68 samples/sec Loss 5.0267 LearningRate 0.0137 Epoch: 12 Global Step: 210110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:42,928-Speed 9725.26 samples/sec Loss 5.0796 LearningRate 0.0137 Epoch: 12 Global Step: 210120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:43,976-Speed 9773.51 samples/sec Loss 4.9941 LearningRate 0.0137 Epoch: 12 Global Step: 210130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:45,070-Speed 9362.90 samples/sec Loss 4.9320 LearningRate 0.0137 Epoch: 12 Global Step: 210140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:46,152-Speed 9472.66 samples/sec Loss 5.0862 LearningRate 0.0137 Epoch: 12 Global Step: 210150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:47,222-Speed 9571.63 samples/sec Loss 4.9585 LearningRate 0.0137 Epoch: 12 Global Step: 210160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:48,287-Speed 9621.99 samples/sec Loss 5.1075 LearningRate 0.0137 Epoch: 12 Global Step: 210170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:49,374-Speed 9439.13 samples/sec Loss 5.0620 LearningRate 0.0137 
Epoch: 12 Global Step: 210180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:01:50,440-Speed 9613.75 samples/sec Loss 5.0382 LearningRate 0.0137 Epoch: 12 Global Step: 210190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:51,544-Speed 9280.68 samples/sec Loss 5.0192 LearningRate 0.0137 Epoch: 12 Global Step: 210200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:52,634-Speed 9403.40 samples/sec Loss 4.9632 LearningRate 0.0137 Epoch: 12 Global Step: 210210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:53,730-Speed 9346.66 samples/sec Loss 4.9984 LearningRate 0.0137 Epoch: 12 Global Step: 210220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:54,776-Speed 9789.62 samples/sec Loss 5.0643 LearningRate 0.0137 Epoch: 12 Global Step: 210230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:55,868-Speed 9387.49 samples/sec Loss 5.0158 LearningRate 0.0137 Epoch: 12 Global Step: 210240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:56,983-Speed 9183.66 samples/sec Loss 5.0960 LearningRate 0.0137 Epoch: 12 Global Step: 210250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:58,060-Speed 9519.46 samples/sec Loss 5.0514 LearningRate 0.0137 Epoch: 12 Global Step: 210260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:01:59,150-Speed 9399.00 samples/sec Loss 5.0710 LearningRate 0.0137 Epoch: 12 Global Step: 210270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:02:00,242-Speed 9382.63 samples/sec Loss 5.0090 LearningRate 0.0137 Epoch: 12 Global Step: 210280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:02:01,332-Speed 9395.30 samples/sec Loss 5.0133 LearningRate 0.0137 Epoch: 12 Global Step: 210290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:02,458-Speed 9113.65 samples/sec Loss 5.0849 LearningRate 0.0137 Epoch: 12 Global Step: 210300 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:03,554-Speed 9347.33 samples/sec Loss 5.1371 LearningRate 0.0137 Epoch: 12 Global Step: 210310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:04,627-Speed 9545.91 samples/sec Loss 5.0780 LearningRate 0.0137 Epoch: 12 Global Step: 210320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:05,689-Speed 9651.84 samples/sec Loss 5.0748 LearningRate 0.0137 Epoch: 12 Global Step: 210330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:06,767-Speed 9498.24 samples/sec Loss 5.0797 LearningRate 0.0137 Epoch: 12 Global Step: 210340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:07,881-Speed 9201.29 samples/sec Loss 4.9885 LearningRate 0.0137 Epoch: 12 Global Step: 210350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:08,967-Speed 9439.36 samples/sec Loss 4.9992 LearningRate 0.0137 Epoch: 12 Global Step: 210360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:10,074-Speed 9251.04 samples/sec Loss 4.9653 LearningRate 0.0137 Epoch: 12 Global Step: 210370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:11,102-Speed 9965.67 samples/sec Loss 5.0775 LearningRate 0.0137 Epoch: 12 Global Step: 210380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:12,187-Speed 9446.16 samples/sec Loss 5.0575 LearningRate 0.0137 Epoch: 12 Global Step: 210390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:13,286-Speed 9323.85 samples/sec Loss 4.9191 LearningRate 0.0137 Epoch: 12 Global Step: 210400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:14,380-Speed 9365.02 samples/sec Loss 4.9609 LearningRate 0.0137 Epoch: 12 Global Step: 210410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:15,494-Speed 9197.51 samples/sec Loss 5.0368 LearningRate 0.0137 Epoch: 12 Global Step: 210420 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 20:02:16,569-Speed 9530.15 samples/sec Loss 5.0660 LearningRate 0.0137 Epoch: 12 Global Step: 210430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:17,641-Speed 9561.00 samples/sec Loss 4.9920 LearningRate 0.0137 Epoch: 12 Global Step: 210440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:18,767-Speed 9099.15 samples/sec Loss 4.9989 LearningRate 0.0137 Epoch: 12 Global Step: 210450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:19,844-Speed 9518.16 samples/sec Loss 4.9755 LearningRate 0.0137 Epoch: 12 Global Step: 210460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:20,938-Speed 9367.61 samples/sec Loss 5.1656 LearningRate 0.0137 Epoch: 12 Global Step: 210470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:22,041-Speed 9283.49 samples/sec Loss 5.0578 LearningRate 0.0137 Epoch: 12 Global Step: 210480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:23,123-Speed 9473.52 samples/sec Loss 5.0568 LearningRate 0.0136 Epoch: 12 Global Step: 210490 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:02:24,217-Speed 9366.09 samples/sec Loss 4.9984 LearningRate 0.0136 Epoch: 12 Global Step: 210500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:25,287-Speed 9575.13 samples/sec Loss 5.0916 LearningRate 0.0136 Epoch: 12 Global Step: 210510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:26,422-Speed 9024.76 samples/sec Loss 5.0179 LearningRate 0.0136 Epoch: 12 Global Step: 210520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:27,555-Speed 9049.24 samples/sec Loss 5.1198 LearningRate 0.0136 Epoch: 12 Global Step: 210530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:28,639-Speed 9457.45 samples/sec Loss 5.0574 LearningRate 0.0136 Epoch: 12 Global Step: 210540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
20:02:29,722-Speed 9460.37 samples/sec Loss 5.0584 LearningRate 0.0136 Epoch: 12 Global Step: 210550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:30,824-Speed 9300.30 samples/sec Loss 5.0074 LearningRate 0.0136 Epoch: 12 Global Step: 210560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:31,923-Speed 9325.57 samples/sec Loss 5.1046 LearningRate 0.0136 Epoch: 12 Global Step: 210570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:33,006-Speed 9455.17 samples/sec Loss 5.1024 LearningRate 0.0136 Epoch: 12 Global Step: 210580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:34,096-Speed 9403.21 samples/sec Loss 5.0202 LearningRate 0.0136 Epoch: 12 Global Step: 210590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:35,186-Speed 9404.24 samples/sec Loss 5.1317 LearningRate 0.0136 Epoch: 12 Global Step: 210600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:36,323-Speed 9006.65 samples/sec Loss 5.0230 LearningRate 0.0136 Epoch: 12 Global Step: 210610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:37,444-Speed 9140.27 samples/sec Loss 4.8815 LearningRate 0.0136 Epoch: 12 Global Step: 210620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:38,561-Speed 9172.66 samples/sec Loss 5.0491 LearningRate 0.0136 Epoch: 12 Global Step: 210630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:39,671-Speed 9235.37 samples/sec Loss 5.0117 LearningRate 0.0136 Epoch: 12 Global Step: 210640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:40,772-Speed 9307.40 samples/sec Loss 4.9908 LearningRate 0.0136 Epoch: 12 Global Step: 210650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:41,865-Speed 9372.79 samples/sec Loss 5.1144 LearningRate 0.0136 Epoch: 12 Global Step: 210660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:42,955-Speed 9397.97 
samples/sec Loss 5.0659 LearningRate 0.0136 Epoch: 12 Global Step: 210670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:44,084-Speed 9079.43 samples/sec Loss 5.0219 LearningRate 0.0136 Epoch: 12 Global Step: 210680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:45,169-Speed 9443.27 samples/sec Loss 5.0709 LearningRate 0.0136 Epoch: 12 Global Step: 210690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:46,302-Speed 9043.28 samples/sec Loss 5.0516 LearningRate 0.0136 Epoch: 12 Global Step: 210700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:47,405-Speed 9292.11 samples/sec Loss 4.9913 LearningRate 0.0136 Epoch: 12 Global Step: 210710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:48,502-Speed 9336.19 samples/sec Loss 5.0380 LearningRate 0.0136 Epoch: 12 Global Step: 210720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:49,586-Speed 9456.95 samples/sec Loss 5.2134 LearningRate 0.0136 Epoch: 12 Global Step: 210730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:50,685-Speed 9323.89 samples/sec Loss 5.0497 LearningRate 0.0136 Epoch: 12 Global Step: 210740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:51,830-Speed 8947.36 samples/sec Loss 5.0263 LearningRate 0.0136 Epoch: 12 Global Step: 210750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:52,895-Speed 9616.27 samples/sec Loss 5.0100 LearningRate 0.0136 Epoch: 12 Global Step: 210760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:54,010-Speed 9192.96 samples/sec Loss 5.0092 LearningRate 0.0136 Epoch: 12 Global Step: 210770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:55,189-Speed 8687.78 samples/sec Loss 5.0085 LearningRate 0.0136 Epoch: 12 Global Step: 210780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:56,292-Speed 9290.68 samples/sec Loss 5.0162 
LearningRate 0.0136 Epoch: 12 Global Step: 210790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:57,410-Speed 9167.25 samples/sec Loss 5.0258 LearningRate 0.0136 Epoch: 12 Global Step: 210800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:58,562-Speed 8889.96 samples/sec Loss 5.0807 LearningRate 0.0136 Epoch: 12 Global Step: 210810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:02:59,643-Speed 9484.54 samples/sec Loss 5.1093 LearningRate 0.0136 Epoch: 12 Global Step: 210820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:00,761-Speed 9168.18 samples/sec Loss 5.1155 LearningRate 0.0136 Epoch: 12 Global Step: 210830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:01,843-Speed 9471.06 samples/sec Loss 5.0304 LearningRate 0.0136 Epoch: 12 Global Step: 210840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:02,926-Speed 9454.92 samples/sec Loss 5.1264 LearningRate 0.0136 Epoch: 12 Global Step: 210850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:03,998-Speed 9557.09 samples/sec Loss 5.0507 LearningRate 0.0136 Epoch: 12 Global Step: 210860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:05,069-Speed 9573.40 samples/sec Loss 5.0907 LearningRate 0.0136 Epoch: 12 Global Step: 210870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:06,218-Speed 8914.60 samples/sec Loss 5.0493 LearningRate 0.0136 Epoch: 12 Global Step: 210880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:07,321-Speed 9294.87 samples/sec Loss 5.0154 LearningRate 0.0136 Epoch: 12 Global Step: 210890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:08,416-Speed 9351.39 samples/sec Loss 5.1540 LearningRate 0.0136 Epoch: 12 Global Step: 210900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:09,476-Speed 9665.03 samples/sec Loss 5.0599 LearningRate 0.0136 Epoch: 12 
Global Step: 210910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:10,534-Speed 9691.85 samples/sec Loss 5.1686 LearningRate 0.0136 Epoch: 12 Global Step: 210920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:11,636-Speed 9290.62 samples/sec Loss 5.1761 LearningRate 0.0136 Epoch: 12 Global Step: 210930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:12,787-Speed 8913.43 samples/sec Loss 4.9808 LearningRate 0.0135 Epoch: 12 Global Step: 210940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:13,894-Speed 9257.27 samples/sec Loss 4.9654 LearningRate 0.0135 Epoch: 12 Global Step: 210950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:14,943-Speed 9759.12 samples/sec Loss 5.0693 LearningRate 0.0135 Epoch: 12 Global Step: 210960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:16,055-Speed 9214.41 samples/sec Loss 4.9998 LearningRate 0.0135 Epoch: 12 Global Step: 210970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:17,199-Speed 8955.78 samples/sec Loss 5.0616 LearningRate 0.0135 Epoch: 12 Global Step: 210980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:18,257-Speed 9691.54 samples/sec Loss 5.1241 LearningRate 0.0135 Epoch: 12 Global Step: 210990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:19,367-Speed 9232.27 samples/sec Loss 5.0890 LearningRate 0.0135 Epoch: 12 Global Step: 211000 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:03:20,426-Speed 9682.36 samples/sec Loss 5.0628 LearningRate 0.0135 Epoch: 12 Global Step: 211010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:21,520-Speed 9365.82 samples/sec Loss 5.0247 LearningRate 0.0135 Epoch: 12 Global Step: 211020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:22,642-Speed 9134.82 samples/sec Loss 5.0258 LearningRate 0.0135 Epoch: 12 Global Step: 211030 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:23,744-Speed 9296.07 samples/sec Loss 4.9716 LearningRate 0.0135 Epoch: 12 Global Step: 211040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:24,845-Speed 9304.53 samples/sec Loss 5.0755 LearningRate 0.0135 Epoch: 12 Global Step: 211050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:25,911-Speed 9608.34 samples/sec Loss 5.0402 LearningRate 0.0135 Epoch: 12 Global Step: 211060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:27,003-Speed 9381.16 samples/sec Loss 4.9891 LearningRate 0.0135 Epoch: 12 Global Step: 211070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:28,131-Speed 9083.93 samples/sec Loss 5.0831 LearningRate 0.0135 Epoch: 12 Global Step: 211080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:29,248-Speed 9181.58 samples/sec Loss 5.0726 LearningRate 0.0135 Epoch: 12 Global Step: 211090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:30,315-Speed 9595.71 samples/sec Loss 5.1020 LearningRate 0.0135 Epoch: 12 Global Step: 211100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:31,423-Speed 9255.00 samples/sec Loss 5.0408 LearningRate 0.0135 Epoch: 12 Global Step: 211110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:32,557-Speed 9031.49 samples/sec Loss 5.0353 LearningRate 0.0135 Epoch: 12 Global Step: 211120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:33,636-Speed 9499.02 samples/sec Loss 4.9740 LearningRate 0.0135 Epoch: 12 Global Step: 211130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:34,710-Speed 9540.09 samples/sec Loss 5.0595 LearningRate 0.0135 Epoch: 12 Global Step: 211140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:35,874-Speed 8801.17 samples/sec Loss 5.0504 LearningRate 0.0135 Epoch: 12 Global Step: 211150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 20:03:36,973-Speed 9324.49 samples/sec Loss 4.9515 LearningRate 0.0135 Epoch: 12 Global Step: 211160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:38,038-Speed 9626.03 samples/sec Loss 5.0107 LearningRate 0.0135 Epoch: 12 Global Step: 211170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:39,134-Speed 9349.86 samples/sec Loss 4.9975 LearningRate 0.0135 Epoch: 12 Global Step: 211180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:40,220-Speed 9428.32 samples/sec Loss 5.0131 LearningRate 0.0135 Epoch: 12 Global Step: 211190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:41,331-Speed 9226.76 samples/sec Loss 5.0239 LearningRate 0.0135 Epoch: 12 Global Step: 211200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:42,430-Speed 9319.19 samples/sec Loss 5.0525 LearningRate 0.0135 Epoch: 12 Global Step: 211210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:43,539-Speed 9239.32 samples/sec Loss 5.0875 LearningRate 0.0135 Epoch: 12 Global Step: 211220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:03:44,632-Speed 9375.22 samples/sec Loss 4.9834 LearningRate 0.0135 Epoch: 12 Global Step: 211230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:45,747-Speed 9192.46 samples/sec Loss 5.0040 LearningRate 0.0135 Epoch: 12 Global Step: 211240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:46,853-Speed 9261.92 samples/sec Loss 5.0393 LearningRate 0.0135 Epoch: 12 Global Step: 211250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:47,924-Speed 9568.39 samples/sec Loss 5.0743 LearningRate 0.0135 Epoch: 12 Global Step: 211260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:49,066-Speed 8978.25 samples/sec Loss 4.9549 LearningRate 0.0135 Epoch: 12 Global Step: 211270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:50,167-Speed 9301.14 
samples/sec Loss 5.1097 LearningRate 0.0135 Epoch: 12 Global Step: 211280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:51,376-Speed 8479.24 samples/sec Loss 4.9839 LearningRate 0.0135 Epoch: 12 Global Step: 211290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:52,436-Speed 9663.81 samples/sec Loss 5.0400 LearningRate 0.0135 Epoch: 12 Global Step: 211300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:53,522-Speed 9432.14 samples/sec Loss 5.0871 LearningRate 0.0135 Epoch: 12 Global Step: 211310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:54,584-Speed 9650.88 samples/sec Loss 5.0563 LearningRate 0.0135 Epoch: 12 Global Step: 211320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:55,660-Speed 9522.05 samples/sec Loss 5.0063 LearningRate 0.0135 Epoch: 12 Global Step: 211330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:56,846-Speed 8637.08 samples/sec Loss 5.0475 LearningRate 0.0135 Epoch: 12 Global Step: 211340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:57,915-Speed 9582.76 samples/sec Loss 5.0196 LearningRate 0.0135 Epoch: 12 Global Step: 211350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:03:58,998-Speed 9461.88 samples/sec Loss 5.0328 LearningRate 0.0135 Epoch: 12 Global Step: 211360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:00,105-Speed 9261.64 samples/sec Loss 5.0194 LearningRate 0.0135 Epoch: 12 Global Step: 211370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:01,200-Speed 9352.92 samples/sec Loss 4.9826 LearningRate 0.0135 Epoch: 12 Global Step: 211380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:02,295-Speed 9356.19 samples/sec Loss 5.0149 LearningRate 0.0135 Epoch: 12 Global Step: 211390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:03,410-Speed 9192.58 samples/sec Loss 5.1974 
LearningRate 0.0134 Epoch: 12 Global Step: 211400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:04,512-Speed 9297.51 samples/sec Loss 5.0115 LearningRate 0.0134 Epoch: 12 Global Step: 211410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:05,650-Speed 9001.55 samples/sec Loss 5.0797 LearningRate 0.0134 Epoch: 12 Global Step: 211420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:06,715-Speed 9616.39 samples/sec Loss 5.0663 LearningRate 0.0134 Epoch: 12 Global Step: 211430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:07,835-Speed 9157.04 samples/sec Loss 5.0435 LearningRate 0.0134 Epoch: 12 Global Step: 211440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:08,932-Speed 9337.41 samples/sec Loss 5.0714 LearningRate 0.0134 Epoch: 12 Global Step: 211450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:10,033-Speed 9308.23 samples/sec Loss 5.1009 LearningRate 0.0134 Epoch: 12 Global Step: 211460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:11,093-Speed 9669.40 samples/sec Loss 5.0852 LearningRate 0.0134 Epoch: 12 Global Step: 211470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:12,158-Speed 9621.77 samples/sec Loss 5.0691 LearningRate 0.0134 Epoch: 12 Global Step: 211480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:13,233-Speed 9528.60 samples/sec Loss 4.9898 LearningRate 0.0134 Epoch: 12 Global Step: 211490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:14,329-Speed 9354.42 samples/sec Loss 5.0654 LearningRate 0.0134 Epoch: 12 Global Step: 211500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:15,428-Speed 9319.35 samples/sec Loss 5.1131 LearningRate 0.0134 Epoch: 12 Global Step: 211510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:16,514-Speed 9436.40 samples/sec Loss 5.1274 LearningRate 0.0134 Epoch: 12 Global 
Step: 211520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:17,575-Speed 9648.60 samples/sec Loss 5.0835 LearningRate 0.0134 Epoch: 12 Global Step: 211530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:18,671-Speed 9355.03 samples/sec Loss 5.0842 LearningRate 0.0134 Epoch: 12 Global Step: 211540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:19,779-Speed 9251.56 samples/sec Loss 4.9933 LearningRate 0.0134 Epoch: 12 Global Step: 211550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:20,845-Speed 9612.55 samples/sec Loss 5.0497 LearningRate 0.0134 Epoch: 12 Global Step: 211560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:04:21,892-Speed 9781.18 samples/sec Loss 4.9987 LearningRate 0.0134 Epoch: 12 Global Step: 211570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:22,978-Speed 9442.76 samples/sec Loss 4.9829 LearningRate 0.0134 Epoch: 12 Global Step: 211580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:24,085-Speed 9253.80 samples/sec Loss 4.9884 LearningRate 0.0134 Epoch: 12 Global Step: 211590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:25,182-Speed 9345.26 samples/sec Loss 4.9924 LearningRate 0.0134 Epoch: 12 Global Step: 211600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:26,267-Speed 9442.51 samples/sec Loss 5.0135 LearningRate 0.0134 Epoch: 12 Global Step: 211610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:27,389-Speed 9130.13 samples/sec Loss 4.9610 LearningRate 0.0134 Epoch: 12 Global Step: 211620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:28,456-Speed 9601.58 samples/sec Loss 5.0528 LearningRate 0.0134 Epoch: 12 Global Step: 211630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:29,587-Speed 9067.36 samples/sec Loss 5.0524 LearningRate 0.0134 Epoch: 12 Global Step: 211640 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:04:30,678-Speed 9388.12 samples/sec Loss 5.0857 LearningRate 0.0134 Epoch: 12 Global Step: 211650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:31,726-Speed 9777.58 samples/sec Loss 5.0325 LearningRate 0.0134 Epoch: 12 Global Step: 211660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:32,792-Speed 9613.37 samples/sec Loss 4.9822 LearningRate 0.0134 Epoch: 12 Global Step: 211670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:33,912-Speed 9144.94 samples/sec Loss 5.0939 LearningRate 0.0134 Epoch: 12 Global Step: 211680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:34,984-Speed 9555.97 samples/sec Loss 5.0540 LearningRate 0.0134 Epoch: 12 Global Step: 211690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:36,070-Speed 9437.14 samples/sec Loss 5.0867 LearningRate 0.0134 Epoch: 12 Global Step: 211700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:37,206-Speed 9022.23 samples/sec Loss 5.0149 LearningRate 0.0134 Epoch: 12 Global Step: 211710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:38,330-Speed 9114.73 samples/sec Loss 5.0725 LearningRate 0.0134 Epoch: 12 Global Step: 211720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:39,426-Speed 9346.41 samples/sec Loss 5.0889 LearningRate 0.0134 Epoch: 12 Global Step: 211730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:40,490-Speed 9628.27 samples/sec Loss 4.9289 LearningRate 0.0134 Epoch: 12 Global Step: 211740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:41,578-Speed 9419.38 samples/sec Loss 5.1909 LearningRate 0.0134 Epoch: 12 Global Step: 211750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:42,705-Speed 9096.57 samples/sec Loss 5.0851 LearningRate 0.0134 Epoch: 12 Global Step: 211760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 20:04:43,791-Speed 9428.96 samples/sec Loss 5.0875 LearningRate 0.0134 Epoch: 12 Global Step: 211770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:44,902-Speed 9225.66 samples/sec Loss 5.0334 LearningRate 0.0134 Epoch: 12 Global Step: 211780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:45,980-Speed 9498.97 samples/sec Loss 4.9646 LearningRate 0.0134 Epoch: 12 Global Step: 211790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:47,070-Speed 9400.73 samples/sec Loss 5.0383 LearningRate 0.0134 Epoch: 12 Global Step: 211800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:48,152-Speed 9472.92 samples/sec Loss 5.0185 LearningRate 0.0134 Epoch: 12 Global Step: 211810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:49,266-Speed 9199.80 samples/sec Loss 5.0458 LearningRate 0.0134 Epoch: 12 Global Step: 211820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:50,345-Speed 9494.44 samples/sec Loss 5.0762 LearningRate 0.0134 Epoch: 12 Global Step: 211830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:51,522-Speed 8705.20 samples/sec Loss 4.9824 LearningRate 0.0134 Epoch: 12 Global Step: 211840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:52,628-Speed 9270.87 samples/sec Loss 5.1005 LearningRate 0.0134 Epoch: 12 Global Step: 211850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:53,724-Speed 9347.61 samples/sec Loss 4.9950 LearningRate 0.0133 Epoch: 12 Global Step: 211860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:54,806-Speed 9473.47 samples/sec Loss 5.0555 LearningRate 0.0133 Epoch: 12 Global Step: 211870 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:04:55,910-Speed 9275.97 samples/sec Loss 5.0103 LearningRate 0.0133 Epoch: 12 Global Step: 211880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:57,013-Speed 
9291.14 samples/sec Loss 5.0694 LearningRate 0.0133 Epoch: 12 Global Step: 211890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:58,066-Speed 9727.40 samples/sec Loss 5.0748 LearningRate 0.0133 Epoch: 12 Global Step: 211900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:04:59,172-Speed 9264.67 samples/sec Loss 5.0275 LearningRate 0.0133 Epoch: 12 Global Step: 211910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:05:00,280-Speed 9246.85 samples/sec Loss 5.0581 LearningRate 0.0133 Epoch: 12 Global Step: 211920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:05:01,391-Speed 9225.36 samples/sec Loss 5.0994 LearningRate 0.0133 Epoch: 12 Global Step: 211930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:05:02,476-Speed 9441.65 samples/sec Loss 5.0751 LearningRate 0.0133 Epoch: 12 Global Step: 211940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:05:03,579-Speed 9287.42 samples/sec Loss 4.9977 LearningRate 0.0133 Epoch: 12 Global Step: 211950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:05:04,682-Speed 9293.46 samples/sec Loss 5.0520 LearningRate 0.0133 Epoch: 12 Global Step: 211960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:05:05,756-Speed 9538.72 samples/sec Loss 5.0468 LearningRate 0.0133 Epoch: 12 Global Step: 211970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:05:06,862-Speed 9267.84 samples/sec Loss 5.0010 LearningRate 0.0133 Epoch: 12 Global Step: 211980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:05:07,968-Speed 9265.89 samples/sec Loss 5.0724 LearningRate 0.0133 Epoch: 12 Global Step: 211990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:05:09,040-Speed 9555.91 samples/sec Loss 5.1244 LearningRate 0.0133 Epoch: 12 Global Step: 212000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:05:30,970-[lfw][212000]XNorm: 8.396976 Training: 
2022-04-11 20:05:30,971-[lfw][212000]Accuracy-Flip: 0.99600+-0.00367 Training: 2022-04-11 20:05:30,971-[lfw][212000]Accuracy-Highest: 0.99683 Training: 2022-04-11 20:05:56,314-[cfp_fp][212000]XNorm: 7.223426 Training: 2022-04-11 20:05:56,314-[cfp_fp][212000]Accuracy-Flip: 0.96757+-0.00876 Training: 2022-04-11 20:05:56,315-[cfp_fp][212000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:06:18,207-[agedb_30][212000]XNorm: 8.128869 Training: 2022-04-11 20:06:18,208-[agedb_30][212000]Accuracy-Flip: 0.96933+-0.00867 Training: 2022-04-11 20:06:18,208-[agedb_30][212000]Accuracy-Highest: 0.96983 Training: 2022-04-11 20:06:19,305-Speed 145.73 samples/sec Loss 5.0870 LearningRate 0.0133 Epoch: 12 Global Step: 212010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:20,406-Speed 9306.64 samples/sec Loss 5.0613 LearningRate 0.0133 Epoch: 12 Global Step: 212020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:21,496-Speed 9401.55 samples/sec Loss 4.9038 LearningRate 0.0133 Epoch: 12 Global Step: 212030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:22,593-Speed 9344.77 samples/sec Loss 5.1054 LearningRate 0.0133 Epoch: 12 Global Step: 212040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:23,688-Speed 9355.45 samples/sec Loss 5.1498 LearningRate 0.0133 Epoch: 12 Global Step: 212050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:24,782-Speed 9364.61 samples/sec Loss 5.0281 LearningRate 0.0133 Epoch: 12 Global Step: 212060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:25,954-Speed 8749.36 samples/sec Loss 5.1288 LearningRate 0.0133 Epoch: 12 Global Step: 212070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:27,037-Speed 9462.57 samples/sec Loss 5.1157 LearningRate 0.0133 Epoch: 12 Global Step: 212080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:28,136-Speed 9323.45 samples/sec Loss 5.0484 LearningRate 0.0133 Epoch: 
12 Global Step: 212090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:29,193-Speed 9692.85 samples/sec Loss 5.0375 LearningRate 0.0133 Epoch: 12 Global Step: 212100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:30,339-Speed 8941.86 samples/sec Loss 5.1613 LearningRate 0.0133 Epoch: 12 Global Step: 212110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:31,409-Speed 9571.77 samples/sec Loss 5.0656 LearningRate 0.0133 Epoch: 12 Global Step: 212120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:32,523-Speed 9204.00 samples/sec Loss 5.1488 LearningRate 0.0133 Epoch: 12 Global Step: 212130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:33,623-Speed 9311.56 samples/sec Loss 5.1096 LearningRate 0.0133 Epoch: 12 Global Step: 212140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:34,698-Speed 9528.66 samples/sec Loss 5.0391 LearningRate 0.0133 Epoch: 12 Global Step: 212150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:35,747-Speed 9771.43 samples/sec Loss 5.0251 LearningRate 0.0133 Epoch: 12 Global Step: 212160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:36,829-Speed 9463.27 samples/sec Loss 5.0723 LearningRate 0.0133 Epoch: 12 Global Step: 212170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:37,954-Speed 9114.45 samples/sec Loss 5.0089 LearningRate 0.0133 Epoch: 12 Global Step: 212180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:39,059-Speed 9275.27 samples/sec Loss 5.1068 LearningRate 0.0133 Epoch: 12 Global Step: 212190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:40,166-Speed 9251.35 samples/sec Loss 4.9822 LearningRate 0.0133 Epoch: 12 Global Step: 212200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:41,315-Speed 8923.31 samples/sec Loss 5.0943 LearningRate 0.0133 Epoch: 12 Global Step: 212210 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 20:06:42,400-Speed 9438.25 samples/sec Loss 5.0897 LearningRate 0.0133 Epoch: 12 Global Step: 212220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:43,514-Speed 9197.23 samples/sec Loss 5.0810 LearningRate 0.0133 Epoch: 12 Global Step: 212230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:44,630-Speed 9184.39 samples/sec Loss 5.0735 LearningRate 0.0133 Epoch: 12 Global Step: 212240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:45,738-Speed 9247.14 samples/sec Loss 5.0521 LearningRate 0.0133 Epoch: 12 Global Step: 212250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:46,843-Speed 9277.41 samples/sec Loss 4.9621 LearningRate 0.0133 Epoch: 12 Global Step: 212260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:47,911-Speed 9596.24 samples/sec Loss 5.0449 LearningRate 0.0133 Epoch: 12 Global Step: 212270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:48,967-Speed 9704.25 samples/sec Loss 4.9972 LearningRate 0.0133 Epoch: 12 Global Step: 212280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:50,045-Speed 9501.80 samples/sec Loss 4.9712 LearningRate 0.0133 Epoch: 12 Global Step: 212290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:51,126-Speed 9476.91 samples/sec Loss 4.9907 LearningRate 0.0133 Epoch: 12 Global Step: 212300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:52,167-Speed 9840.59 samples/sec Loss 5.0687 LearningRate 0.0132 Epoch: 12 Global Step: 212310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:53,249-Speed 9475.40 samples/sec Loss 5.0729 LearningRate 0.0132 Epoch: 12 Global Step: 212320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:54,348-Speed 9323.44 samples/sec Loss 5.0131 LearningRate 0.0132 Epoch: 12 Global Step: 212330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 20:06:55,452-Speed 9288.45 samples/sec Loss 5.0927 LearningRate 0.0132 Epoch: 12 Global Step: 212340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:56,599-Speed 8932.35 samples/sec Loss 5.0218 LearningRate 0.0132 Epoch: 12 Global Step: 212350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:06:57,694-Speed 9363.53 samples/sec Loss 5.0417 LearningRate 0.0132 Epoch: 12 Global Step: 212360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:58,765-Speed 9566.00 samples/sec Loss 5.0693 LearningRate 0.0132 Epoch: 12 Global Step: 212370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:06:59,834-Speed 9577.38 samples/sec Loss 5.0292 LearningRate 0.0132 Epoch: 12 Global Step: 212380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:00,905-Speed 9568.32 samples/sec Loss 5.0478 LearningRate 0.0132 Epoch: 12 Global Step: 212390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:01,983-Speed 9503.17 samples/sec Loss 5.1294 LearningRate 0.0132 Epoch: 12 Global Step: 212400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:03,137-Speed 8881.94 samples/sec Loss 5.0759 LearningRate 0.0132 Epoch: 12 Global Step: 212410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:04,249-Speed 9214.80 samples/sec Loss 4.9399 LearningRate 0.0132 Epoch: 12 Global Step: 212420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:05,326-Speed 9516.66 samples/sec Loss 5.1002 LearningRate 0.0132 Epoch: 12 Global Step: 212430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:06,390-Speed 9639.71 samples/sec Loss 5.0356 LearningRate 0.0132 Epoch: 12 Global Step: 212440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:07,506-Speed 9180.35 samples/sec Loss 5.0779 LearningRate 0.0132 Epoch: 12 Global Step: 212450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:08,586-Speed 
9487.99 samples/sec Loss 5.0154 LearningRate 0.0132 Epoch: 12 Global Step: 212460 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:07:09,639-Speed 9727.13 samples/sec Loss 5.0641 LearningRate 0.0132 Epoch: 12 Global Step: 212470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:10,747-Speed 9249.98 samples/sec Loss 5.0491 LearningRate 0.0132 Epoch: 12 Global Step: 212480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:11,873-Speed 9097.44 samples/sec Loss 5.0391 LearningRate 0.0132 Epoch: 12 Global Step: 212490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:12,954-Speed 9478.84 samples/sec Loss 5.0512 LearningRate 0.0132 Epoch: 12 Global Step: 212500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:14,074-Speed 9151.34 samples/sec Loss 5.0255 LearningRate 0.0132 Epoch: 12 Global Step: 212510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:15,160-Speed 9435.38 samples/sec Loss 5.0954 LearningRate 0.0132 Epoch: 12 Global Step: 212520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:16,251-Speed 9388.03 samples/sec Loss 5.0164 LearningRate 0.0132 Epoch: 12 Global Step: 212530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:17,372-Speed 9139.20 samples/sec Loss 5.0707 LearningRate 0.0132 Epoch: 12 Global Step: 212540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:18,440-Speed 9599.39 samples/sec Loss 5.1742 LearningRate 0.0132 Epoch: 12 Global Step: 212550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:19,515-Speed 9529.99 samples/sec Loss 5.0610 LearningRate 0.0132 Epoch: 12 Global Step: 212560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:20,558-Speed 9821.03 samples/sec Loss 5.1344 LearningRate 0.0132 Epoch: 12 Global Step: 212570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:21,600-Speed 9837.69 samples/sec Loss 5.0646 
LearningRate 0.0132 Epoch: 12 Global Step: 212580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:22,666-Speed 9607.24 samples/sec Loss 5.0762 LearningRate 0.0132 Epoch: 12 Global Step: 212590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:23,780-Speed 9198.07 samples/sec Loss 5.0925 LearningRate 0.0132 Epoch: 12 Global Step: 212600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:24,848-Speed 9605.43 samples/sec Loss 5.0434 LearningRate 0.0132 Epoch: 12 Global Step: 212610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:25,913-Speed 9621.70 samples/sec Loss 4.9966 LearningRate 0.0132 Epoch: 12 Global Step: 212620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:26,979-Speed 9606.83 samples/sec Loss 4.9697 LearningRate 0.0132 Epoch: 12 Global Step: 212630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:28,095-Speed 9185.70 samples/sec Loss 4.9188 LearningRate 0.0132 Epoch: 12 Global Step: 212640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:29,262-Speed 8777.67 samples/sec Loss 5.0349 LearningRate 0.0132 Epoch: 12 Global Step: 212650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:30,349-Speed 9430.71 samples/sec Loss 4.9637 LearningRate 0.0132 Epoch: 12 Global Step: 212660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:31,420-Speed 9563.63 samples/sec Loss 5.0517 LearningRate 0.0132 Epoch: 12 Global Step: 212670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:32,472-Speed 9741.68 samples/sec Loss 5.0291 LearningRate 0.0132 Epoch: 12 Global Step: 212680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:33,608-Speed 9022.30 samples/sec Loss 5.0345 LearningRate 0.0132 Epoch: 12 Global Step: 212690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:34,722-Speed 9191.63 samples/sec Loss 5.0104 LearningRate 0.0132 Epoch: 12 Global 
Step: 212700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:07:35,819-Speed 9345.29 samples/sec Loss 5.0979 LearningRate 0.0132 Epoch: 12 Global Step: 212710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:36,887-Speed 9594.82 samples/sec Loss 5.0917 LearningRate 0.0132 Epoch: 12 Global Step: 212720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:37,937-Speed 9753.28 samples/sec Loss 5.0851 LearningRate 0.0132 Epoch: 12 Global Step: 212730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:39,050-Speed 9206.44 samples/sec Loss 5.1089 LearningRate 0.0132 Epoch: 12 Global Step: 212740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:40,111-Speed 9663.98 samples/sec Loss 5.1569 LearningRate 0.0132 Epoch: 12 Global Step: 212750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:41,166-Speed 9707.60 samples/sec Loss 5.1114 LearningRate 0.0132 Epoch: 12 Global Step: 212760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:42,270-Speed 9277.75 samples/sec Loss 5.0493 LearningRate 0.0131 Epoch: 12 Global Step: 212770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:43,374-Speed 9284.20 samples/sec Loss 5.0269 LearningRate 0.0131 Epoch: 12 Global Step: 212780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:44,421-Speed 9783.95 samples/sec Loss 5.0113 LearningRate 0.0131 Epoch: 12 Global Step: 212790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:45,577-Speed 8865.37 samples/sec Loss 5.0845 LearningRate 0.0131 Epoch: 12 Global Step: 212800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:46,638-Speed 9661.12 samples/sec Loss 5.0358 LearningRate 0.0131 Epoch: 12 Global Step: 212810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:47,747-Speed 9243.52 samples/sec Loss 5.0022 LearningRate 0.0131 Epoch: 12 Global Step: 212820 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 20:07:48,894-Speed 8927.37 samples/sec Loss 5.0008 LearningRate 0.0131 Epoch: 12 Global Step: 212830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:49,978-Speed 9455.44 samples/sec Loss 5.0790 LearningRate 0.0131 Epoch: 12 Global Step: 212840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:51,060-Speed 9473.11 samples/sec Loss 5.0583 LearningRate 0.0131 Epoch: 12 Global Step: 212850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:52,186-Speed 9094.81 samples/sec Loss 5.0640 LearningRate 0.0131 Epoch: 12 Global Step: 212860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:53,297-Speed 9230.77 samples/sec Loss 5.0233 LearningRate 0.0131 Epoch: 12 Global Step: 212870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:54,387-Speed 9402.78 samples/sec Loss 5.0087 LearningRate 0.0131 Epoch: 12 Global Step: 212880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:55,460-Speed 9547.54 samples/sec Loss 5.0302 LearningRate 0.0131 Epoch: 12 Global Step: 212890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:56,576-Speed 9183.56 samples/sec Loss 4.9676 LearningRate 0.0131 Epoch: 12 Global Step: 212900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:57,652-Speed 9524.40 samples/sec Loss 5.0521 LearningRate 0.0131 Epoch: 12 Global Step: 212910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:58,736-Speed 9447.27 samples/sec Loss 5.1754 LearningRate 0.0131 Epoch: 12 Global Step: 212920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:07:59,809-Speed 9552.29 samples/sec Loss 5.0384 LearningRate 0.0131 Epoch: 12 Global Step: 212930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:00,884-Speed 9533.60 samples/sec Loss 5.0427 LearningRate 0.0131 Epoch: 12 Global Step: 212940 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 20:08:01,955-Speed 9563.88 samples/sec Loss 5.1467 LearningRate 0.0131 Epoch: 12 Global Step: 212950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:03,088-Speed 9044.27 samples/sec Loss 5.0263 LearningRate 0.0131 Epoch: 12 Global Step: 212960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:04,189-Speed 9307.27 samples/sec Loss 5.0638 LearningRate 0.0131 Epoch: 12 Global Step: 212970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:05,251-Speed 9651.74 samples/sec Loss 5.1351 LearningRate 0.0131 Epoch: 12 Global Step: 212980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:06,350-Speed 9322.87 samples/sec Loss 5.0229 LearningRate 0.0131 Epoch: 12 Global Step: 212990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:07,417-Speed 9599.76 samples/sec Loss 5.1103 LearningRate 0.0131 Epoch: 12 Global Step: 213000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:08,513-Speed 9351.44 samples/sec Loss 4.9674 LearningRate 0.0131 Epoch: 12 Global Step: 213010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:09,572-Speed 9674.86 samples/sec Loss 5.0784 LearningRate 0.0131 Epoch: 12 Global Step: 213020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:10,643-Speed 9568.80 samples/sec Loss 5.0938 LearningRate 0.0131 Epoch: 12 Global Step: 213030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:11,733-Speed 9398.62 samples/sec Loss 5.0520 LearningRate 0.0131 Epoch: 12 Global Step: 213040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:12,843-Speed 9237.14 samples/sec Loss 5.0862 LearningRate 0.0131 Epoch: 12 Global Step: 213050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:13,961-Speed 9161.17 samples/sec Loss 4.9996 LearningRate 0.0131 Epoch: 12 Global Step: 213060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
20:08:15,057-Speed 9352.02 samples/sec Loss 5.0328 LearningRate 0.0131 Epoch: 12 Global Step: 213070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:16,165-Speed 9250.80 samples/sec Loss 5.1764 LearningRate 0.0131 Epoch: 12 Global Step: 213080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:17,276-Speed 9220.61 samples/sec Loss 5.0403 LearningRate 0.0131 Epoch: 12 Global Step: 213090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:18,361-Speed 9447.46 samples/sec Loss 5.0104 LearningRate 0.0131 Epoch: 12 Global Step: 213100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:19,437-Speed 9525.13 samples/sec Loss 5.0591 LearningRate 0.0131 Epoch: 12 Global Step: 213110 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:08:20,583-Speed 8936.58 samples/sec Loss 4.9975 LearningRate 0.0131 Epoch: 12 Global Step: 213120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:21,691-Speed 9251.93 samples/sec Loss 5.0731 LearningRate 0.0131 Epoch: 12 Global Step: 213130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:22,784-Speed 9373.76 samples/sec Loss 4.9599 LearningRate 0.0131 Epoch: 12 Global Step: 213140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:23,892-Speed 9244.00 samples/sec Loss 5.0204 LearningRate 0.0131 Epoch: 12 Global Step: 213150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:24,974-Speed 9469.08 samples/sec Loss 5.1253 LearningRate 0.0131 Epoch: 12 Global Step: 213160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:26,046-Speed 9561.52 samples/sec Loss 5.0895 LearningRate 0.0131 Epoch: 12 Global Step: 213170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:27,106-Speed 9665.29 samples/sec Loss 4.9714 LearningRate 0.0131 Epoch: 12 Global Step: 213180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:28,219-Speed 9208.16 
samples/sec Loss 5.0616 LearningRate 0.0131 Epoch: 12 Global Step: 213190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:29,328-Speed 9242.89 samples/sec Loss 4.9819 LearningRate 0.0131 Epoch: 12 Global Step: 213200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:30,408-Speed 9486.81 samples/sec Loss 4.9815 LearningRate 0.0131 Epoch: 12 Global Step: 213210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:31,516-Speed 9248.22 samples/sec Loss 4.9435 LearningRate 0.0131 Epoch: 12 Global Step: 213220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:32,620-Speed 9277.99 samples/sec Loss 5.0810 LearningRate 0.0130 Epoch: 12 Global Step: 213230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:33,713-Speed 9378.98 samples/sec Loss 5.0487 LearningRate 0.0130 Epoch: 12 Global Step: 213240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:34,790-Speed 9513.31 samples/sec Loss 5.0235 LearningRate 0.0130 Epoch: 12 Global Step: 213250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:35,863-Speed 9543.79 samples/sec Loss 5.0003 LearningRate 0.0130 Epoch: 12 Global Step: 213260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:36,971-Speed 9248.01 samples/sec Loss 4.9381 LearningRate 0.0130 Epoch: 12 Global Step: 213270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:38,061-Speed 9403.84 samples/sec Loss 5.0404 LearningRate 0.0130 Epoch: 12 Global Step: 213280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:39,123-Speed 9647.90 samples/sec Loss 4.9668 LearningRate 0.0130 Epoch: 12 Global Step: 213290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:40,241-Speed 9164.62 samples/sec Loss 4.9579 LearningRate 0.0130 Epoch: 12 Global Step: 213300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:41,419-Speed 8700.32 samples/sec Loss 5.0412 
LearningRate 0.0130 Epoch: 12 Global Step: 213310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:42,496-Speed 9519.68 samples/sec Loss 5.0441 LearningRate 0.0130 Epoch: 12 Global Step: 213320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:43,591-Speed 9351.12 samples/sec Loss 5.0627 LearningRate 0.0130 Epoch: 12 Global Step: 213330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:44,751-Speed 8837.68 samples/sec Loss 5.0191 LearningRate 0.0130 Epoch: 12 Global Step: 213340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:45,852-Speed 9302.81 samples/sec Loss 5.0622 LearningRate 0.0130 Epoch: 12 Global Step: 213350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:46,958-Speed 9266.07 samples/sec Loss 4.9043 LearningRate 0.0130 Epoch: 12 Global Step: 213360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:48,046-Speed 9423.15 samples/sec Loss 4.9782 LearningRate 0.0130 Epoch: 12 Global Step: 213370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:49,169-Speed 9128.04 samples/sec Loss 4.9783 LearningRate 0.0130 Epoch: 12 Global Step: 213380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:50,271-Speed 9292.05 samples/sec Loss 5.0127 LearningRate 0.0130 Epoch: 12 Global Step: 213390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:51,369-Speed 9334.41 samples/sec Loss 5.0046 LearningRate 0.0130 Epoch: 12 Global Step: 213400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:52,450-Speed 9475.64 samples/sec Loss 5.0250 LearningRate 0.0130 Epoch: 12 Global Step: 213410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:53,553-Speed 9294.96 samples/sec Loss 4.9885 LearningRate 0.0130 Epoch: 12 Global Step: 213420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:54,636-Speed 9466.81 samples/sec Loss 5.0497 LearningRate 0.0130 Epoch: 12 Global 
Step: 213430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:08:55,779-Speed 8970.93 samples/sec Loss 5.0658 LearningRate 0.0130 Epoch: 12 Global Step: 213440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:56,855-Speed 9524.58 samples/sec Loss 5.0127 LearningRate 0.0130 Epoch: 12 Global Step: 213450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:57,946-Speed 9387.94 samples/sec Loss 5.0991 LearningRate 0.0130 Epoch: 12 Global Step: 213460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:08:59,029-Speed 9460.48 samples/sec Loss 5.0943 LearningRate 0.0130 Epoch: 12 Global Step: 213470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:00,161-Speed 9058.76 samples/sec Loss 4.9931 LearningRate 0.0130 Epoch: 12 Global Step: 213480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:01,218-Speed 9695.31 samples/sec Loss 5.0277 LearningRate 0.0130 Epoch: 12 Global Step: 213490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:02,334-Speed 9175.14 samples/sec Loss 5.0746 LearningRate 0.0130 Epoch: 12 Global Step: 213500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:03,477-Speed 8963.27 samples/sec Loss 4.9763 LearningRate 0.0130 Epoch: 12 Global Step: 213510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:04,603-Speed 9102.43 samples/sec Loss 4.9335 LearningRate 0.0130 Epoch: 12 Global Step: 213520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:05,695-Speed 9386.31 samples/sec Loss 5.0629 LearningRate 0.0130 Epoch: 12 Global Step: 213530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:06,769-Speed 9544.64 samples/sec Loss 5.0058 LearningRate 0.0130 Epoch: 12 Global Step: 213540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:07,868-Speed 9330.03 samples/sec Loss 4.9637 LearningRate 0.0130 Epoch: 12 Global Step: 213550 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:09:08,985-Speed 9165.57 samples/sec Loss 5.0459 LearningRate 0.0130 Epoch: 12 Global Step: 213560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:10,075-Speed 9400.19 samples/sec Loss 5.0245 LearningRate 0.0130 Epoch: 12 Global Step: 213570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:11,193-Speed 9168.62 samples/sec Loss 5.1206 LearningRate 0.0130 Epoch: 12 Global Step: 213580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:12,281-Speed 9417.40 samples/sec Loss 5.0685 LearningRate 0.0130 Epoch: 12 Global Step: 213590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:13,353-Speed 9553.29 samples/sec Loss 5.0995 LearningRate 0.0130 Epoch: 12 Global Step: 213600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:14,475-Speed 9131.27 samples/sec Loss 4.9158 LearningRate 0.0130 Epoch: 12 Global Step: 213610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:15,604-Speed 9077.34 samples/sec Loss 4.9873 LearningRate 0.0130 Epoch: 12 Global Step: 213620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:16,691-Speed 9430.89 samples/sec Loss 5.0581 LearningRate 0.0130 Epoch: 12 Global Step: 213630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:17,786-Speed 9363.30 samples/sec Loss 5.0579 LearningRate 0.0130 Epoch: 12 Global Step: 213640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:18,886-Speed 9309.41 samples/sec Loss 4.9945 LearningRate 0.0130 Epoch: 12 Global Step: 213650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:20,002-Speed 9186.57 samples/sec Loss 4.9744 LearningRate 0.0130 Epoch: 12 Global Step: 213660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:21,111-Speed 9237.99 samples/sec Loss 4.9976 LearningRate 0.0130 Epoch: 12 Global Step: 213670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 20:09:22,225-Speed 9199.23 samples/sec Loss 4.9964 LearningRate 0.0130 Epoch: 12 Global Step: 213680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:23,350-Speed 9108.15 samples/sec Loss 5.1092 LearningRate 0.0130 Epoch: 12 Global Step: 213690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:24,401-Speed 9751.23 samples/sec Loss 5.0605 LearningRate 0.0129 Epoch: 12 Global Step: 213700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:25,462-Speed 9654.86 samples/sec Loss 5.1158 LearningRate 0.0129 Epoch: 12 Global Step: 213710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:26,554-Speed 9382.69 samples/sec Loss 4.9641 LearningRate 0.0129 Epoch: 12 Global Step: 213720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:27,679-Speed 9115.07 samples/sec Loss 4.9805 LearningRate 0.0129 Epoch: 12 Global Step: 213730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:28,770-Speed 9390.77 samples/sec Loss 4.9309 LearningRate 0.0129 Epoch: 12 Global Step: 213740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:29,839-Speed 9582.54 samples/sec Loss 5.0396 LearningRate 0.0129 Epoch: 12 Global Step: 213750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:30,942-Speed 9287.03 samples/sec Loss 4.9335 LearningRate 0.0129 Epoch: 12 Global Step: 213760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:32,048-Speed 9269.11 samples/sec Loss 5.0575 LearningRate 0.0129 Epoch: 12 Global Step: 213770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:33,175-Speed 9094.40 samples/sec Loss 5.0395 LearningRate 0.0129 Epoch: 12 Global Step: 213780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:34,239-Speed 9622.40 samples/sec Loss 5.1232 LearningRate 0.0129 Epoch: 12 Global Step: 213790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:35,328-Speed 9411.67 
samples/sec Loss 4.9922 LearningRate 0.0129 Epoch: 12 Global Step: 213800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:36,426-Speed 9331.64 samples/sec Loss 5.0557 LearningRate 0.0129 Epoch: 12 Global Step: 213810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:37,499-Speed 9551.31 samples/sec Loss 5.0949 LearningRate 0.0129 Epoch: 12 Global Step: 213820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:38,598-Speed 9324.48 samples/sec Loss 5.0174 LearningRate 0.0129 Epoch: 12 Global Step: 213830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:39,691-Speed 9369.21 samples/sec Loss 5.0619 LearningRate 0.0129 Epoch: 12 Global Step: 213840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:40,787-Speed 9351.71 samples/sec Loss 5.0175 LearningRate 0.0129 Epoch: 12 Global Step: 213850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:41,882-Speed 9360.33 samples/sec Loss 5.0151 LearningRate 0.0129 Epoch: 12 Global Step: 213860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:42,951-Speed 9579.99 samples/sec Loss 5.1169 LearningRate 0.0129 Epoch: 12 Global Step: 213870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:44,008-Speed 9697.91 samples/sec Loss 5.0653 LearningRate 0.0129 Epoch: 12 Global Step: 213880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:45,086-Speed 9503.57 samples/sec Loss 5.0454 LearningRate 0.0129 Epoch: 12 Global Step: 213890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:46,221-Speed 9023.35 samples/sec Loss 5.0214 LearningRate 0.0129 Epoch: 12 Global Step: 213900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:47,322-Speed 9310.12 samples/sec Loss 4.9761 LearningRate 0.0129 Epoch: 12 Global Step: 213910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:48,360-Speed 9880.88 samples/sec Loss 5.0333 LearningRate 
0.0129 Epoch: 12 Global Step: 213920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:49,459-Speed 9316.67 samples/sec Loss 5.0069 LearningRate 0.0129 Epoch: 12 Global Step: 213930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:50,567-Speed 9250.28 samples/sec Loss 5.0171 LearningRate 0.0129 Epoch: 12 Global Step: 213940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:09:51,670-Speed 9290.66 samples/sec Loss 4.9902 LearningRate 0.0129 Epoch: 12 Global Step: 213950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:52,755-Speed 9439.00 samples/sec Loss 5.0903 LearningRate 0.0129 Epoch: 12 Global Step: 213960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:53,820-Speed 9622.76 samples/sec Loss 5.0650 LearningRate 0.0129 Epoch: 12 Global Step: 213970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:54,924-Speed 9279.35 samples/sec Loss 4.9773 LearningRate 0.0129 Epoch: 12 Global Step: 213980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:56,003-Speed 9508.33 samples/sec Loss 5.1213 LearningRate 0.0129 Epoch: 12 Global Step: 213990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:09:57,052-Speed 9766.28 samples/sec Loss 5.0268 LearningRate 0.0129 Epoch: 12 Global Step: 214000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:10:18,859-[lfw][214000]XNorm: 8.206974 Training: 2022-04-11 20:10:18,859-[lfw][214000]Accuracy-Flip: 0.99583+-0.00291 Training: 2022-04-11 20:10:18,860-[lfw][214000]Accuracy-Highest: 0.99683 Training: 2022-04-11 20:10:44,202-[cfp_fp][214000]XNorm: 7.060784 Training: 2022-04-11 20:10:44,203-[cfp_fp][214000]Accuracy-Flip: 0.96557+-0.00925 Training: 2022-04-11 20:10:44,203-[cfp_fp][214000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:11:06,175-[agedb_30][214000]XNorm: 7.938451 Training: 2022-04-11 20:11:06,176-[agedb_30][214000]Accuracy-Flip: 0.96833+-0.01220 Training: 2022-04-11 
20:11:06,176-[agedb_30][214000]Accuracy-Highest: 0.96983 Training: 2022-04-11 20:11:07,288-Speed 145.80 samples/sec Loss 4.9959 LearningRate 0.0129 Epoch: 12 Global Step: 214010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:08,365-Speed 9516.93 samples/sec Loss 5.0938 LearningRate 0.0129 Epoch: 12 Global Step: 214020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:09,429-Speed 9630.96 samples/sec Loss 5.1041 LearningRate 0.0129 Epoch: 12 Global Step: 214030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:10,476-Speed 9786.23 samples/sec Loss 5.0920 LearningRate 0.0129 Epoch: 12 Global Step: 214040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:11,578-Speed 9298.52 samples/sec Loss 5.0656 LearningRate 0.0129 Epoch: 12 Global Step: 214050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:12,698-Speed 9144.65 samples/sec Loss 5.0706 LearningRate 0.0129 Epoch: 12 Global Step: 214060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:13,792-Speed 9373.56 samples/sec Loss 5.0554 LearningRate 0.0129 Epoch: 12 Global Step: 214070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:14,847-Speed 9706.97 samples/sec Loss 4.9393 LearningRate 0.0129 Epoch: 12 Global Step: 214080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:15,947-Speed 9318.21 samples/sec Loss 5.0140 LearningRate 0.0129 Epoch: 12 Global Step: 214090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:17,032-Speed 9445.60 samples/sec Loss 5.0600 LearningRate 0.0129 Epoch: 12 Global Step: 214100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:18,111-Speed 9491.20 samples/sec Loss 5.1123 LearningRate 0.0129 Epoch: 12 Global Step: 214110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:19,211-Speed 9316.67 samples/sec Loss 4.9674 LearningRate 0.0129 Epoch: 12 Global Step: 214120 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:11:20,329-Speed 9160.04 samples/sec Loss 5.0145 LearningRate 0.0129 Epoch: 12 Global Step: 214130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:21,397-Speed 9598.24 samples/sec Loss 5.0580 LearningRate 0.0129 Epoch: 12 Global Step: 214140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:22,492-Speed 9358.46 samples/sec Loss 5.0308 LearningRate 0.0129 Epoch: 12 Global Step: 214150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:23,568-Speed 9521.01 samples/sec Loss 5.0064 LearningRate 0.0128 Epoch: 12 Global Step: 214160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:24,638-Speed 9571.97 samples/sec Loss 5.1390 LearningRate 0.0128 Epoch: 12 Global Step: 214170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:25,678-Speed 9859.54 samples/sec Loss 5.1068 LearningRate 0.0128 Epoch: 12 Global Step: 214180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:26,749-Speed 9560.36 samples/sec Loss 4.9981 LearningRate 0.0128 Epoch: 12 Global Step: 214190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:27,810-Speed 9655.98 samples/sec Loss 4.9449 LearningRate 0.0128 Epoch: 12 Global Step: 214200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:28,879-Speed 9587.13 samples/sec Loss 5.0644 LearningRate 0.0128 Epoch: 12 Global Step: 214210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:29,978-Speed 9326.34 samples/sec Loss 5.1108 LearningRate 0.0128 Epoch: 12 Global Step: 214220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:31,020-Speed 9836.78 samples/sec Loss 5.0982 LearningRate 0.0128 Epoch: 12 Global Step: 214230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:32,102-Speed 9467.89 samples/sec Loss 5.0033 LearningRate 0.0128 Epoch: 12 Global Step: 214240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 20:11:33,188-Speed 9434.87 samples/sec Loss 4.9264 LearningRate 0.0128 Epoch: 12 Global Step: 214250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:34,259-Speed 9564.50 samples/sec Loss 5.0109 LearningRate 0.0128 Epoch: 12 Global Step: 214260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:35,311-Speed 9742.74 samples/sec Loss 4.9788 LearningRate 0.0128 Epoch: 12 Global Step: 214270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:36,399-Speed 9413.91 samples/sec Loss 5.0561 LearningRate 0.0128 Epoch: 12 Global Step: 214280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:37,487-Speed 9419.47 samples/sec Loss 5.0159 LearningRate 0.0128 Epoch: 12 Global Step: 214290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:38,606-Speed 9159.73 samples/sec Loss 5.0085 LearningRate 0.0128 Epoch: 12 Global Step: 214300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:39,690-Speed 9448.08 samples/sec Loss 4.9966 LearningRate 0.0128 Epoch: 12 Global Step: 214310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:40,802-Speed 9212.31 samples/sec Loss 5.0413 LearningRate 0.0128 Epoch: 12 Global Step: 214320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:41,889-Speed 9425.53 samples/sec Loss 5.0418 LearningRate 0.0128 Epoch: 12 Global Step: 214330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:42,954-Speed 9623.77 samples/sec Loss 5.1465 LearningRate 0.0128 Epoch: 12 Global Step: 214340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:44,042-Speed 9417.84 samples/sec Loss 4.9785 LearningRate 0.0128 Epoch: 12 Global Step: 214350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:45,124-Speed 9466.64 samples/sec Loss 4.9623 LearningRate 0.0128 Epoch: 12 Global Step: 214360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:46,223-Speed 
9325.19 samples/sec Loss 5.0005 LearningRate 0.0128 Epoch: 12 Global Step: 214370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:47,277-Speed 9723.49 samples/sec Loss 5.0249 LearningRate 0.0128 Epoch: 12 Global Step: 214380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:48,407-Speed 9064.87 samples/sec Loss 5.0483 LearningRate 0.0128 Epoch: 12 Global Step: 214390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:49,481-Speed 9536.20 samples/sec Loss 5.0455 LearningRate 0.0128 Epoch: 12 Global Step: 214400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:11:50,545-Speed 9636.33 samples/sec Loss 4.9756 LearningRate 0.0128 Epoch: 12 Global Step: 214410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:11:51,638-Speed 9368.23 samples/sec Loss 4.9738 LearningRate 0.0128 Epoch: 12 Global Step: 214420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:52,700-Speed 9653.02 samples/sec Loss 5.1161 LearningRate 0.0128 Epoch: 12 Global Step: 214430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:53,775-Speed 9530.68 samples/sec Loss 5.0884 LearningRate 0.0128 Epoch: 12 Global Step: 214440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:54,867-Speed 9382.12 samples/sec Loss 5.0754 LearningRate 0.0128 Epoch: 12 Global Step: 214450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:55,978-Speed 9225.58 samples/sec Loss 5.0536 LearningRate 0.0128 Epoch: 12 Global Step: 214460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:57,060-Speed 9467.95 samples/sec Loss 5.0808 LearningRate 0.0128 Epoch: 12 Global Step: 214470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:58,140-Speed 9487.26 samples/sec Loss 5.0174 LearningRate 0.0128 Epoch: 12 Global Step: 214480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:11:59,200-Speed 9663.21 samples/sec Loss 5.0308 
LearningRate 0.0128 Epoch: 12 Global Step: 214490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:12:00,268-Speed 9597.68 samples/sec Loss 5.0614 LearningRate 0.0128 Epoch: 12 Global Step: 214500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:12:01,344-Speed 9521.06 samples/sec Loss 5.0079 LearningRate 0.0128 Epoch: 12 Global Step: 214510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:12:02,449-Speed 9270.10 samples/sec Loss 5.0367 LearningRate 0.0128 Epoch: 12 Global Step: 214520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:03,539-Speed 9400.48 samples/sec Loss 4.9296 LearningRate 0.0128 Epoch: 12 Global Step: 214530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:04,613-Speed 9540.89 samples/sec Loss 5.0221 LearningRate 0.0128 Epoch: 12 Global Step: 214540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:05,712-Speed 9324.84 samples/sec Loss 5.0184 LearningRate 0.0128 Epoch: 12 Global Step: 214550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:06,804-Speed 9386.01 samples/sec Loss 5.0428 LearningRate 0.0128 Epoch: 12 Global Step: 214560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:07,928-Speed 9107.98 samples/sec Loss 4.9959 LearningRate 0.0128 Epoch: 12 Global Step: 214570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:09,055-Speed 9099.74 samples/sec Loss 4.9892 LearningRate 0.0128 Epoch: 12 Global Step: 214580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:10,137-Speed 9461.42 samples/sec Loss 4.9733 LearningRate 0.0128 Epoch: 12 Global Step: 214590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:11,235-Speed 9334.51 samples/sec Loss 5.0622 LearningRate 0.0128 Epoch: 12 Global Step: 214600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:12,338-Speed 9295.79 samples/sec Loss 4.9891 LearningRate 0.0128 Epoch: 12 
Global Step: 214610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:13,484-Speed 8939.52 samples/sec Loss 5.1243 LearningRate 0.0128 Epoch: 12 Global Step: 214620 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:12:14,581-Speed 9337.56 samples/sec Loss 5.1394 LearningRate 0.0127 Epoch: 12 Global Step: 214630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:15,675-Speed 9368.80 samples/sec Loss 5.0761 LearningRate 0.0127 Epoch: 12 Global Step: 214640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:16,771-Speed 9347.46 samples/sec Loss 5.0317 LearningRate 0.0127 Epoch: 12 Global Step: 214650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:17,862-Speed 9392.76 samples/sec Loss 5.0332 LearningRate 0.0127 Epoch: 12 Global Step: 214660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:18,990-Speed 9078.23 samples/sec Loss 4.9744 LearningRate 0.0127 Epoch: 12 Global Step: 214670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:20,091-Speed 9311.39 samples/sec Loss 5.0263 LearningRate 0.0127 Epoch: 12 Global Step: 214680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:21,148-Speed 9686.53 samples/sec Loss 5.0278 LearningRate 0.0127 Epoch: 12 Global Step: 214690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:22,238-Speed 9399.63 samples/sec Loss 5.0334 LearningRate 0.0127 Epoch: 12 Global Step: 214700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:23,383-Speed 8955.39 samples/sec Loss 5.0405 LearningRate 0.0127 Epoch: 12 Global Step: 214710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:24,452-Speed 9582.24 samples/sec Loss 4.9840 LearningRate 0.0127 Epoch: 12 Global Step: 214720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:25,535-Speed 9457.08 samples/sec Loss 5.0842 LearningRate 0.0127 Epoch: 12 Global Step: 214730 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:26,609-Speed 9545.81 samples/sec Loss 5.0788 LearningRate 0.0127 Epoch: 12 Global Step: 214740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:27,666-Speed 9697.49 samples/sec Loss 5.0418 LearningRate 0.0127 Epoch: 12 Global Step: 214750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:28,782-Speed 9178.54 samples/sec Loss 5.0320 LearningRate 0.0127 Epoch: 12 Global Step: 214760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:29,836-Speed 9720.44 samples/sec Loss 5.0337 LearningRate 0.0127 Epoch: 12 Global Step: 214770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:30,929-Speed 9382.87 samples/sec Loss 5.0006 LearningRate 0.0127 Epoch: 12 Global Step: 214780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:32,052-Speed 9124.95 samples/sec Loss 5.1030 LearningRate 0.0127 Epoch: 12 Global Step: 214790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:33,169-Speed 9173.12 samples/sec Loss 5.1420 LearningRate 0.0127 Epoch: 12 Global Step: 214800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:34,274-Speed 9268.44 samples/sec Loss 4.9614 LearningRate 0.0127 Epoch: 12 Global Step: 214810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:35,365-Speed 9388.19 samples/sec Loss 5.0938 LearningRate 0.0127 Epoch: 12 Global Step: 214820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:36,431-Speed 9610.64 samples/sec Loss 5.0715 LearningRate 0.0127 Epoch: 12 Global Step: 214830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:37,528-Speed 9346.52 samples/sec Loss 4.9815 LearningRate 0.0127 Epoch: 12 Global Step: 214840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:38,585-Speed 9691.26 samples/sec Loss 5.0454 LearningRate 0.0127 Epoch: 12 Global Step: 214850 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 20:12:39,660-Speed 9529.11 samples/sec Loss 5.0603 LearningRate 0.0127 Epoch: 12 Global Step: 214860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:40,713-Speed 9733.77 samples/sec Loss 4.9915 LearningRate 0.0127 Epoch: 12 Global Step: 214870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:41,780-Speed 9595.16 samples/sec Loss 5.1134 LearningRate 0.0127 Epoch: 12 Global Step: 214880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:42,885-Speed 9282.10 samples/sec Loss 5.0559 LearningRate 0.0127 Epoch: 12 Global Step: 214890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:43,988-Speed 9281.52 samples/sec Loss 5.0321 LearningRate 0.0127 Epoch: 12 Global Step: 214900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:45,067-Speed 9502.67 samples/sec Loss 5.1272 LearningRate 0.0127 Epoch: 12 Global Step: 214910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:46,176-Speed 9236.46 samples/sec Loss 5.1341 LearningRate 0.0127 Epoch: 12 Global Step: 214920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:47,242-Speed 9606.97 samples/sec Loss 4.9524 LearningRate 0.0127 Epoch: 12 Global Step: 214930 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:12:48,343-Speed 9308.10 samples/sec Loss 4.9945 LearningRate 0.0127 Epoch: 12 Global Step: 214940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:49,400-Speed 9692.16 samples/sec Loss 5.0707 LearningRate 0.0127 Epoch: 12 Global Step: 214950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:50,462-Speed 9652.17 samples/sec Loss 4.9526 LearningRate 0.0127 Epoch: 12 Global Step: 214960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:51,530-Speed 9588.55 samples/sec Loss 4.9967 LearningRate 0.0127 Epoch: 12 Global Step: 214970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
20:12:52,621-Speed 9391.16 samples/sec Loss 4.9587 LearningRate 0.0127 Epoch: 12 Global Step: 214980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:53,703-Speed 9473.48 samples/sec Loss 5.0156 LearningRate 0.0127 Epoch: 12 Global Step: 214990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:54,781-Speed 9503.51 samples/sec Loss 4.9643 LearningRate 0.0127 Epoch: 12 Global Step: 215000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:55,857-Speed 9520.85 samples/sec Loss 5.0033 LearningRate 0.0127 Epoch: 12 Global Step: 215010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:56,971-Speed 9203.69 samples/sec Loss 5.0010 LearningRate 0.0127 Epoch: 12 Global Step: 215020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:58,043-Speed 9550.04 samples/sec Loss 5.0704 LearningRate 0.0127 Epoch: 12 Global Step: 215030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:12:59,080-Speed 9881.16 samples/sec Loss 5.0011 LearningRate 0.0127 Epoch: 12 Global Step: 215040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:00,177-Speed 9346.48 samples/sec Loss 4.9636 LearningRate 0.0127 Epoch: 12 Global Step: 215050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:01,285-Speed 9244.94 samples/sec Loss 5.0145 LearningRate 0.0127 Epoch: 12 Global Step: 215060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:02,337-Speed 9742.04 samples/sec Loss 5.0338 LearningRate 0.0127 Epoch: 12 Global Step: 215070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:03,423-Speed 9435.85 samples/sec Loss 5.0150 LearningRate 0.0127 Epoch: 12 Global Step: 215080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:04,533-Speed 9224.73 samples/sec Loss 5.0752 LearningRate 0.0127 Epoch: 12 Global Step: 215090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:05,658-Speed 9109.37 
samples/sec Loss 5.0369 LearningRate 0.0126 Epoch: 12 Global Step: 215100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:06,798-Speed 8983.41 samples/sec Loss 4.9985 LearningRate 0.0126 Epoch: 12 Global Step: 215110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:07,878-Speed 9489.90 samples/sec Loss 5.0510 LearningRate 0.0126 Epoch: 12 Global Step: 215120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:08,972-Speed 9368.63 samples/sec Loss 5.0673 LearningRate 0.0126 Epoch: 12 Global Step: 215130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:10,047-Speed 9535.31 samples/sec Loss 4.9792 LearningRate 0.0126 Epoch: 12 Global Step: 215140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:11,148-Speed 9302.20 samples/sec Loss 5.0480 LearningRate 0.0126 Epoch: 12 Global Step: 215150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:12,196-Speed 9781.10 samples/sec Loss 4.9509 LearningRate 0.0126 Epoch: 12 Global Step: 215160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:13,274-Speed 9502.99 samples/sec Loss 5.0513 LearningRate 0.0126 Epoch: 12 Global Step: 215170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:14,394-Speed 9149.80 samples/sec Loss 5.0365 LearningRate 0.0126 Epoch: 12 Global Step: 215180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:15,415-Speed 10030.43 samples/sec Loss 4.9866 LearningRate 0.0126 Epoch: 12 Global Step: 215190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:16,484-Speed 9582.96 samples/sec Loss 4.9534 LearningRate 0.0126 Epoch: 12 Global Step: 215200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:17,606-Speed 9135.50 samples/sec Loss 4.9451 LearningRate 0.0126 Epoch: 12 Global Step: 215210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:18,705-Speed 9318.35 samples/sec Loss 5.0826 
LearningRate 0.0126 Epoch: 12 Global Step: 215220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:19,785-Speed 9489.73 samples/sec Loss 4.9820 LearningRate 0.0126 Epoch: 12 Global Step: 215230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:20,900-Speed 9194.83 samples/sec Loss 5.0475 LearningRate 0.0126 Epoch: 12 Global Step: 215240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:22,018-Speed 9160.37 samples/sec Loss 5.0844 LearningRate 0.0126 Epoch: 12 Global Step: 215250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:23,133-Speed 9189.17 samples/sec Loss 4.9548 LearningRate 0.0126 Epoch: 12 Global Step: 215260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:24,215-Speed 9476.58 samples/sec Loss 5.0052 LearningRate 0.0126 Epoch: 12 Global Step: 215270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:25,294-Speed 9489.42 samples/sec Loss 5.0446 LearningRate 0.0126 Epoch: 12 Global Step: 215280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:26,394-Speed 9319.13 samples/sec Loss 5.0484 LearningRate 0.0126 Epoch: 12 Global Step: 215290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:27,524-Speed 9065.51 samples/sec Loss 5.0860 LearningRate 0.0126 Epoch: 12 Global Step: 215300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:28,592-Speed 9597.88 samples/sec Loss 4.9807 LearningRate 0.0126 Epoch: 12 Global Step: 215310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:29,732-Speed 8990.75 samples/sec Loss 5.0442 LearningRate 0.0126 Epoch: 12 Global Step: 215320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:30,839-Speed 9254.56 samples/sec Loss 5.0359 LearningRate 0.0126 Epoch: 12 Global Step: 215330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:31,929-Speed 9402.68 samples/sec Loss 4.9727 LearningRate 0.0126 Epoch: 12 Global 
Step: 215340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:33,011-Speed 9470.79 samples/sec Loss 5.0065 LearningRate 0.0126 Epoch: 12 Global Step: 215350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:34,133-Speed 9126.15 samples/sec Loss 4.9685 LearningRate 0.0126 Epoch: 12 Global Step: 215360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:35,233-Speed 9317.56 samples/sec Loss 5.0653 LearningRate 0.0126 Epoch: 12 Global Step: 215370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:36,314-Speed 9474.01 samples/sec Loss 5.0292 LearningRate 0.0126 Epoch: 12 Global Step: 215380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:37,390-Speed 9520.19 samples/sec Loss 5.0440 LearningRate 0.0126 Epoch: 12 Global Step: 215390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:38,482-Speed 9384.35 samples/sec Loss 4.9982 LearningRate 0.0126 Epoch: 12 Global Step: 215400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:39,565-Speed 9461.24 samples/sec Loss 5.0279 LearningRate 0.0126 Epoch: 12 Global Step: 215410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:40,651-Speed 9431.91 samples/sec Loss 4.9838 LearningRate 0.0126 Epoch: 12 Global Step: 215420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:41,728-Speed 9516.07 samples/sec Loss 5.0013 LearningRate 0.0126 Epoch: 12 Global Step: 215430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:42,841-Speed 9200.98 samples/sec Loss 4.9764 LearningRate 0.0126 Epoch: 12 Global Step: 215440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:43,962-Speed 9151.49 samples/sec Loss 5.1233 LearningRate 0.0126 Epoch: 12 Global Step: 215450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:45,036-Speed 9531.61 samples/sec Loss 4.9767 LearningRate 0.0126 Epoch: 12 Global Step: 215460 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:13:46,088-Speed 9747.68 samples/sec Loss 5.0154 LearningRate 0.0126 Epoch: 12 Global Step: 215470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:47,197-Speed 9239.40 samples/sec Loss 4.9670 LearningRate 0.0126 Epoch: 12 Global Step: 215480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:48,263-Speed 9611.78 samples/sec Loss 4.9948 LearningRate 0.0126 Epoch: 12 Global Step: 215490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:49,315-Speed 9737.38 samples/sec Loss 5.0772 LearningRate 0.0126 Epoch: 12 Global Step: 215500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:50,380-Speed 9624.16 samples/sec Loss 5.0962 LearningRate 0.0126 Epoch: 12 Global Step: 215510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:51,454-Speed 9536.22 samples/sec Loss 5.0230 LearningRate 0.0126 Epoch: 12 Global Step: 215520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:52,563-Speed 9236.24 samples/sec Loss 5.0620 LearningRate 0.0126 Epoch: 12 Global Step: 215530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:53,657-Speed 9369.48 samples/sec Loss 5.1623 LearningRate 0.0126 Epoch: 12 Global Step: 215540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:54,740-Speed 9465.61 samples/sec Loss 5.0248 LearningRate 0.0126 Epoch: 12 Global Step: 215550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:13:55,831-Speed 9385.40 samples/sec Loss 5.1157 LearningRate 0.0126 Epoch: 12 Global Step: 215560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:56,953-Speed 9129.60 samples/sec Loss 5.0863 LearningRate 0.0125 Epoch: 12 Global Step: 215570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:13:58,038-Speed 9442.07 samples/sec Loss 5.0550 LearningRate 0.0125 Epoch: 12 Global Step: 215580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 20:13:59,165-Speed 9094.60 samples/sec Loss 5.0369 LearningRate 0.0125 Epoch: 12 Global Step: 215590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:00,275-Speed 9235.78 samples/sec Loss 5.0430 LearningRate 0.0125 Epoch: 12 Global Step: 215600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:01,362-Speed 9426.21 samples/sec Loss 4.9974 LearningRate 0.0125 Epoch: 12 Global Step: 215610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:02,473-Speed 9215.86 samples/sec Loss 4.9341 LearningRate 0.0125 Epoch: 12 Global Step: 215620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:03,586-Speed 9214.96 samples/sec Loss 4.9420 LearningRate 0.0125 Epoch: 12 Global Step: 215630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:04,686-Speed 9316.15 samples/sec Loss 5.0858 LearningRate 0.0125 Epoch: 12 Global Step: 215640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:05,771-Speed 9437.62 samples/sec Loss 5.0352 LearningRate 0.0125 Epoch: 12 Global Step: 215650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:06,865-Speed 9372.66 samples/sec Loss 5.1396 LearningRate 0.0125 Epoch: 12 Global Step: 215660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:07,961-Speed 9345.81 samples/sec Loss 4.9519 LearningRate 0.0125 Epoch: 12 Global Step: 215670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:09,022-Speed 9650.92 samples/sec Loss 5.0059 LearningRate 0.0125 Epoch: 12 Global Step: 215680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:10,134-Speed 9213.69 samples/sec Loss 5.0800 LearningRate 0.0125 Epoch: 12 Global Step: 215690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:11,214-Speed 9492.46 samples/sec Loss 5.0516 LearningRate 0.0125 Epoch: 12 Global Step: 215700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:12,275-Speed 9654.93 
samples/sec Loss 5.0487 LearningRate 0.0125 Epoch: 12 Global Step: 215710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:13,339-Speed 9629.68 samples/sec Loss 5.0324 LearningRate 0.0125 Epoch: 12 Global Step: 215720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:14,421-Speed 9471.26 samples/sec Loss 4.9377 LearningRate 0.0125 Epoch: 12 Global Step: 215730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:15,502-Speed 9477.79 samples/sec Loss 4.9786 LearningRate 0.0125 Epoch: 12 Global Step: 215740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:16,597-Speed 9355.16 samples/sec Loss 5.0290 LearningRate 0.0125 Epoch: 12 Global Step: 215750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:17,695-Speed 9328.79 samples/sec Loss 5.0109 LearningRate 0.0125 Epoch: 12 Global Step: 215760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:18,786-Speed 9396.37 samples/sec Loss 4.9644 LearningRate 0.0125 Epoch: 12 Global Step: 215770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:19,884-Speed 9327.29 samples/sec Loss 4.9824 LearningRate 0.0125 Epoch: 12 Global Step: 215780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:20,916-Speed 9928.41 samples/sec Loss 4.9370 LearningRate 0.0125 Epoch: 12 Global Step: 215790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:21,974-Speed 9683.23 samples/sec Loss 5.0972 LearningRate 0.0125 Epoch: 12 Global Step: 215800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:23,039-Speed 9618.73 samples/sec Loss 4.9349 LearningRate 0.0125 Epoch: 12 Global Step: 215810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:24,110-Speed 9578.03 samples/sec Loss 5.1960 LearningRate 0.0125 Epoch: 12 Global Step: 215820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:25,160-Speed 9753.55 samples/sec Loss 5.0873 LearningRate 
0.0125 Epoch: 12 Global Step: 215830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:26,214-Speed 9727.27 samples/sec Loss 5.1257 LearningRate 0.0125 Epoch: 12 Global Step: 215840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:27,293-Speed 9493.54 samples/sec Loss 5.0860 LearningRate 0.0125 Epoch: 12 Global Step: 215850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:28,413-Speed 9144.97 samples/sec Loss 5.0650 LearningRate 0.0125 Epoch: 12 Global Step: 215860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:29,466-Speed 9728.46 samples/sec Loss 5.0055 LearningRate 0.0125 Epoch: 12 Global Step: 215870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:30,525-Speed 9678.64 samples/sec Loss 5.0086 LearningRate 0.0125 Epoch: 12 Global Step: 215880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:31,589-Speed 9632.01 samples/sec Loss 5.0150 LearningRate 0.0125 Epoch: 12 Global Step: 215890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:14:32,671-Speed 9473.76 samples/sec Loss 5.0561 LearningRate 0.0125 Epoch: 12 Global Step: 215900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:33,754-Speed 9460.11 samples/sec Loss 4.9561 LearningRate 0.0125 Epoch: 12 Global Step: 215910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:34,867-Speed 9200.42 samples/sec Loss 5.1061 LearningRate 0.0125 Epoch: 12 Global Step: 215920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:35,932-Speed 9624.63 samples/sec Loss 4.9454 LearningRate 0.0125 Epoch: 12 Global Step: 215930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:37,012-Speed 9481.50 samples/sec Loss 5.1136 LearningRate 0.0125 Epoch: 12 Global Step: 215940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:38,130-Speed 9170.61 samples/sec Loss 5.0007 LearningRate 0.0125 Epoch: 12 Global Step: 215950 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:39,211-Speed 9475.42 samples/sec Loss 4.9904 LearningRate 0.0125 Epoch: 12 Global Step: 215960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:40,260-Speed 9767.39 samples/sec Loss 5.0823 LearningRate 0.0125 Epoch: 12 Global Step: 215970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:41,324-Speed 9628.57 samples/sec Loss 5.0374 LearningRate 0.0125 Epoch: 12 Global Step: 215980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:42,396-Speed 9560.65 samples/sec Loss 5.0377 LearningRate 0.0125 Epoch: 12 Global Step: 215990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:14:43,491-Speed 9374.10 samples/sec Loss 5.0640 LearningRate 0.0125 Epoch: 12 Global Step: 216000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:15:05,388-[lfw][216000]XNorm: 8.206345 Training: 2022-04-11 20:15:05,388-[lfw][216000]Accuracy-Flip: 0.99667+-0.00258 Training: 2022-04-11 20:15:05,389-[lfw][216000]Accuracy-Highest: 0.99683 Training: 2022-04-11 20:15:30,729-[cfp_fp][216000]XNorm: 7.009395 Training: 2022-04-11 20:15:30,730-[cfp_fp][216000]Accuracy-Flip: 0.96586+-0.00936 Training: 2022-04-11 20:15:30,730-[cfp_fp][216000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:15:52,629-[agedb_30][216000]XNorm: 7.970877 Training: 2022-04-11 20:15:52,630-[agedb_30][216000]Accuracy-Flip: 0.97033+-0.00980 Training: 2022-04-11 20:15:52,630-[agedb_30][216000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:15:53,722-Speed 145.80 samples/sec Loss 5.0488 LearningRate 0.0125 Epoch: 12 Global Step: 216010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:15:54,772-Speed 9761.15 samples/sec Loss 4.9203 LearningRate 0.0125 Epoch: 12 Global Step: 216020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:15:55,823-Speed 9741.00 samples/sec Loss 4.9919 LearningRate 0.0125 Epoch: 12 Global Step: 216030 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:15:56,998-Speed 8723.72 samples/sec Loss 5.0672 LearningRate 0.0124 Epoch: 12 Global Step: 216040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:15:58,094-Speed 9345.52 samples/sec Loss 4.9774 LearningRate 0.0124 Epoch: 12 Global Step: 216050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:15:59,171-Speed 9510.98 samples/sec Loss 5.0749 LearningRate 0.0124 Epoch: 12 Global Step: 216060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:00,238-Speed 9608.29 samples/sec Loss 5.0936 LearningRate 0.0124 Epoch: 12 Global Step: 216070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:01,345-Speed 9250.20 samples/sec Loss 5.0518 LearningRate 0.0124 Epoch: 12 Global Step: 216080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:02,468-Speed 9123.67 samples/sec Loss 5.0350 LearningRate 0.0124 Epoch: 12 Global Step: 216090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:03,577-Speed 9238.80 samples/sec Loss 4.9979 LearningRate 0.0124 Epoch: 12 Global Step: 216100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:04,637-Speed 9667.30 samples/sec Loss 4.9895 LearningRate 0.0124 Epoch: 12 Global Step: 216110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:05,739-Speed 9302.06 samples/sec Loss 5.1260 LearningRate 0.0124 Epoch: 12 Global Step: 216120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:06,867-Speed 9086.06 samples/sec Loss 4.9828 LearningRate 0.0124 Epoch: 12 Global Step: 216130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:07,947-Speed 9486.48 samples/sec Loss 4.9574 LearningRate 0.0124 Epoch: 12 Global Step: 216140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:09,028-Speed 9479.37 samples/sec Loss 5.0536 LearningRate 0.0124 Epoch: 12 Global Step: 216150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 20:16:10,117-Speed 9404.10 samples/sec Loss 4.9567 LearningRate 0.0124 Epoch: 12 Global Step: 216160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:11,223-Speed 9261.13 samples/sec Loss 4.9548 LearningRate 0.0124 Epoch: 12 Global Step: 216170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:12,284-Speed 9661.21 samples/sec Loss 5.0882 LearningRate 0.0124 Epoch: 12 Global Step: 216180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:13,369-Speed 9446.99 samples/sec Loss 5.1502 LearningRate 0.0124 Epoch: 12 Global Step: 216190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:14,460-Speed 9395.11 samples/sec Loss 5.0765 LearningRate 0.0124 Epoch: 12 Global Step: 216200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:15,531-Speed 9565.15 samples/sec Loss 5.1448 LearningRate 0.0124 Epoch: 12 Global Step: 216210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:16,683-Speed 8890.99 samples/sec Loss 5.0610 LearningRate 0.0124 Epoch: 12 Global Step: 216220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:17,777-Speed 9366.95 samples/sec Loss 5.1514 LearningRate 0.0124 Epoch: 12 Global Step: 216230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:18,844-Speed 9605.15 samples/sec Loss 4.9919 LearningRate 0.0124 Epoch: 12 Global Step: 216240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:19,913-Speed 9578.57 samples/sec Loss 5.0364 LearningRate 0.0124 Epoch: 12 Global Step: 216250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:21,033-Speed 9145.91 samples/sec Loss 5.0132 LearningRate 0.0124 Epoch: 12 Global Step: 216260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:22,122-Speed 9413.88 samples/sec Loss 5.1229 LearningRate 0.0124 Epoch: 12 Global Step: 216270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:23,199-Speed 9510.50 
samples/sec Loss 5.1409 LearningRate 0.0124 Epoch: 12 Global Step: 216280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:24,340-Speed 8981.04 samples/sec Loss 4.9726 LearningRate 0.0124 Epoch: 12 Global Step: 216290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:25,427-Speed 9424.76 samples/sec Loss 5.0610 LearningRate 0.0124 Epoch: 12 Global Step: 216300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:26,564-Speed 9011.34 samples/sec Loss 4.9234 LearningRate 0.0124 Epoch: 12 Global Step: 216310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:27,628-Speed 9633.07 samples/sec Loss 5.1034 LearningRate 0.0124 Epoch: 12 Global Step: 216320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:28,684-Speed 9699.19 samples/sec Loss 5.0266 LearningRate 0.0124 Epoch: 12 Global Step: 216330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:29,760-Speed 9520.02 samples/sec Loss 4.9333 LearningRate 0.0124 Epoch: 12 Global Step: 216340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:30,838-Speed 9507.42 samples/sec Loss 4.9823 LearningRate 0.0124 Epoch: 12 Global Step: 216350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:31,944-Speed 9268.24 samples/sec Loss 4.9560 LearningRate 0.0124 Epoch: 12 Global Step: 216360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:33,019-Speed 9530.08 samples/sec Loss 5.0220 LearningRate 0.0124 Epoch: 12 Global Step: 216370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:34,108-Speed 9401.51 samples/sec Loss 5.0211 LearningRate 0.0124 Epoch: 12 Global Step: 216380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:35,216-Speed 9255.79 samples/sec Loss 4.9533 LearningRate 0.0124 Epoch: 12 Global Step: 216390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:36,288-Speed 9560.44 samples/sec Loss 5.0329 LearningRate 
0.0124 Epoch: 12 Global Step: 216400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:37,426-Speed 9001.89 samples/sec Loss 5.0500 LearningRate 0.0124 Epoch: 12 Global Step: 216410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:38,543-Speed 9169.75 samples/sec Loss 5.0441 LearningRate 0.0124 Epoch: 12 Global Step: 216420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:39,613-Speed 9578.14 samples/sec Loss 5.0301 LearningRate 0.0124 Epoch: 12 Global Step: 216430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:40,675-Speed 9645.45 samples/sec Loss 5.0062 LearningRate 0.0124 Epoch: 12 Global Step: 216440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:41,805-Speed 9074.79 samples/sec Loss 5.0303 LearningRate 0.0124 Epoch: 12 Global Step: 216450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:16:42,895-Speed 9396.55 samples/sec Loss 4.9851 LearningRate 0.0124 Epoch: 12 Global Step: 216460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:44,018-Speed 9125.50 samples/sec Loss 4.9845 LearningRate 0.0124 Epoch: 12 Global Step: 216470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:45,067-Speed 9761.14 samples/sec Loss 5.0489 LearningRate 0.0124 Epoch: 12 Global Step: 216480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:46,145-Speed 9501.85 samples/sec Loss 5.0657 LearningRate 0.0124 Epoch: 12 Global Step: 216490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:47,222-Speed 9521.18 samples/sec Loss 5.0264 LearningRate 0.0124 Epoch: 12 Global Step: 216500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:48,333-Speed 9221.30 samples/sec Loss 5.0424 LearningRate 0.0123 Epoch: 12 Global Step: 216510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:49,424-Speed 9388.53 samples/sec Loss 4.9985 LearningRate 0.0123 Epoch: 12 Global Step: 216520 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:50,499-Speed 9535.07 samples/sec Loss 5.0783 LearningRate 0.0123 Epoch: 12 Global Step: 216530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:51,599-Speed 9315.36 samples/sec Loss 5.0249 LearningRate 0.0123 Epoch: 12 Global Step: 216540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:52,703-Speed 9280.73 samples/sec Loss 5.1229 LearningRate 0.0123 Epoch: 12 Global Step: 216550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:53,845-Speed 8973.93 samples/sec Loss 4.9592 LearningRate 0.0123 Epoch: 12 Global Step: 216560 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:16:54,930-Speed 9439.32 samples/sec Loss 5.0490 LearningRate 0.0123 Epoch: 12 Global Step: 216570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:56,028-Speed 9336.61 samples/sec Loss 4.9288 LearningRate 0.0123 Epoch: 12 Global Step: 216580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:57,092-Speed 9629.53 samples/sec Loss 4.9971 LearningRate 0.0123 Epoch: 12 Global Step: 216590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:58,200-Speed 9246.52 samples/sec Loss 4.9451 LearningRate 0.0123 Epoch: 12 Global Step: 216600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:16:59,283-Speed 9460.81 samples/sec Loss 5.0599 LearningRate 0.0123 Epoch: 12 Global Step: 216610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:00,346-Speed 9637.14 samples/sec Loss 5.1061 LearningRate 0.0123 Epoch: 12 Global Step: 216620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:01,426-Speed 9481.98 samples/sec Loss 5.1149 LearningRate 0.0123 Epoch: 12 Global Step: 216630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:02,540-Speed 9199.44 samples/sec Loss 4.9693 LearningRate 0.0123 Epoch: 12 Global Step: 216640 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:17:03,619-Speed 9498.37 samples/sec Loss 5.0490 LearningRate 0.0123 Epoch: 12 Global Step: 216650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:04,685-Speed 9611.20 samples/sec Loss 5.0224 LearningRate 0.0123 Epoch: 12 Global Step: 216660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:05,800-Speed 9192.36 samples/sec Loss 5.0083 LearningRate 0.0123 Epoch: 12 Global Step: 216670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:06,896-Speed 9345.92 samples/sec Loss 5.0771 LearningRate 0.0123 Epoch: 12 Global Step: 216680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:08,030-Speed 9041.96 samples/sec Loss 4.9797 LearningRate 0.0123 Epoch: 12 Global Step: 216690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:09,100-Speed 9573.89 samples/sec Loss 5.0735 LearningRate 0.0123 Epoch: 12 Global Step: 216700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:10,161-Speed 9655.72 samples/sec Loss 4.9930 LearningRate 0.0123 Epoch: 12 Global Step: 216710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:11,270-Speed 9242.76 samples/sec Loss 4.9339 LearningRate 0.0123 Epoch: 12 Global Step: 216720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:12,468-Speed 8553.51 samples/sec Loss 4.9276 LearningRate 0.0123 Epoch: 12 Global Step: 216730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:13,581-Speed 9202.66 samples/sec Loss 5.0672 LearningRate 0.0123 Epoch: 12 Global Step: 216740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:14,674-Speed 9381.82 samples/sec Loss 5.0028 LearningRate 0.0123 Epoch: 12 Global Step: 216750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:15,758-Speed 9443.87 samples/sec Loss 5.0146 LearningRate 0.0123 Epoch: 12 Global Step: 216760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
20:17:16,846-Speed 9423.20 samples/sec Loss 4.9881 LearningRate 0.0123 Epoch: 12 Global Step: 216770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:17,923-Speed 9509.71 samples/sec Loss 4.9722 LearningRate 0.0123 Epoch: 12 Global Step: 216780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:18,998-Speed 9529.00 samples/sec Loss 5.0402 LearningRate 0.0123 Epoch: 12 Global Step: 216790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:20,086-Speed 9418.10 samples/sec Loss 4.9145 LearningRate 0.0123 Epoch: 12 Global Step: 216800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:21,142-Speed 9704.04 samples/sec Loss 5.1144 LearningRate 0.0123 Epoch: 12 Global Step: 216810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:22,210-Speed 9589.72 samples/sec Loss 4.9954 LearningRate 0.0123 Epoch: 12 Global Step: 216820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:23,283-Speed 9549.80 samples/sec Loss 5.0406 LearningRate 0.0123 Epoch: 12 Global Step: 216830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:24,393-Speed 9239.69 samples/sec Loss 5.0703 LearningRate 0.0123 Epoch: 12 Global Step: 216840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:25,472-Speed 9502.16 samples/sec Loss 5.0561 LearningRate 0.0123 Epoch: 12 Global Step: 216850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:17:26,542-Speed 9575.42 samples/sec Loss 4.9845 LearningRate 0.0123 Epoch: 12 Global Step: 216860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:27,629-Speed 9421.67 samples/sec Loss 5.1211 LearningRate 0.0123 Epoch: 12 Global Step: 216870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:28,705-Speed 9522.37 samples/sec Loss 4.8719 LearningRate 0.0123 Epoch: 12 Global Step: 216880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:29,765-Speed 9674.18 samples/sec 
Loss 4.9122 LearningRate 0.0123 Epoch: 12 Global Step: 216890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:30,861-Speed 9346.24 samples/sec Loss 4.9357 LearningRate 0.0123 Epoch: 12 Global Step: 216900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:31,934-Speed 9548.74 samples/sec Loss 5.0011 LearningRate 0.0123 Epoch: 12 Global Step: 216910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:33,011-Speed 9515.91 samples/sec Loss 5.0172 LearningRate 0.0123 Epoch: 12 Global Step: 216920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:34,106-Speed 9357.43 samples/sec Loss 5.0484 LearningRate 0.0123 Epoch: 12 Global Step: 216930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:35,205-Speed 9323.00 samples/sec Loss 5.0239 LearningRate 0.0123 Epoch: 12 Global Step: 216940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:36,356-Speed 8902.13 samples/sec Loss 5.0055 LearningRate 0.0123 Epoch: 12 Global Step: 216950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:37,429-Speed 9546.53 samples/sec Loss 4.9617 LearningRate 0.0123 Epoch: 12 Global Step: 216960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:38,492-Speed 9641.89 samples/sec Loss 5.0616 LearningRate 0.0123 Epoch: 12 Global Step: 216970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:17:39,766-Speed 8042.26 samples/sec Loss 5.0640 LearningRate 0.0123 Epoch: 12 Global Step: 216980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:08,502-Speed 356.35 samples/sec Loss 4.5408 LearningRate 0.0122 Epoch: 13 Global Step: 216990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:09,592-Speed 9398.47 samples/sec Loss 4.2905 LearningRate 0.0122 Epoch: 13 Global Step: 217000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:10,983-Speed 7367.97 samples/sec Loss 4.3195 LearningRate 0.0122 
Epoch: 13 Global Step: 217010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:12,386-Speed 7306.18 samples/sec Loss 4.3776 LearningRate 0.0122 Epoch: 13 Global Step: 217020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:13,644-Speed 8141.70 samples/sec Loss 4.3003 LearningRate 0.0122 Epoch: 13 Global Step: 217030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:14,971-Speed 7723.75 samples/sec Loss 4.2562 LearningRate 0.0122 Epoch: 13 Global Step: 217040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:16,285-Speed 7797.93 samples/sec Loss 4.3716 LearningRate 0.0122 Epoch: 13 Global Step: 217050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:17,407-Speed 9137.39 samples/sec Loss 4.3648 LearningRate 0.0122 Epoch: 13 Global Step: 217060 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:18:18,573-Speed 8786.81 samples/sec Loss 4.3196 LearningRate 0.0122 Epoch: 13 Global Step: 217070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:19,779-Speed 8491.79 samples/sec Loss 4.3851 LearningRate 0.0122 Epoch: 13 Global Step: 217080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:20,877-Speed 9340.18 samples/sec Loss 4.4009 LearningRate 0.0122 Epoch: 13 Global Step: 217090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:21,983-Speed 9256.16 samples/sec Loss 4.3528 LearningRate 0.0122 Epoch: 13 Global Step: 217100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:23,106-Speed 9126.37 samples/sec Loss 4.3077 LearningRate 0.0122 Epoch: 13 Global Step: 217110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:24,198-Speed 9384.94 samples/sec Loss 4.3818 LearningRate 0.0122 Epoch: 13 Global Step: 217120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:25,413-Speed 8433.72 samples/sec Loss 4.3432 LearningRate 0.0122 Epoch: 13 Global Step: 217130 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:26,514-Speed 9307.54 samples/sec Loss 4.3823 LearningRate 0.0122 Epoch: 13 Global Step: 217140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:27,612-Speed 9331.81 samples/sec Loss 4.2962 LearningRate 0.0122 Epoch: 13 Global Step: 217150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:28,757-Speed 8950.98 samples/sec Loss 4.2852 LearningRate 0.0122 Epoch: 13 Global Step: 217160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:29,877-Speed 9146.22 samples/sec Loss 4.3170 LearningRate 0.0122 Epoch: 13 Global Step: 217170 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:18:30,978-Speed 9310.27 samples/sec Loss 4.3794 LearningRate 0.0122 Epoch: 13 Global Step: 217180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:32,070-Speed 9379.51 samples/sec Loss 4.2996 LearningRate 0.0122 Epoch: 13 Global Step: 217190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:33,186-Speed 9181.69 samples/sec Loss 4.2404 LearningRate 0.0122 Epoch: 13 Global Step: 217200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:34,328-Speed 8976.11 samples/sec Loss 4.3452 LearningRate 0.0122 Epoch: 13 Global Step: 217210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:35,463-Speed 9033.58 samples/sec Loss 4.3240 LearningRate 0.0122 Epoch: 13 Global Step: 217220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:36,583-Speed 9145.25 samples/sec Loss 4.3453 LearningRate 0.0122 Epoch: 13 Global Step: 217230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:37,702-Speed 9157.19 samples/sec Loss 4.4223 LearningRate 0.0122 Epoch: 13 Global Step: 217240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:38,896-Speed 8584.42 samples/sec Loss 4.3617 LearningRate 0.0122 Epoch: 13 Global Step: 217250 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:18:39,963-Speed 9600.50 samples/sec Loss 4.3685 LearningRate 0.0122 Epoch: 13 Global Step: 217260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:41,054-Speed 9387.44 samples/sec Loss 4.3888 LearningRate 0.0122 Epoch: 13 Global Step: 217270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:42,163-Speed 9246.19 samples/sec Loss 4.3576 LearningRate 0.0122 Epoch: 13 Global Step: 217280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:18:43,283-Speed 9147.81 samples/sec Loss 4.2875 LearningRate 0.0122 Epoch: 13 Global Step: 217290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:44,354-Speed 9561.37 samples/sec Loss 4.2282 LearningRate 0.0122 Epoch: 13 Global Step: 217300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:45,464-Speed 9235.43 samples/sec Loss 4.3612 LearningRate 0.0122 Epoch: 13 Global Step: 217310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:46,549-Speed 9446.08 samples/sec Loss 4.4186 LearningRate 0.0122 Epoch: 13 Global Step: 217320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:47,666-Speed 9171.07 samples/sec Loss 4.2834 LearningRate 0.0122 Epoch: 13 Global Step: 217330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:48,781-Speed 9188.80 samples/sec Loss 4.3409 LearningRate 0.0122 Epoch: 13 Global Step: 217340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:49,889-Speed 9246.08 samples/sec Loss 4.3461 LearningRate 0.0122 Epoch: 13 Global Step: 217350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:51,020-Speed 9060.26 samples/sec Loss 4.4727 LearningRate 0.0122 Epoch: 13 Global Step: 217360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:52,094-Speed 9542.04 samples/sec Loss 4.3586 LearningRate 0.0122 Epoch: 13 Global Step: 217370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 20:18:53,160-Speed 9605.10 samples/sec Loss 4.3842 LearningRate 0.0122 Epoch: 13 Global Step: 217380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:54,225-Speed 9629.94 samples/sec Loss 4.3932 LearningRate 0.0122 Epoch: 13 Global Step: 217390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:55,338-Speed 9203.97 samples/sec Loss 4.3328 LearningRate 0.0122 Epoch: 13 Global Step: 217400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:18:56,470-Speed 9048.30 samples/sec Loss 4.3430 LearningRate 0.0122 Epoch: 13 Global Step: 217410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:18:57,573-Speed 9292.78 samples/sec Loss 4.3836 LearningRate 0.0122 Epoch: 13 Global Step: 217420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:18:58,689-Speed 9176.99 samples/sec Loss 4.3219 LearningRate 0.0122 Epoch: 13 Global Step: 217430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:18:59,824-Speed 9024.57 samples/sec Loss 4.2975 LearningRate 0.0122 Epoch: 13 Global Step: 217440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:19:00,932-Speed 9249.50 samples/sec Loss 4.2877 LearningRate 0.0122 Epoch: 13 Global Step: 217450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:19:02,057-Speed 9110.46 samples/sec Loss 4.3137 LearningRate 0.0122 Epoch: 13 Global Step: 217460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:19:03,202-Speed 8946.50 samples/sec Loss 4.3157 LearningRate 0.0121 Epoch: 13 Global Step: 217470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:19:04,310-Speed 9247.27 samples/sec Loss 4.3655 LearningRate 0.0121 Epoch: 13 Global Step: 217480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:19:05,375-Speed 9622.53 samples/sec Loss 4.3617 LearningRate 0.0121 Epoch: 13 Global Step: 217490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:19:06,454-Speed 9494.14 
samples/sec Loss 4.3684 LearningRate 0.0121 Epoch: 13 Global Step: 217500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:19:07,559-Speed 9274.32 samples/sec Loss 4.2825 LearningRate 0.0121 Epoch: 13 Global Step: 217510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:08,688-Speed 9072.87 samples/sec Loss 4.4126 LearningRate 0.0121 Epoch: 13 Global Step: 217520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:09,728-Speed 9858.41 samples/sec Loss 4.3544 LearningRate 0.0121 Epoch: 13 Global Step: 217530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:10,826-Speed 9327.71 samples/sec Loss 4.4034 LearningRate 0.0121 Epoch: 13 Global Step: 217540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:11,920-Speed 9367.17 samples/sec Loss 4.3960 LearningRate 0.0121 Epoch: 13 Global Step: 217550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:12,994-Speed 9538.13 samples/sec Loss 4.3811 LearningRate 0.0121 Epoch: 13 Global Step: 217560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:14,049-Speed 9715.80 samples/sec Loss 4.3995 LearningRate 0.0121 Epoch: 13 Global Step: 217570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:15,515-Speed 6989.03 samples/sec Loss 4.3349 LearningRate 0.0121 Epoch: 13 Global Step: 217580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:16,945-Speed 7162.53 samples/sec Loss 4.3824 LearningRate 0.0121 Epoch: 13 Global Step: 217590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:18,032-Speed 9425.31 samples/sec Loss 4.3445 LearningRate 0.0121 Epoch: 13 Global Step: 217600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:19,274-Speed 8252.12 samples/sec Loss 4.3822 LearningRate 0.0121 Epoch: 13 Global Step: 217610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:20,532-Speed 8144.67 samples/sec Loss 4.3494 
LearningRate 0.0121 Epoch: 13 Global Step: 217620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:21,828-Speed 7904.55 samples/sec Loss 4.3431 LearningRate 0.0121 Epoch: 13 Global Step: 217630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:22,932-Speed 9282.90 samples/sec Loss 4.3347 LearningRate 0.0121 Epoch: 13 Global Step: 217640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:24,031-Speed 9322.42 samples/sec Loss 4.3830 LearningRate 0.0121 Epoch: 13 Global Step: 217650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:25,102-Speed 9564.99 samples/sec Loss 4.4408 LearningRate 0.0121 Epoch: 13 Global Step: 217660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:26,158-Speed 9709.99 samples/sec Loss 4.3369 LearningRate 0.0121 Epoch: 13 Global Step: 217670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:27,403-Speed 8225.29 samples/sec Loss 4.4222 LearningRate 0.0121 Epoch: 13 Global Step: 217680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:28,497-Speed 9371.84 samples/sec Loss 4.4146 LearningRate 0.0121 Epoch: 13 Global Step: 217690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:29,584-Speed 9418.83 samples/sec Loss 4.4441 LearningRate 0.0121 Epoch: 13 Global Step: 217700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:30,670-Speed 9437.90 samples/sec Loss 4.4633 LearningRate 0.0121 Epoch: 13 Global Step: 217710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:31,793-Speed 9123.88 samples/sec Loss 4.4208 LearningRate 0.0121 Epoch: 13 Global Step: 217720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:32,860-Speed 9600.82 samples/sec Loss 4.3827 LearningRate 0.0121 Epoch: 13 Global Step: 217730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:33,961-Speed 9304.95 samples/sec Loss 4.4109 LearningRate 0.0121 Epoch: 13 
Global Step: 217740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:35,042-Speed 9475.87 samples/sec Loss 4.4505 LearningRate 0.0121 Epoch: 13 Global Step: 217750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:36,137-Speed 9361.79 samples/sec Loss 4.4306 LearningRate 0.0121 Epoch: 13 Global Step: 217760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:37,261-Speed 9118.63 samples/sec Loss 4.4663 LearningRate 0.0121 Epoch: 13 Global Step: 217770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:38,387-Speed 9099.73 samples/sec Loss 4.4026 LearningRate 0.0121 Epoch: 13 Global Step: 217780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:39,488-Speed 9302.44 samples/sec Loss 4.3485 LearningRate 0.0121 Epoch: 13 Global Step: 217790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:40,578-Speed 9406.90 samples/sec Loss 4.4283 LearningRate 0.0121 Epoch: 13 Global Step: 217800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:41,679-Speed 9303.14 samples/sec Loss 4.4761 LearningRate 0.0121 Epoch: 13 Global Step: 217810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:42,734-Speed 9721.11 samples/sec Loss 4.4146 LearningRate 0.0121 Epoch: 13 Global Step: 217820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:43,796-Speed 9649.71 samples/sec Loss 4.5240 LearningRate 0.0121 Epoch: 13 Global Step: 217830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:44,924-Speed 9079.72 samples/sec Loss 4.4857 LearningRate 0.0121 Epoch: 13 Global Step: 217840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:46,016-Speed 9381.54 samples/sec Loss 4.3595 LearningRate 0.0121 Epoch: 13 Global Step: 217850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:47,150-Speed 9035.27 samples/sec Loss 4.3844 LearningRate 0.0121 Epoch: 13 Global Step: 217860 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:48,238-Speed 9422.99 samples/sec Loss 4.3166 LearningRate 0.0121 Epoch: 13 Global Step: 217870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:49,352-Speed 9192.52 samples/sec Loss 4.4088 LearningRate 0.0121 Epoch: 13 Global Step: 217880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:50,459-Speed 9252.83 samples/sec Loss 4.3423 LearningRate 0.0121 Epoch: 13 Global Step: 217890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:51,554-Speed 9357.65 samples/sec Loss 4.4765 LearningRate 0.0121 Epoch: 13 Global Step: 217900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:52,724-Speed 8759.96 samples/sec Loss 4.4300 LearningRate 0.0121 Epoch: 13 Global Step: 217910 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:19:53,841-Speed 9174.46 samples/sec Loss 4.4609 LearningRate 0.0121 Epoch: 13 Global Step: 217920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:54,927-Speed 9432.08 samples/sec Loss 4.3841 LearningRate 0.0121 Epoch: 13 Global Step: 217930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:56,025-Speed 9338.07 samples/sec Loss 4.4320 LearningRate 0.0121 Epoch: 13 Global Step: 217940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:57,175-Speed 8907.70 samples/sec Loss 4.4769 LearningRate 0.0120 Epoch: 13 Global Step: 217950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:58,284-Speed 9239.17 samples/sec Loss 4.3469 LearningRate 0.0120 Epoch: 13 Global Step: 217960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:19:59,389-Speed 9266.31 samples/sec Loss 4.4220 LearningRate 0.0120 Epoch: 13 Global Step: 217970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:20:00,490-Speed 9311.75 samples/sec Loss 4.4507 LearningRate 0.0120 Epoch: 13 Global Step: 217980 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 20:20:01,683-Speed 8589.61 samples/sec Loss 4.4553 LearningRate 0.0120 Epoch: 13 Global Step: 217990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:20:02,781-Speed 9332.79 samples/sec Loss 4.3450 LearningRate 0.0120 Epoch: 13 Global Step: 218000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:20:24,528-[lfw][218000]XNorm: 8.244316 Training: 2022-04-11 20:20:24,529-[lfw][218000]Accuracy-Flip: 0.99700+-0.00314 Training: 2022-04-11 20:20:24,529-[lfw][218000]Accuracy-Highest: 0.99700 Training: 2022-04-11 20:20:49,697-[cfp_fp][218000]XNorm: 7.073361 Training: 2022-04-11 20:20:49,698-[cfp_fp][218000]Accuracy-Flip: 0.96500+-0.01145 Training: 2022-04-11 20:20:49,698-[cfp_fp][218000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:21:11,403-[agedb_30][218000]XNorm: 7.942713 Training: 2022-04-11 20:21:11,404-[agedb_30][218000]Accuracy-Flip: 0.96833+-0.00972 Training: 2022-04-11 20:21:11,404-[agedb_30][218000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:21:12,502-Speed 146.87 samples/sec Loss 4.4353 LearningRate 0.0120 Epoch: 13 Global Step: 218010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:13,546-Speed 9813.48 samples/sec Loss 4.5222 LearningRate 0.0120 Epoch: 13 Global Step: 218020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:14,620-Speed 9542.76 samples/sec Loss 4.4397 LearningRate 0.0120 Epoch: 13 Global Step: 218030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:15,738-Speed 9166.06 samples/sec Loss 4.4036 LearningRate 0.0120 Epoch: 13 Global Step: 218040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:16,811-Speed 9546.27 samples/sec Loss 4.3977 LearningRate 0.0120 Epoch: 13 Global Step: 218050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:17,887-Speed 9520.41 samples/sec Loss 4.3959 LearningRate 0.0120 Epoch: 13 Global Step: 218060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 20:21:18,969-Speed 9575.36 samples/sec Loss 4.4232 LearningRate 0.0120 Epoch: 13 Global Step: 218070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:20,043-Speed 9539.86 samples/sec Loss 4.4734 LearningRate 0.0120 Epoch: 13 Global Step: 218080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:21,139-Speed 9351.91 samples/sec Loss 4.4179 LearningRate 0.0120 Epoch: 13 Global Step: 218090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:22,242-Speed 9290.02 samples/sec Loss 4.4591 LearningRate 0.0120 Epoch: 13 Global Step: 218100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:23,333-Speed 9399.97 samples/sec Loss 4.4614 LearningRate 0.0120 Epoch: 13 Global Step: 218110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:24,414-Speed 9473.74 samples/sec Loss 4.4781 LearningRate 0.0120 Epoch: 13 Global Step: 218120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:25,497-Speed 9465.58 samples/sec Loss 4.4025 LearningRate 0.0120 Epoch: 13 Global Step: 218130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:26,610-Speed 9205.43 samples/sec Loss 4.5334 LearningRate 0.0120 Epoch: 13 Global Step: 218140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:27,671-Speed 9657.91 samples/sec Loss 4.4266 LearningRate 0.0120 Epoch: 13 Global Step: 218150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:28,768-Speed 9336.84 samples/sec Loss 4.4542 LearningRate 0.0120 Epoch: 13 Global Step: 218160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:29,847-Speed 9496.93 samples/sec Loss 4.4354 LearningRate 0.0120 Epoch: 13 Global Step: 218170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:30,977-Speed 9068.53 samples/sec Loss 4.4904 LearningRate 0.0120 Epoch: 13 Global Step: 218180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:32,087-Speed 
9237.60 samples/sec Loss 4.4857 LearningRate 0.0120 Epoch: 13 Global Step: 218190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:33,207-Speed 9140.46 samples/sec Loss 4.4469 LearningRate 0.0120 Epoch: 13 Global Step: 218200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:34,303-Speed 9355.69 samples/sec Loss 4.3888 LearningRate 0.0120 Epoch: 13 Global Step: 218210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:35,400-Speed 9337.21 samples/sec Loss 4.4902 LearningRate 0.0120 Epoch: 13 Global Step: 218220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:36,485-Speed 9452.87 samples/sec Loss 4.4755 LearningRate 0.0120 Epoch: 13 Global Step: 218230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:37,556-Speed 9564.80 samples/sec Loss 4.4233 LearningRate 0.0120 Epoch: 13 Global Step: 218240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:38,682-Speed 9099.54 samples/sec Loss 4.4288 LearningRate 0.0120 Epoch: 13 Global Step: 218250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:39,801-Speed 9159.94 samples/sec Loss 4.4670 LearningRate 0.0120 Epoch: 13 Global Step: 218260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:40,896-Speed 9350.74 samples/sec Loss 4.5793 LearningRate 0.0120 Epoch: 13 Global Step: 218270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:41,997-Speed 9306.20 samples/sec Loss 4.4619 LearningRate 0.0120 Epoch: 13 Global Step: 218280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:43,083-Speed 9439.77 samples/sec Loss 4.5622 LearningRate 0.0120 Epoch: 13 Global Step: 218290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:44,208-Speed 9102.56 samples/sec Loss 4.4409 LearningRate 0.0120 Epoch: 13 Global Step: 218300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:45,274-Speed 9615.87 samples/sec Loss 4.4654 
LearningRate 0.0120 Epoch: 13 Global Step: 218310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:46,372-Speed 9334.23 samples/sec Loss 4.5110 LearningRate 0.0120 Epoch: 13 Global Step: 218320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:47,480-Speed 9247.71 samples/sec Loss 4.4432 LearningRate 0.0120 Epoch: 13 Global Step: 218330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:48,588-Speed 9242.90 samples/sec Loss 4.4538 LearningRate 0.0120 Epoch: 13 Global Step: 218340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:49,702-Speed 9200.17 samples/sec Loss 4.5361 LearningRate 0.0120 Epoch: 13 Global Step: 218350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:50,763-Speed 9659.67 samples/sec Loss 4.4696 LearningRate 0.0120 Epoch: 13 Global Step: 218360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:51,860-Speed 9342.89 samples/sec Loss 4.4126 LearningRate 0.0120 Epoch: 13 Global Step: 218370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:21:52,970-Speed 9234.59 samples/sec Loss 4.4102 LearningRate 0.0120 Epoch: 13 Global Step: 218380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:54,067-Speed 9334.69 samples/sec Loss 4.4852 LearningRate 0.0120 Epoch: 13 Global Step: 218390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:55,232-Speed 8794.78 samples/sec Loss 4.4835 LearningRate 0.0120 Epoch: 13 Global Step: 218400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:56,322-Speed 9405.05 samples/sec Loss 4.4978 LearningRate 0.0120 Epoch: 13 Global Step: 218410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:57,424-Speed 9293.57 samples/sec Loss 4.4938 LearningRate 0.0120 Epoch: 13 Global Step: 218420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:58,510-Speed 9437.08 samples/sec Loss 4.5067 LearningRate 0.0119 Epoch: 13 Global 
Step: 218430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:21:59,623-Speed 9205.76 samples/sec Loss 4.3824 LearningRate 0.0119 Epoch: 13 Global Step: 218440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:00,689-Speed 9605.08 samples/sec Loss 4.4669 LearningRate 0.0119 Epoch: 13 Global Step: 218450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:01,789-Speed 9319.88 samples/sec Loss 4.4528 LearningRate 0.0119 Epoch: 13 Global Step: 218460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:02,876-Speed 9424.26 samples/sec Loss 4.4906 LearningRate 0.0119 Epoch: 13 Global Step: 218470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:03,961-Speed 9441.01 samples/sec Loss 4.4954 LearningRate 0.0119 Epoch: 13 Global Step: 218480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:05,045-Speed 9457.92 samples/sec Loss 4.4322 LearningRate 0.0119 Epoch: 13 Global Step: 218490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:06,121-Speed 9514.97 samples/sec Loss 4.3921 LearningRate 0.0119 Epoch: 13 Global Step: 218500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:07,236-Speed 9195.92 samples/sec Loss 4.4762 LearningRate 0.0119 Epoch: 13 Global Step: 218510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:08,327-Speed 9388.48 samples/sec Loss 4.5842 LearningRate 0.0119 Epoch: 13 Global Step: 218520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:09,446-Speed 9158.45 samples/sec Loss 4.3899 LearningRate 0.0119 Epoch: 13 Global Step: 218530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:10,531-Speed 9449.31 samples/sec Loss 4.5502 LearningRate 0.0119 Epoch: 13 Global Step: 218540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:11,619-Speed 9412.36 samples/sec Loss 4.4747 LearningRate 0.0119 Epoch: 13 Global Step: 218550 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 20:22:12,694-Speed 9536.59 samples/sec Loss 4.4248 LearningRate 0.0119 Epoch: 13 Global Step: 218560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:13,774-Speed 9487.51 samples/sec Loss 4.5164 LearningRate 0.0119 Epoch: 13 Global Step: 218570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:14,819-Speed 9801.19 samples/sec Loss 4.5208 LearningRate 0.0119 Epoch: 13 Global Step: 218580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:15,866-Speed 9785.99 samples/sec Loss 4.4608 LearningRate 0.0119 Epoch: 13 Global Step: 218590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:16,955-Speed 9415.17 samples/sec Loss 4.4839 LearningRate 0.0119 Epoch: 13 Global Step: 218600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:18,125-Speed 8754.84 samples/sec Loss 4.4444 LearningRate 0.0119 Epoch: 13 Global Step: 218610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:19,244-Speed 9151.88 samples/sec Loss 4.5240 LearningRate 0.0119 Epoch: 13 Global Step: 218620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:20,331-Speed 9424.33 samples/sec Loss 4.4380 LearningRate 0.0119 Epoch: 13 Global Step: 218630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:21,397-Speed 9613.34 samples/sec Loss 4.4824 LearningRate 0.0119 Epoch: 13 Global Step: 218640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:22,510-Speed 9210.52 samples/sec Loss 4.6109 LearningRate 0.0119 Epoch: 13 Global Step: 218650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:23,647-Speed 9010.17 samples/sec Loss 4.5433 LearningRate 0.0119 Epoch: 13 Global Step: 218660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:24,755-Speed 9248.48 samples/sec Loss 4.5816 LearningRate 0.0119 Epoch: 13 Global Step: 218670 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 20:22:25,865-Speed 9227.38 samples/sec Loss 4.4346 LearningRate 0.0119 Epoch: 13 Global Step: 218680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:26,950-Speed 9446.11 samples/sec Loss 4.5679 LearningRate 0.0119 Epoch: 13 Global Step: 218690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:28,044-Speed 9369.07 samples/sec Loss 4.4877 LearningRate 0.0119 Epoch: 13 Global Step: 218700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:29,157-Speed 9199.94 samples/sec Loss 4.4800 LearningRate 0.0119 Epoch: 13 Global Step: 218710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:30,236-Speed 9502.62 samples/sec Loss 4.5633 LearningRate 0.0119 Epoch: 13 Global Step: 218720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:31,312-Speed 9521.74 samples/sec Loss 4.4611 LearningRate 0.0119 Epoch: 13 Global Step: 218730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:32,398-Speed 9428.48 samples/sec Loss 4.4636 LearningRate 0.0119 Epoch: 13 Global Step: 218740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:33,452-Speed 9718.18 samples/sec Loss 4.5458 LearningRate 0.0119 Epoch: 13 Global Step: 218750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:34,533-Speed 9480.85 samples/sec Loss 4.5218 LearningRate 0.0119 Epoch: 13 Global Step: 218760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:35,663-Speed 9070.59 samples/sec Loss 4.4592 LearningRate 0.0119 Epoch: 13 Global Step: 218770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:36,771-Speed 9249.90 samples/sec Loss 4.3951 LearningRate 0.0119 Epoch: 13 Global Step: 218780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:37,845-Speed 9535.53 samples/sec Loss 4.5150 LearningRate 0.0119 Epoch: 13 Global Step: 218790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
20:22:38,960-Speed 9190.66 samples/sec Loss 4.5045 LearningRate 0.0119 Epoch: 13 Global Step: 218800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:40,040-Speed 9488.91 samples/sec Loss 4.4309 LearningRate 0.0119 Epoch: 13 Global Step: 218810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:41,134-Speed 9361.17 samples/sec Loss 4.5025 LearningRate 0.0119 Epoch: 13 Global Step: 218820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:42,205-Speed 9570.55 samples/sec Loss 4.5395 LearningRate 0.0119 Epoch: 13 Global Step: 218830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:43,303-Speed 9326.59 samples/sec Loss 4.5469 LearningRate 0.0119 Epoch: 13 Global Step: 218840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:44,441-Speed 9005.57 samples/sec Loss 4.5125 LearningRate 0.0119 Epoch: 13 Global Step: 218850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:45,563-Speed 9135.94 samples/sec Loss 4.4241 LearningRate 0.0119 Epoch: 13 Global Step: 218860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:46,642-Speed 9503.08 samples/sec Loss 4.3530 LearningRate 0.0119 Epoch: 13 Global Step: 218870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:47,723-Speed 9476.42 samples/sec Loss 4.5702 LearningRate 0.0119 Epoch: 13 Global Step: 218880 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:22:48,815-Speed 9381.70 samples/sec Loss 4.4725 LearningRate 0.0119 Epoch: 13 Global Step: 218890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:49,965-Speed 8905.38 samples/sec Loss 4.5055 LearningRate 0.0119 Epoch: 13 Global Step: 218900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:22:51,039-Speed 9538.87 samples/sec Loss 4.5019 LearningRate 0.0118 Epoch: 13 Global Step: 218910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:52,099-Speed 9668.75 
samples/sec Loss 4.4399 LearningRate 0.0118 Epoch: 13 Global Step: 218920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:53,217-Speed 9170.60 samples/sec Loss 4.4865 LearningRate 0.0118 Epoch: 13 Global Step: 218930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:54,342-Speed 9107.70 samples/sec Loss 4.5703 LearningRate 0.0118 Epoch: 13 Global Step: 218940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:55,413-Speed 9562.96 samples/sec Loss 4.5093 LearningRate 0.0118 Epoch: 13 Global Step: 218950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:56,485-Speed 9553.70 samples/sec Loss 4.5563 LearningRate 0.0118 Epoch: 13 Global Step: 218960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:57,592-Speed 9256.12 samples/sec Loss 4.5941 LearningRate 0.0118 Epoch: 13 Global Step: 218970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:58,686-Speed 9363.48 samples/sec Loss 4.4746 LearningRate 0.0118 Epoch: 13 Global Step: 218980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:22:59,776-Speed 9401.31 samples/sec Loss 4.4733 LearningRate 0.0118 Epoch: 13 Global Step: 218990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:00,908-Speed 9055.27 samples/sec Loss 4.4977 LearningRate 0.0118 Epoch: 13 Global Step: 219000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:01,991-Speed 9459.64 samples/sec Loss 4.4912 LearningRate 0.0118 Epoch: 13 Global Step: 219010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:03,077-Speed 9436.07 samples/sec Loss 4.4949 LearningRate 0.0118 Epoch: 13 Global Step: 219020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:04,203-Speed 9098.49 samples/sec Loss 4.5168 LearningRate 0.0118 Epoch: 13 Global Step: 219030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:05,288-Speed 9454.39 samples/sec Loss 4.5234 LearningRate 
0.0118 Epoch: 13 Global Step: 219040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:06,377-Speed 9403.89 samples/sec Loss 4.5246 LearningRate 0.0118 Epoch: 13 Global Step: 219050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:07,511-Speed 9036.09 samples/sec Loss 4.5808 LearningRate 0.0118 Epoch: 13 Global Step: 219060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:08,573-Speed 9651.97 samples/sec Loss 4.5264 LearningRate 0.0118 Epoch: 13 Global Step: 219070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:09,668-Speed 9356.90 samples/sec Loss 4.4602 LearningRate 0.0118 Epoch: 13 Global Step: 219080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:10,723-Speed 9711.12 samples/sec Loss 4.5148 LearningRate 0.0118 Epoch: 13 Global Step: 219090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:11,818-Speed 9350.58 samples/sec Loss 4.4778 LearningRate 0.0118 Epoch: 13 Global Step: 219100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:12,913-Speed 9356.39 samples/sec Loss 4.5813 LearningRate 0.0118 Epoch: 13 Global Step: 219110 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:23:14,011-Speed 9334.33 samples/sec Loss 4.5488 LearningRate 0.0118 Epoch: 13 Global Step: 219120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:15,111-Speed 9318.68 samples/sec Loss 4.5327 LearningRate 0.0118 Epoch: 13 Global Step: 219130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:16,239-Speed 9082.61 samples/sec Loss 4.4884 LearningRate 0.0118 Epoch: 13 Global Step: 219140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:17,385-Speed 8937.99 samples/sec Loss 4.6454 LearningRate 0.0118 Epoch: 13 Global Step: 219150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:18,552-Speed 8782.14 samples/sec Loss 4.5499 LearningRate 0.0118 Epoch: 13 Global Step: 
219160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:19,646-Speed 9366.52 samples/sec Loss 4.4622 LearningRate 0.0118 Epoch: 13 Global Step: 219170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:20,708-Speed 9643.05 samples/sec Loss 4.5229 LearningRate 0.0118 Epoch: 13 Global Step: 219180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:21,811-Speed 9287.97 samples/sec Loss 4.5661 LearningRate 0.0118 Epoch: 13 Global Step: 219190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:22,911-Speed 9324.48 samples/sec Loss 4.4918 LearningRate 0.0118 Epoch: 13 Global Step: 219200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:23,984-Speed 9548.28 samples/sec Loss 4.5319 LearningRate 0.0118 Epoch: 13 Global Step: 219210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:25,070-Speed 9431.79 samples/sec Loss 4.4495 LearningRate 0.0118 Epoch: 13 Global Step: 219220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:26,144-Speed 9543.25 samples/sec Loss 4.5907 LearningRate 0.0118 Epoch: 13 Global Step: 219230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:27,225-Speed 9484.88 samples/sec Loss 4.5226 LearningRate 0.0118 Epoch: 13 Global Step: 219240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:28,321-Speed 9340.15 samples/sec Loss 4.5681 LearningRate 0.0118 Epoch: 13 Global Step: 219250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:29,465-Speed 8959.62 samples/sec Loss 4.5745 LearningRate 0.0118 Epoch: 13 Global Step: 219260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:30,581-Speed 9183.88 samples/sec Loss 4.4552 LearningRate 0.0118 Epoch: 13 Global Step: 219270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:31,656-Speed 9527.96 samples/sec Loss 4.5546 LearningRate 0.0118 Epoch: 13 Global Step: 219280 Fp16 Grad Scale: 65536 Required: 
5 hours Training: 2022-04-11 20:23:32,741-Speed 9444.38 samples/sec Loss 4.4443 LearningRate 0.0118 Epoch: 13 Global Step: 219290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:23:33,851-Speed 9223.37 samples/sec Loss 4.6109 LearningRate 0.0118 Epoch: 13 Global Step: 219300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:34,991-Speed 8999.60 samples/sec Loss 4.5741 LearningRate 0.0118 Epoch: 13 Global Step: 219310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:36,073-Speed 9470.65 samples/sec Loss 4.6169 LearningRate 0.0118 Epoch: 13 Global Step: 219320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:37,152-Speed 9492.01 samples/sec Loss 4.6597 LearningRate 0.0118 Epoch: 13 Global Step: 219330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:38,291-Speed 9001.47 samples/sec Loss 4.6043 LearningRate 0.0118 Epoch: 13 Global Step: 219340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:39,359-Speed 9592.73 samples/sec Loss 4.5755 LearningRate 0.0118 Epoch: 13 Global Step: 219350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:40,434-Speed 9528.39 samples/sec Loss 4.5720 LearningRate 0.0118 Epoch: 13 Global Step: 219360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:41,472-Speed 9879.69 samples/sec Loss 4.5315 LearningRate 0.0118 Epoch: 13 Global Step: 219370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:42,507-Speed 9894.75 samples/sec Loss 4.5411 LearningRate 0.0118 Epoch: 13 Global Step: 219380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:43,588-Speed 9483.61 samples/sec Loss 4.6150 LearningRate 0.0118 Epoch: 13 Global Step: 219390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:44,704-Speed 9175.31 samples/sec Loss 4.5699 LearningRate 0.0117 Epoch: 13 Global Step: 219400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
20:23:45,796-Speed 9389.34 samples/sec Loss 4.5619 LearningRate 0.0117 Epoch: 13 Global Step: 219410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:46,840-Speed 9811.63 samples/sec Loss 4.5440 LearningRate 0.0117 Epoch: 13 Global Step: 219420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:47,964-Speed 9113.25 samples/sec Loss 4.5443 LearningRate 0.0117 Epoch: 13 Global Step: 219430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:49,080-Speed 9186.63 samples/sec Loss 4.5661 LearningRate 0.0117 Epoch: 13 Global Step: 219440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:50,143-Speed 9632.48 samples/sec Loss 4.4725 LearningRate 0.0117 Epoch: 13 Global Step: 219450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:51,202-Speed 9679.91 samples/sec Loss 4.5263 LearningRate 0.0117 Epoch: 13 Global Step: 219460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:52,304-Speed 9298.15 samples/sec Loss 4.5644 LearningRate 0.0117 Epoch: 13 Global Step: 219470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:53,440-Speed 9021.79 samples/sec Loss 4.6131 LearningRate 0.0117 Epoch: 13 Global Step: 219480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:54,535-Speed 9356.65 samples/sec Loss 4.6374 LearningRate 0.0117 Epoch: 13 Global Step: 219490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:55,596-Speed 9655.28 samples/sec Loss 4.5230 LearningRate 0.0117 Epoch: 13 Global Step: 219500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:56,706-Speed 9227.95 samples/sec Loss 4.6438 LearningRate 0.0117 Epoch: 13 Global Step: 219510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:57,849-Speed 8960.45 samples/sec Loss 4.5214 LearningRate 0.0117 Epoch: 13 Global Step: 219520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:23:58,966-Speed 9180.29 
samples/sec Loss 4.5779 LearningRate 0.0117 Epoch: 13 Global Step: 219530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:00,070-Speed 9280.42 samples/sec Loss 4.5977 LearningRate 0.0117 Epoch: 13 Global Step: 219540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:01,115-Speed 9808.38 samples/sec Loss 4.5473 LearningRate 0.0117 Epoch: 13 Global Step: 219550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:02,214-Speed 9321.47 samples/sec Loss 4.6411 LearningRate 0.0117 Epoch: 13 Global Step: 219560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:03,288-Speed 9542.31 samples/sec Loss 4.5450 LearningRate 0.0117 Epoch: 13 Global Step: 219570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:04,413-Speed 9107.54 samples/sec Loss 4.5640 LearningRate 0.0117 Epoch: 13 Global Step: 219580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:05,509-Speed 9346.22 samples/sec Loss 4.6020 LearningRate 0.0117 Epoch: 13 Global Step: 219590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:06,590-Speed 9477.17 samples/sec Loss 4.5929 LearningRate 0.0117 Epoch: 13 Global Step: 219600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:07,679-Speed 9410.42 samples/sec Loss 4.6217 LearningRate 0.0117 Epoch: 13 Global Step: 219610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:08,814-Speed 9028.19 samples/sec Loss 4.5888 LearningRate 0.0117 Epoch: 13 Global Step: 219620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:09,915-Speed 9305.25 samples/sec Loss 4.5712 LearningRate 0.0117 Epoch: 13 Global Step: 219630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:10,996-Speed 9478.67 samples/sec Loss 4.5651 LearningRate 0.0117 Epoch: 13 Global Step: 219640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:12,097-Speed 9301.84 samples/sec Loss 4.6129 LearningRate 
0.0117 Epoch: 13 Global Step: 219650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:13,195-Speed 9329.13 samples/sec Loss 4.5818 LearningRate 0.0117 Epoch: 13 Global Step: 219660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:14,274-Speed 9502.80 samples/sec Loss 4.6003 LearningRate 0.0117 Epoch: 13 Global Step: 219670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:15,368-Speed 9363.70 samples/sec Loss 4.6203 LearningRate 0.0117 Epoch: 13 Global Step: 219680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:16,461-Speed 9374.42 samples/sec Loss 4.5713 LearningRate 0.0117 Epoch: 13 Global Step: 219690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:17,555-Speed 9370.50 samples/sec Loss 4.6517 LearningRate 0.0117 Epoch: 13 Global Step: 219700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:18,676-Speed 9141.68 samples/sec Loss 4.5913 LearningRate 0.0117 Epoch: 13 Global Step: 219710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:19,744-Speed 9592.21 samples/sec Loss 4.6493 LearningRate 0.0117 Epoch: 13 Global Step: 219720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:20,808-Speed 9630.65 samples/sec Loss 4.6184 LearningRate 0.0117 Epoch: 13 Global Step: 219730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:21,890-Speed 9468.40 samples/sec Loss 4.4847 LearningRate 0.0117 Epoch: 13 Global Step: 219740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:22,988-Speed 9328.78 samples/sec Loss 4.6231 LearningRate 0.0117 Epoch: 13 Global Step: 219750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:24,068-Speed 9495.14 samples/sec Loss 4.6119 LearningRate 0.0117 Epoch: 13 Global Step: 219760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:25,145-Speed 9512.93 samples/sec Loss 4.6335 LearningRate 0.0117 Epoch: 13 Global Step: 
219770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:26,226-Speed 9485.41 samples/sec Loss 4.5680 LearningRate 0.0117 Epoch: 13 Global Step: 219780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:27,291-Speed 9618.89 samples/sec Loss 4.5049 LearningRate 0.0117 Epoch: 13 Global Step: 219790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:28,375-Speed 9448.03 samples/sec Loss 4.5928 LearningRate 0.0117 Epoch: 13 Global Step: 219800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:29,450-Speed 9526.91 samples/sec Loss 4.4876 LearningRate 0.0117 Epoch: 13 Global Step: 219810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:30,566-Speed 9184.82 samples/sec Loss 4.5452 LearningRate 0.0117 Epoch: 13 Global Step: 219820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:31,634-Speed 9596.49 samples/sec Loss 4.4803 LearningRate 0.0117 Epoch: 13 Global Step: 219830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:32,740-Speed 9257.53 samples/sec Loss 4.5884 LearningRate 0.0117 Epoch: 13 Global Step: 219840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:33,878-Speed 9004.46 samples/sec Loss 4.6153 LearningRate 0.0117 Epoch: 13 Global Step: 219850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:35,016-Speed 9005.74 samples/sec Loss 4.5410 LearningRate 0.0117 Epoch: 13 Global Step: 219860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:36,120-Speed 9279.38 samples/sec Loss 4.6497 LearningRate 0.0117 Epoch: 13 Global Step: 219870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:24:37,179-Speed 9678.54 samples/sec Loss 4.5616 LearningRate 0.0117 Epoch: 13 Global Step: 219880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:38,275-Speed 9354.42 samples/sec Loss 4.5848 LearningRate 0.0116 Epoch: 13 Global Step: 219890 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 20:24:39,365-Speed 9399.76 samples/sec Loss 4.5843 LearningRate 0.0116 Epoch: 13 Global Step: 219900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:40,426-Speed 9655.02 samples/sec Loss 4.6322 LearningRate 0.0116 Epoch: 13 Global Step: 219910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:41,500-Speed 9533.57 samples/sec Loss 4.6305 LearningRate 0.0116 Epoch: 13 Global Step: 219920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:42,596-Speed 9355.32 samples/sec Loss 4.5368 LearningRate 0.0116 Epoch: 13 Global Step: 219930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:43,680-Speed 9444.78 samples/sec Loss 4.6321 LearningRate 0.0116 Epoch: 13 Global Step: 219940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:44,746-Speed 9614.65 samples/sec Loss 4.6479 LearningRate 0.0116 Epoch: 13 Global Step: 219950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:45,825-Speed 9498.89 samples/sec Loss 4.5221 LearningRate 0.0116 Epoch: 13 Global Step: 219960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:46,913-Speed 9415.57 samples/sec Loss 4.5598 LearningRate 0.0116 Epoch: 13 Global Step: 219970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:47,999-Speed 9434.33 samples/sec Loss 4.4752 LearningRate 0.0116 Epoch: 13 Global Step: 219980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:49,154-Speed 8877.20 samples/sec Loss 4.4904 LearningRate 0.0116 Epoch: 13 Global Step: 219990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:24:50,262-Speed 9246.77 samples/sec Loss 4.5567 LearningRate 0.0116 Epoch: 13 Global Step: 220000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:25:12,455-[lfw][220000]XNorm: 8.136507 Training: 2022-04-11 20:25:12,455-[lfw][220000]Accuracy-Flip: 0.99567+-0.00238 Training: 2022-04-11 
20:25:12,456-[lfw][220000]Accuracy-Highest: 0.99700 Training: 2022-04-11 20:25:38,090-[cfp_fp][220000]XNorm: 6.992759 Training: 2022-04-11 20:25:38,091-[cfp_fp][220000]Accuracy-Flip: 0.96543+-0.00753 Training: 2022-04-11 20:25:38,091-[cfp_fp][220000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:26:00,229-[agedb_30][220000]XNorm: 7.909859 Training: 2022-04-11 20:26:00,230-[agedb_30][220000]Accuracy-Flip: 0.96850+-0.00996 Training: 2022-04-11 20:26:00,230-[agedb_30][220000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:26:01,324-Speed 144.10 samples/sec Loss 4.5845 LearningRate 0.0116 Epoch: 13 Global Step: 220010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:02,404-Speed 9489.37 samples/sec Loss 4.6570 LearningRate 0.0116 Epoch: 13 Global Step: 220020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:03,458-Speed 9715.43 samples/sec Loss 4.5880 LearningRate 0.0116 Epoch: 13 Global Step: 220030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:05,390-Speed 5301.83 samples/sec Loss 4.6890 LearningRate 0.0116 Epoch: 13 Global Step: 220040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:06,464-Speed 9540.09 samples/sec Loss 4.6199 LearningRate 0.0116 Epoch: 13 Global Step: 220050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:07,550-Speed 9439.49 samples/sec Loss 4.5780 LearningRate 0.0116 Epoch: 13 Global Step: 220060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:08,693-Speed 8965.37 samples/sec Loss 4.6834 LearningRate 0.0116 Epoch: 13 Global Step: 220070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:09,742-Speed 9766.88 samples/sec Loss 4.5892 LearningRate 0.0116 Epoch: 13 Global Step: 220080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:10,805-Speed 9632.56 samples/sec Loss 4.5998 LearningRate 0.0116 Epoch: 13 Global Step: 220090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
20:26:11,931-Speed 9101.90 samples/sec Loss 4.6143 LearningRate 0.0116 Epoch: 13 Global Step: 220100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:13,023-Speed 9384.59 samples/sec Loss 4.5450 LearningRate 0.0116 Epoch: 13 Global Step: 220110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:14,151-Speed 9079.38 samples/sec Loss 4.6633 LearningRate 0.0116 Epoch: 13 Global Step: 220120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:15,192-Speed 9846.85 samples/sec Loss 4.5947 LearningRate 0.0116 Epoch: 13 Global Step: 220130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:16,274-Speed 9470.10 samples/sec Loss 4.6049 LearningRate 0.0116 Epoch: 13 Global Step: 220140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:17,376-Speed 9297.73 samples/sec Loss 4.4895 LearningRate 0.0116 Epoch: 13 Global Step: 220150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:18,438-Speed 9645.45 samples/sec Loss 4.6523 LearningRate 0.0116 Epoch: 13 Global Step: 220160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:19,522-Speed 9456.13 samples/sec Loss 4.5720 LearningRate 0.0116 Epoch: 13 Global Step: 220170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:20,628-Speed 9265.78 samples/sec Loss 4.6209 LearningRate 0.0116 Epoch: 13 Global Step: 220180 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 20:26:21,749-Speed 9135.24 samples/sec Loss 4.5982 LearningRate 0.0116 Epoch: 13 Global Step: 220190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:22,856-Speed 9264.43 samples/sec Loss 4.5177 LearningRate 0.0116 Epoch: 13 Global Step: 220200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:23,946-Speed 9397.59 samples/sec Loss 4.6723 LearningRate 0.0116 Epoch: 13 Global Step: 220210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:25,024-Speed 9504.92 
samples/sec Loss 4.5833 LearningRate 0.0116 Epoch: 13 Global Step: 220220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:26,123-Speed 9320.12 samples/sec Loss 4.6549 LearningRate 0.0116 Epoch: 13 Global Step: 220230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:27,267-Speed 8955.34 samples/sec Loss 4.5953 LearningRate 0.0116 Epoch: 13 Global Step: 220240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:28,324-Speed 9693.72 samples/sec Loss 4.5857 LearningRate 0.0116 Epoch: 13 Global Step: 220250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:29,442-Speed 9162.46 samples/sec Loss 4.6305 LearningRate 0.0116 Epoch: 13 Global Step: 220260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:30,490-Speed 9777.20 samples/sec Loss 4.5698 LearningRate 0.0116 Epoch: 13 Global Step: 220270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:31,575-Speed 9437.76 samples/sec Loss 4.6330 LearningRate 0.0116 Epoch: 13 Global Step: 220280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:32,710-Speed 9028.51 samples/sec Loss 4.5985 LearningRate 0.0116 Epoch: 13 Global Step: 220290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:33,804-Speed 9364.39 samples/sec Loss 4.5910 LearningRate 0.0116 Epoch: 13 Global Step: 220300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:34,924-Speed 9154.11 samples/sec Loss 4.5622 LearningRate 0.0116 Epoch: 13 Global Step: 220310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:36,062-Speed 9002.63 samples/sec Loss 4.5766 LearningRate 0.0116 Epoch: 13 Global Step: 220320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:37,151-Speed 9406.67 samples/sec Loss 4.6277 LearningRate 0.0116 Epoch: 13 Global Step: 220330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:38,279-Speed 9085.85 samples/sec Loss 4.5972 
LearningRate 0.0116 Epoch: 13 Global Step: 220340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:39,380-Speed 9305.71 samples/sec Loss 4.6686 LearningRate 0.0116 Epoch: 13 Global Step: 220350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:40,484-Speed 9281.82 samples/sec Loss 4.6949 LearningRate 0.0116 Epoch: 13 Global Step: 220360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:41,572-Speed 9414.08 samples/sec Loss 4.5316 LearningRate 0.0116 Epoch: 13 Global Step: 220370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:42,631-Speed 9678.36 samples/sec Loss 4.6007 LearningRate 0.0115 Epoch: 13 Global Step: 220380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:43,762-Speed 9062.51 samples/sec Loss 4.6464 LearningRate 0.0115 Epoch: 13 Global Step: 220390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:44,853-Speed 9383.51 samples/sec Loss 4.6200 LearningRate 0.0115 Epoch: 13 Global Step: 220400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:45,933-Speed 9489.74 samples/sec Loss 4.6339 LearningRate 0.0115 Epoch: 13 Global Step: 220410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:47,038-Speed 9272.17 samples/sec Loss 4.7108 LearningRate 0.0115 Epoch: 13 Global Step: 220420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:48,137-Speed 9329.75 samples/sec Loss 4.6476 LearningRate 0.0115 Epoch: 13 Global Step: 220430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:49,221-Speed 9446.87 samples/sec Loss 4.5987 LearningRate 0.0115 Epoch: 13 Global Step: 220440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:50,305-Speed 9452.08 samples/sec Loss 4.5986 LearningRate 0.0115 Epoch: 13 Global Step: 220450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:26:51,362-Speed 9697.07 samples/sec Loss 4.5404 LearningRate 0.0115 Epoch: 13 Global 
Step: 220460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:52,513-Speed 8901.14 samples/sec Loss 4.8258 LearningRate 0.0115 Epoch: 13 Global Step: 220470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:53,635-Speed 9149.32 samples/sec Loss 4.5757 LearningRate 0.0115 Epoch: 13 Global Step: 220480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:54,737-Speed 9301.86 samples/sec Loss 4.5109 LearningRate 0.0115 Epoch: 13 Global Step: 220490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:55,821-Speed 9446.49 samples/sec Loss 4.6039 LearningRate 0.0115 Epoch: 13 Global Step: 220500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:56,914-Speed 9382.40 samples/sec Loss 4.7096 LearningRate 0.0115 Epoch: 13 Global Step: 220510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:58,002-Speed 9410.43 samples/sec Loss 4.6403 LearningRate 0.0115 Epoch: 13 Global Step: 220520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:26:59,045-Speed 9826.40 samples/sec Loss 4.6317 LearningRate 0.0115 Epoch: 13 Global Step: 220530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:00,110-Speed 9621.31 samples/sec Loss 4.5648 LearningRate 0.0115 Epoch: 13 Global Step: 220540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:01,248-Speed 9005.29 samples/sec Loss 4.5985 LearningRate 0.0115 Epoch: 13 Global Step: 220550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:02,392-Speed 8957.50 samples/sec Loss 4.6597 LearningRate 0.0115 Epoch: 13 Global Step: 220560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:03,464-Speed 9550.34 samples/sec Loss 4.5754 LearningRate 0.0115 Epoch: 13 Global Step: 220570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:04,556-Speed 9383.43 samples/sec Loss 4.6107 LearningRate 0.0115 Epoch: 13 Global Step: 220580 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 20:27:05,691-Speed 9032.51 samples/sec Loss 4.6012 LearningRate 0.0115 Epoch: 13 Global Step: 220590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:06,779-Speed 9415.20 samples/sec Loss 4.6499 LearningRate 0.0115 Epoch: 13 Global Step: 220600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:07,923-Speed 8962.51 samples/sec Loss 4.5854 LearningRate 0.0115 Epoch: 13 Global Step: 220610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:09,010-Speed 9423.75 samples/sec Loss 4.6265 LearningRate 0.0115 Epoch: 13 Global Step: 220620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:10,127-Speed 9172.79 samples/sec Loss 4.5983 LearningRate 0.0115 Epoch: 13 Global Step: 220630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:11,254-Speed 9085.48 samples/sec Loss 4.6492 LearningRate 0.0115 Epoch: 13 Global Step: 220640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:12,355-Speed 9310.40 samples/sec Loss 4.4900 LearningRate 0.0115 Epoch: 13 Global Step: 220650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:13,456-Speed 9304.76 samples/sec Loss 4.6833 LearningRate 0.0115 Epoch: 13 Global Step: 220660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:14,562-Speed 9265.96 samples/sec Loss 4.6485 LearningRate 0.0115 Epoch: 13 Global Step: 220670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:15,648-Speed 9428.74 samples/sec Loss 4.6441 LearningRate 0.0115 Epoch: 13 Global Step: 220680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:16,731-Speed 9473.06 samples/sec Loss 4.6246 LearningRate 0.0115 Epoch: 13 Global Step: 220690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:17,797-Speed 9607.66 samples/sec Loss 4.6962 LearningRate 0.0115 Epoch: 13 Global Step: 220700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 20:27:18,916-Speed 9153.26 samples/sec Loss 4.6237 LearningRate 0.0115 Epoch: 13 Global Step: 220710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:20,029-Speed 9211.42 samples/sec Loss 4.6498 LearningRate 0.0115 Epoch: 13 Global Step: 220720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:21,112-Speed 9457.91 samples/sec Loss 4.5979 LearningRate 0.0115 Epoch: 13 Global Step: 220730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:22,221-Speed 9239.77 samples/sec Loss 4.6456 LearningRate 0.0115 Epoch: 13 Global Step: 220740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:23,303-Speed 9467.89 samples/sec Loss 4.5894 LearningRate 0.0115 Epoch: 13 Global Step: 220750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:24,416-Speed 9209.92 samples/sec Loss 4.6002 LearningRate 0.0115 Epoch: 13 Global Step: 220760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:25,513-Speed 9335.30 samples/sec Loss 4.7069 LearningRate 0.0115 Epoch: 13 Global Step: 220770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:26,576-Speed 9644.43 samples/sec Loss 4.5723 LearningRate 0.0115 Epoch: 13 Global Step: 220780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:27,709-Speed 9043.81 samples/sec Loss 4.5585 LearningRate 0.0115 Epoch: 13 Global Step: 220790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:28,834-Speed 9106.52 samples/sec Loss 4.6151 LearningRate 0.0115 Epoch: 13 Global Step: 220800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:29,960-Speed 9097.48 samples/sec Loss 4.5582 LearningRate 0.0115 Epoch: 13 Global Step: 220810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:31,088-Speed 9080.11 samples/sec Loss 4.5688 LearningRate 0.0115 Epoch: 13 Global Step: 220820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:32,138-Speed 9757.40 
samples/sec Loss 4.5754 LearningRate 0.0115 Epoch: 13 Global Step: 220830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:33,263-Speed 9110.40 samples/sec Loss 4.6683 LearningRate 0.0115 Epoch: 13 Global Step: 220840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 20:27:34,355-Speed 9379.59 samples/sec Loss 4.7314 LearningRate 0.0115 Epoch: 13 Global Step: 220850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:35,461-Speed 9275.45 samples/sec Loss 4.5030 LearningRate 0.0115 Epoch: 13 Global Step: 220860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:36,586-Speed 9106.72 samples/sec Loss 4.6712 LearningRate 0.0114 Epoch: 13 Global Step: 220870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:37,663-Speed 9517.98 samples/sec Loss 4.5258 LearningRate 0.0114 Epoch: 13 Global Step: 220880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:38,786-Speed 9119.19 samples/sec Loss 4.6007 LearningRate 0.0114 Epoch: 13 Global Step: 220890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:39,854-Speed 9599.39 samples/sec Loss 4.6183 LearningRate 0.0114 Epoch: 13 Global Step: 220900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:40,917-Speed 9637.59 samples/sec Loss 4.6864 LearningRate 0.0114 Epoch: 13 Global Step: 220910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:42,031-Speed 9190.51 samples/sec Loss 4.6515 LearningRate 0.0114 Epoch: 13 Global Step: 220920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:43,122-Speed 9395.19 samples/sec Loss 4.5930 LearningRate 0.0114 Epoch: 13 Global Step: 220930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:44,261-Speed 8992.58 samples/sec Loss 4.6678 LearningRate 0.0114 Epoch: 13 Global Step: 220940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:45,374-Speed 9209.45 samples/sec Loss 4.6284 LearningRate 0.0114 
Epoch: 13 Global Step: 220950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:46,438-Speed 9629.86 samples/sec Loss 4.7051 LearningRate 0.0114 Epoch: 13 Global Step: 220960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:47,526-Speed 9414.76 samples/sec Loss 4.6432 LearningRate 0.0114 Epoch: 13 Global Step: 220970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:48,646-Speed 9150.65 samples/sec Loss 4.5324 LearningRate 0.0114 Epoch: 13 Global Step: 220980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:49,727-Speed 9472.76 samples/sec Loss 4.6827 LearningRate 0.0114 Epoch: 13 Global Step: 220990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:50,828-Speed 9312.13 samples/sec Loss 4.6046 LearningRate 0.0114 Epoch: 13 Global Step: 221000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:51,916-Speed 9414.29 samples/sec Loss 4.5940 LearningRate 0.0114 Epoch: 13 Global Step: 221010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:53,017-Speed 9317.11 samples/sec Loss 4.6575 LearningRate 0.0114 Epoch: 13 Global Step: 221020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:54,104-Speed 9426.98 samples/sec Loss 4.6290 LearningRate 0.0114 Epoch: 13 Global Step: 221030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:55,206-Speed 9301.54 samples/sec Loss 4.6861 LearningRate 0.0114 Epoch: 13 Global Step: 221040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:56,302-Speed 9343.45 samples/sec Loss 4.5839 LearningRate 0.0114 Epoch: 13 Global Step: 221050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:27:57,361-Speed 9675.18 samples/sec Loss 4.6731 LearningRate 0.0114 Epoch: 13 Global Step: 221060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:58,470-Speed 9239.96 samples/sec Loss 4.5908 LearningRate 0.0114 Epoch: 13 Global Step: 221070 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 20:27:59,536-Speed 9608.58 samples/sec Loss 4.6781 LearningRate 0.0114 Epoch: 13 Global Step: 221080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:00,599-Speed 9638.02 samples/sec Loss 4.7107 LearningRate 0.0114 Epoch: 13 Global Step: 221090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:01,698-Speed 9326.10 samples/sec Loss 4.6335 LearningRate 0.0114 Epoch: 13 Global Step: 221100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:02,869-Speed 8748.76 samples/sec Loss 4.7191 LearningRate 0.0114 Epoch: 13 Global Step: 221110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:04,015-Speed 8941.00 samples/sec Loss 4.6332 LearningRate 0.0114 Epoch: 13 Global Step: 221120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:05,108-Speed 9377.92 samples/sec Loss 4.6438 LearningRate 0.0114 Epoch: 13 Global Step: 221130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:06,207-Speed 9323.89 samples/sec Loss 4.6544 LearningRate 0.0114 Epoch: 13 Global Step: 221140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:07,293-Speed 9427.41 samples/sec Loss 4.6714 LearningRate 0.0114 Epoch: 13 Global Step: 221150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:08,396-Speed 9294.95 samples/sec Loss 4.6611 LearningRate 0.0114 Epoch: 13 Global Step: 221160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:09,500-Speed 9277.32 samples/sec Loss 4.6123 LearningRate 0.0114 Epoch: 13 Global Step: 221170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:10,555-Speed 9713.84 samples/sec Loss 4.6723 LearningRate 0.0114 Epoch: 13 Global Step: 221180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:11,667-Speed 9213.33 samples/sec Loss 4.6745 LearningRate 0.0114 Epoch: 13 Global Step: 221190 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 20:28:12,744-Speed 9513.63 samples/sec Loss 4.5585 LearningRate 0.0114 Epoch: 13 Global Step: 221200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:13,874-Speed 9072.28 samples/sec Loss 4.6544 LearningRate 0.0114 Epoch: 13 Global Step: 221210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:15,005-Speed 9058.88 samples/sec Loss 4.6537 LearningRate 0.0114 Epoch: 13 Global Step: 221220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:16,079-Speed 9534.39 samples/sec Loss 4.6599 LearningRate 0.0114 Epoch: 13 Global Step: 221230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:17,129-Speed 9769.18 samples/sec Loss 4.5573 LearningRate 0.0114 Epoch: 13 Global Step: 221240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:18,252-Speed 9119.51 samples/sec Loss 4.6241 LearningRate 0.0114 Epoch: 13 Global Step: 221250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:19,355-Speed 9287.34 samples/sec Loss 4.6645 LearningRate 0.0114 Epoch: 13 Global Step: 221260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:20,426-Speed 9564.40 samples/sec Loss 4.7039 LearningRate 0.0114 Epoch: 13 Global Step: 221270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:21,530-Speed 9287.92 samples/sec Loss 4.7186 LearningRate 0.0114 Epoch: 13 Global Step: 221280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:22,629-Speed 9319.80 samples/sec Loss 4.6755 LearningRate 0.0114 Epoch: 13 Global Step: 221290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:23,817-Speed 8628.79 samples/sec Loss 4.7429 LearningRate 0.0114 Epoch: 13 Global Step: 221300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:24,898-Speed 9473.85 samples/sec Loss 4.6348 LearningRate 0.0114 Epoch: 13 Global Step: 221310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
20:28:26,017-Speed 9157.43 samples/sec Loss 4.6182 LearningRate 0.0114 Epoch: 13 Global Step: 221320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:27,086-Speed 9579.40 samples/sec Loss 4.6110 LearningRate 0.0114 Epoch: 13 Global Step: 221330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:28,205-Speed 9162.97 samples/sec Loss 4.6084 LearningRate 0.0114 Epoch: 13 Global Step: 221340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:29,372-Speed 8776.79 samples/sec Loss 4.6194 LearningRate 0.0114 Epoch: 13 Global Step: 221350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:30,463-Speed 9394.10 samples/sec Loss 4.6786 LearningRate 0.0113 Epoch: 13 Global Step: 221360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:31,549-Speed 9436.18 samples/sec Loss 4.7142 LearningRate 0.0113 Epoch: 13 Global Step: 221370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:32,725-Speed 8713.04 samples/sec Loss 4.6216 LearningRate 0.0113 Epoch: 13 Global Step: 221380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:33,804-Speed 9496.27 samples/sec Loss 4.6669 LearningRate 0.0113 Epoch: 13 Global Step: 221390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:34,889-Speed 9460.86 samples/sec Loss 4.6750 LearningRate 0.0113 Epoch: 13 Global Step: 221400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:35,961-Speed 9554.18 samples/sec Loss 4.5601 LearningRate 0.0113 Epoch: 13 Global Step: 221410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:37,007-Speed 9794.77 samples/sec Loss 4.6553 LearningRate 0.0113 Epoch: 13 Global Step: 221420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:38,087-Speed 9485.39 samples/sec Loss 4.6558 LearningRate 0.0113 Epoch: 13 Global Step: 221430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:39,193-Speed 9266.49 samples/sec 
Loss 4.6848 LearningRate 0.0113 Epoch: 13 Global Step: 221440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:40,339-Speed 8941.92 samples/sec Loss 4.6388 LearningRate 0.0113 Epoch: 13 Global Step: 221450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:41,415-Speed 9527.32 samples/sec Loss 4.7367 LearningRate 0.0113 Epoch: 13 Global Step: 221460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:42,539-Speed 9110.91 samples/sec Loss 4.6359 LearningRate 0.0113 Epoch: 13 Global Step: 221470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:43,696-Speed 8855.67 samples/sec Loss 4.6439 LearningRate 0.0113 Epoch: 13 Global Step: 221480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:44,770-Speed 9546.54 samples/sec Loss 4.6566 LearningRate 0.0113 Epoch: 13 Global Step: 221490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:45,870-Speed 9309.93 samples/sec Loss 4.6571 LearningRate 0.0113 Epoch: 13 Global Step: 221500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:46,954-Speed 9449.30 samples/sec Loss 4.5637 LearningRate 0.0113 Epoch: 13 Global Step: 221510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:48,059-Speed 9278.45 samples/sec Loss 4.6189 LearningRate 0.0113 Epoch: 13 Global Step: 221520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:49,146-Speed 9424.84 samples/sec Loss 4.6171 LearningRate 0.0113 Epoch: 13 Global Step: 221530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:50,228-Speed 9473.79 samples/sec Loss 4.6731 LearningRate 0.0113 Epoch: 13 Global Step: 221540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:28:51,327-Speed 9330.81 samples/sec Loss 4.6765 LearningRate 0.0113 Epoch: 13 Global Step: 221550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:52,422-Speed 9352.99 samples/sec Loss 4.6470 LearningRate 0.0113 
Epoch: 13 Global Step: 221560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:53,521-Speed 9327.39 samples/sec Loss 4.6327 LearningRate 0.0113 Epoch: 13 Global Step: 221570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:54,601-Speed 9486.15 samples/sec Loss 4.6671 LearningRate 0.0113 Epoch: 13 Global Step: 221580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:55,724-Speed 9122.12 samples/sec Loss 4.5925 LearningRate 0.0113 Epoch: 13 Global Step: 221590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:56,830-Speed 9260.32 samples/sec Loss 4.7398 LearningRate 0.0113 Epoch: 13 Global Step: 221600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:57,908-Speed 9515.12 samples/sec Loss 4.6945 LearningRate 0.0113 Epoch: 13 Global Step: 221610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:28:59,000-Speed 9385.98 samples/sec Loss 4.6074 LearningRate 0.0113 Epoch: 13 Global Step: 221620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:00,098-Speed 9328.47 samples/sec Loss 4.7857 LearningRate 0.0113 Epoch: 13 Global Step: 221630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:01,155-Speed 9690.66 samples/sec Loss 4.6613 LearningRate 0.0113 Epoch: 13 Global Step: 221640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:02,246-Speed 9395.46 samples/sec Loss 4.6479 LearningRate 0.0113 Epoch: 13 Global Step: 221650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:03,349-Speed 9284.07 samples/sec Loss 4.6665 LearningRate 0.0113 Epoch: 13 Global Step: 221660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:04,426-Speed 9514.22 samples/sec Loss 4.6605 LearningRate 0.0113 Epoch: 13 Global Step: 221670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:05,475-Speed 9770.08 samples/sec Loss 4.6518 LearningRate 0.0113 Epoch: 13 Global Step: 221680 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:06,549-Speed 9541.03 samples/sec Loss 4.5916 LearningRate 0.0113 Epoch: 13 Global Step: 221690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:07,658-Speed 9232.66 samples/sec Loss 4.6379 LearningRate 0.0113 Epoch: 13 Global Step: 221700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:08,733-Speed 9535.30 samples/sec Loss 4.6717 LearningRate 0.0113 Epoch: 13 Global Step: 221710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:09,795-Speed 9641.58 samples/sec Loss 4.6374 LearningRate 0.0113 Epoch: 13 Global Step: 221720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:10,882-Speed 9428.70 samples/sec Loss 4.6947 LearningRate 0.0113 Epoch: 13 Global Step: 221730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:11,993-Speed 9224.35 samples/sec Loss 4.6669 LearningRate 0.0113 Epoch: 13 Global Step: 221740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:13,092-Speed 9329.55 samples/sec Loss 4.5835 LearningRate 0.0113 Epoch: 13 Global Step: 221750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:14,219-Speed 9087.40 samples/sec Loss 4.6194 LearningRate 0.0113 Epoch: 13 Global Step: 221760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:15,327-Speed 9250.97 samples/sec Loss 4.6592 LearningRate 0.0113 Epoch: 13 Global Step: 221770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:16,426-Speed 9320.87 samples/sec Loss 4.6014 LearningRate 0.0113 Epoch: 13 Global Step: 221780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:17,552-Speed 9100.25 samples/sec Loss 4.5678 LearningRate 0.0113 Epoch: 13 Global Step: 221790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:18,659-Speed 9254.96 samples/sec Loss 4.5870 LearningRate 0.0113 Epoch: 13 Global Step: 221800 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 20:29:19,745-Speed 9438.53 samples/sec Loss 4.5876 LearningRate 0.0113 Epoch: 13 Global Step: 221810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:20,841-Speed 9342.16 samples/sec Loss 4.7051 LearningRate 0.0113 Epoch: 13 Global Step: 221820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:21,918-Speed 9523.31 samples/sec Loss 4.6362 LearningRate 0.0113 Epoch: 13 Global Step: 221830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:23,013-Speed 9350.57 samples/sec Loss 4.6646 LearningRate 0.0113 Epoch: 13 Global Step: 221840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:24,145-Speed 9056.81 samples/sec Loss 4.6167 LearningRate 0.0113 Epoch: 13 Global Step: 221850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:25,209-Speed 9628.17 samples/sec Loss 4.7234 LearningRate 0.0112 Epoch: 13 Global Step: 221860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:26,304-Speed 9351.68 samples/sec Loss 4.6367 LearningRate 0.0112 Epoch: 13 Global Step: 221870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:27,369-Speed 9625.03 samples/sec Loss 4.7513 LearningRate 0.0112 Epoch: 13 Global Step: 221880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:28,488-Speed 9160.31 samples/sec Loss 4.7321 LearningRate 0.0112 Epoch: 13 Global Step: 221890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:29,653-Speed 8793.16 samples/sec Loss 4.6876 LearningRate 0.0112 Epoch: 13 Global Step: 221900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:30,732-Speed 9493.44 samples/sec Loss 4.7924 LearningRate 0.0112 Epoch: 13 Global Step: 221910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:31,794-Speed 9650.59 samples/sec Loss 4.6301 LearningRate 0.0112 Epoch: 13 Global Step: 221920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
20:29:32,916-Speed 9135.23 samples/sec Loss 4.7504 LearningRate 0.0112 Epoch: 13 Global Step: 221930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:34,061-Speed 8949.26 samples/sec Loss 4.6103 LearningRate 0.0112 Epoch: 13 Global Step: 221940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:35,191-Speed 9059.98 samples/sec Loss 4.7229 LearningRate 0.0112 Epoch: 13 Global Step: 221950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:29:36,260-Speed 9591.63 samples/sec Loss 4.6626 LearningRate 0.0112 Epoch: 13 Global Step: 221960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:37,339-Speed 9492.56 samples/sec Loss 4.6798 LearningRate 0.0112 Epoch: 13 Global Step: 221970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:38,423-Speed 9450.84 samples/sec Loss 4.7227 LearningRate 0.0112 Epoch: 13 Global Step: 221980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:39,482-Speed 9676.25 samples/sec Loss 4.7522 LearningRate 0.0112 Epoch: 13 Global Step: 221990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:29:40,586-Speed 9285.05 samples/sec Loss 4.6769 LearningRate 0.0112 Epoch: 13 Global Step: 222000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:30:02,521-[lfw][222000]XNorm: 7.941319 Training: 2022-04-11 20:30:02,522-[lfw][222000]Accuracy-Flip: 0.99583+-0.00291 Training: 2022-04-11 20:30:02,522-[lfw][222000]Accuracy-Highest: 0.99700 Training: 2022-04-11 20:30:27,847-[cfp_fp][222000]XNorm: 6.859523 Training: 2022-04-11 20:30:27,848-[cfp_fp][222000]Accuracy-Flip: 0.96757+-0.00864 Training: 2022-04-11 20:30:27,848-[cfp_fp][222000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:30:49,707-[agedb_30][222000]XNorm: 7.695806 Training: 2022-04-11 20:30:49,707-[agedb_30][222000]Accuracy-Flip: 0.96867+-0.00951 Training: 2022-04-11 20:30:49,707-[agedb_30][222000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:30:50,799-Speed 
145.84 samples/sec Loss 4.6504 LearningRate 0.0112 Epoch: 13 Global Step: 222010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:51,862-Speed 9644.51 samples/sec Loss 4.6524 LearningRate 0.0112 Epoch: 13 Global Step: 222020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:52,963-Speed 9308.25 samples/sec Loss 4.6034 LearningRate 0.0112 Epoch: 13 Global Step: 222030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:54,076-Speed 9203.49 samples/sec Loss 4.5794 LearningRate 0.0112 Epoch: 13 Global Step: 222040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:55,149-Speed 9552.49 samples/sec Loss 4.7122 LearningRate 0.0112 Epoch: 13 Global Step: 222050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:56,270-Speed 9142.74 samples/sec Loss 4.7014 LearningRate 0.0112 Epoch: 13 Global Step: 222060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:57,423-Speed 8884.84 samples/sec Loss 4.7026 LearningRate 0.0112 Epoch: 13 Global Step: 222070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:58,497-Speed 9544.53 samples/sec Loss 4.5587 LearningRate 0.0112 Epoch: 13 Global Step: 222080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:30:59,587-Speed 9395.93 samples/sec Loss 4.6813 LearningRate 0.0112 Epoch: 13 Global Step: 222090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:00,696-Speed 9243.55 samples/sec Loss 4.6275 LearningRate 0.0112 Epoch: 13 Global Step: 222100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:01,778-Speed 9468.93 samples/sec Loss 4.7020 LearningRate 0.0112 Epoch: 13 Global Step: 222110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:02,887-Speed 9238.40 samples/sec Loss 4.7710 LearningRate 0.0112 Epoch: 13 Global Step: 222120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:04,016-Speed 9078.64 samples/sec Loss 4.7219 
LearningRate 0.0112 Epoch: 13 Global Step: 222130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:05,137-Speed 9132.76 samples/sec Loss 4.6930 LearningRate 0.0112 Epoch: 13 Global Step: 222140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:06,247-Speed 9230.89 samples/sec Loss 4.6522 LearningRate 0.0112 Epoch: 13 Global Step: 222150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:07,333-Speed 9438.43 samples/sec Loss 4.7081 LearningRate 0.0112 Epoch: 13 Global Step: 222160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:08,448-Speed 9186.82 samples/sec Loss 4.7262 LearningRate 0.0112 Epoch: 13 Global Step: 222170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:09,547-Speed 9324.87 samples/sec Loss 4.6243 LearningRate 0.0112 Epoch: 13 Global Step: 222180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:10,626-Speed 9497.83 samples/sec Loss 4.6587 LearningRate 0.0112 Epoch: 13 Global Step: 222190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:11,684-Speed 9679.70 samples/sec Loss 4.6319 LearningRate 0.0112 Epoch: 13 Global Step: 222200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:12,772-Speed 9415.97 samples/sec Loss 4.6208 LearningRate 0.0112 Epoch: 13 Global Step: 222210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:14,176-Speed 7299.25 samples/sec Loss 4.6701 LearningRate 0.0112 Epoch: 13 Global Step: 222220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:15,255-Speed 9495.34 samples/sec Loss 4.6144 LearningRate 0.0112 Epoch: 13 Global Step: 222230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:16,393-Speed 9005.67 samples/sec Loss 4.6273 LearningRate 0.0112 Epoch: 13 Global Step: 222240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 20:31:17,483-Speed 9396.29 samples/sec Loss 4.6470 LearningRate 0.0112 Epoch: 13 Global 
Step: 222250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:18,618-Speed 9030.82 samples/sec Loss 4.6804 LearningRate 0.0112 Epoch: 13 Global Step: 222260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:19,665-Speed 9785.97 samples/sec Loss 4.6268 LearningRate 0.0112 Epoch: 13 Global Step: 222270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:20,780-Speed 9191.24 samples/sec Loss 4.6004 LearningRate 0.0112 Epoch: 13 Global Step: 222280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:21,913-Speed 9039.59 samples/sec Loss 4.6631 LearningRate 0.0112 Epoch: 13 Global Step: 222290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:23,007-Speed 9370.26 samples/sec Loss 4.7193 LearningRate 0.0112 Epoch: 13 Global Step: 222300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:24,076-Speed 9581.01 samples/sec Loss 4.6623 LearningRate 0.0112 Epoch: 13 Global Step: 222310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:25,176-Speed 9317.15 samples/sec Loss 4.7275 LearningRate 0.0112 Epoch: 13 Global Step: 222320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:26,268-Speed 9374.75 samples/sec Loss 4.6121 LearningRate 0.0112 Epoch: 13 Global Step: 222330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:27,338-Speed 9582.91 samples/sec Loss 4.5667 LearningRate 0.0112 Epoch: 13 Global Step: 222340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:28,412-Speed 9541.55 samples/sec Loss 4.6689 LearningRate 0.0112 Epoch: 13 Global Step: 222350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:29,490-Speed 9501.11 samples/sec Loss 4.7737 LearningRate 0.0111 Epoch: 13 Global Step: 222360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 20:31:30,563-Speed 9547.89 samples/sec Loss 4.5572 LearningRate 0.0111 Epoch: 13 Global Step: 222370 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 20:31:31,647-Speed 9454.90 samples/sec Loss 4.7353 LearningRate 0.0111 Epoch: 13 Global Step: 222380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:32,715-Speed 9597.97 samples/sec Loss 4.7152 LearningRate 0.0111 Epoch: 13 Global Step: 222390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:33,765-Speed 9754.25 samples/sec Loss 4.6727 LearningRate 0.0111 Epoch: 13 Global Step: 222400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:34,844-Speed 9495.88 samples/sec Loss 4.6569 LearningRate 0.0111 Epoch: 13 Global Step: 222410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:35,932-Speed 9416.32 samples/sec Loss 4.6965 LearningRate 0.0111 Epoch: 13 Global Step: 222420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:37,047-Speed 9192.74 samples/sec Loss 4.7600 LearningRate 0.0111 Epoch: 13 Global Step: 222430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:38,152-Speed 9272.13 samples/sec Loss 4.6529 LearningRate 0.0111 Epoch: 13 Global Step: 222440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:39,255-Speed 9296.84 samples/sec Loss 4.7369 LearningRate 0.0111 Epoch: 13 Global Step: 222450 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:31:40,330-Speed 9529.05 samples/sec Loss 4.7891 LearningRate 0.0111 Epoch: 13 Global Step: 222460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:41,398-Speed 9594.66 samples/sec Loss 4.8462 LearningRate 0.0111 Epoch: 13 Global Step: 222470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:42,508-Speed 9229.19 samples/sec Loss 4.6565 LearningRate 0.0111 Epoch: 13 Global Step: 222480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:43,637-Speed 9074.38 samples/sec Loss 4.7906 LearningRate 0.0111 Epoch: 13 Global Step: 222490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 20:31:44,704-Speed 9604.05 samples/sec Loss 4.6987 LearningRate 0.0111 Epoch: 13 Global Step: 222500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:45,781-Speed 9509.81 samples/sec Loss 4.6343 LearningRate 0.0111 Epoch: 13 Global Step: 222510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:46,890-Speed 9236.93 samples/sec Loss 4.7195 LearningRate 0.0111 Epoch: 13 Global Step: 222520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:47,987-Speed 9342.23 samples/sec Loss 4.6947 LearningRate 0.0111 Epoch: 13 Global Step: 222530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:49,080-Speed 9375.79 samples/sec Loss 4.6033 LearningRate 0.0111 Epoch: 13 Global Step: 222540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:50,173-Speed 9372.99 samples/sec Loss 4.6444 LearningRate 0.0111 Epoch: 13 Global Step: 222550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:51,306-Speed 9049.47 samples/sec Loss 4.7493 LearningRate 0.0111 Epoch: 13 Global Step: 222560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:31:52,388-Speed 9474.30 samples/sec Loss 4.7128 LearningRate 0.0111 Epoch: 13 Global Step: 222570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:53,492-Speed 9279.07 samples/sec Loss 4.6645 LearningRate 0.0111 Epoch: 13 Global Step: 222580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:54,660-Speed 8772.28 samples/sec Loss 4.6817 LearningRate 0.0111 Epoch: 13 Global Step: 222590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:55,732-Speed 9556.97 samples/sec Loss 4.6655 LearningRate 0.0111 Epoch: 13 Global Step: 222600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:56,775-Speed 9824.31 samples/sec Loss 4.6720 LearningRate 0.0111 Epoch: 13 Global Step: 222610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:57,853-Speed 9503.11 
samples/sec Loss 4.6786 LearningRate 0.0111 Epoch: 13 Global Step: 222620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:31:58,928-Speed 9527.71 samples/sec Loss 4.7278 LearningRate 0.0111 Epoch: 13 Global Step: 222630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:00,024-Speed 9350.26 samples/sec Loss 4.6999 LearningRate 0.0111 Epoch: 13 Global Step: 222640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:01,134-Speed 9234.16 samples/sec Loss 4.7140 LearningRate 0.0111 Epoch: 13 Global Step: 222650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:02,230-Speed 9350.83 samples/sec Loss 4.8009 LearningRate 0.0111 Epoch: 13 Global Step: 222660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:03,333-Speed 9287.36 samples/sec Loss 4.7656 LearningRate 0.0111 Epoch: 13 Global Step: 222670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:32:04,395-Speed 9649.02 samples/sec Loss 4.6879 LearningRate 0.0111 Epoch: 13 Global Step: 222680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:05,487-Speed 9382.32 samples/sec Loss 4.7091 LearningRate 0.0111 Epoch: 13 Global Step: 222690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:06,586-Speed 9327.13 samples/sec Loss 4.7695 LearningRate 0.0111 Epoch: 13 Global Step: 222700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:07,658-Speed 9558.49 samples/sec Loss 4.8514 LearningRate 0.0111 Epoch: 13 Global Step: 222710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:08,777-Speed 9155.85 samples/sec Loss 4.7602 LearningRate 0.0111 Epoch: 13 Global Step: 222720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:09,878-Speed 9306.84 samples/sec Loss 4.6883 LearningRate 0.0111 Epoch: 13 Global Step: 222730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:10,950-Speed 9556.53 samples/sec Loss 4.7315 
LearningRate 0.0111 Epoch: 13 Global Step: 222740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:12,048-Speed 9328.95 samples/sec Loss 4.6575 LearningRate 0.0111 Epoch: 13 Global Step: 222750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:13,161-Speed 9207.36 samples/sec Loss 4.6914 LearningRate 0.0111 Epoch: 13 Global Step: 222760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:14,278-Speed 9176.98 samples/sec Loss 4.7405 LearningRate 0.0111 Epoch: 13 Global Step: 222770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:15,320-Speed 9825.31 samples/sec Loss 4.6517 LearningRate 0.0111 Epoch: 13 Global Step: 222780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:16,403-Speed 9467.76 samples/sec Loss 4.7087 LearningRate 0.0111 Epoch: 13 Global Step: 222790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:17,471-Speed 9593.40 samples/sec Loss 4.6491 LearningRate 0.0111 Epoch: 13 Global Step: 222800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:18,613-Speed 8972.53 samples/sec Loss 4.7812 LearningRate 0.0111 Epoch: 13 Global Step: 222810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:19,714-Speed 9305.60 samples/sec Loss 4.6604 LearningRate 0.0111 Epoch: 13 Global Step: 222820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:20,850-Speed 9018.50 samples/sec Loss 4.6273 LearningRate 0.0111 Epoch: 13 Global Step: 222830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:21,947-Speed 9337.08 samples/sec Loss 4.6891 LearningRate 0.0111 Epoch: 13 Global Step: 222840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:23,053-Speed 9264.76 samples/sec Loss 4.6126 LearningRate 0.0111 Epoch: 13 Global Step: 222850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:24,114-Speed 9663.44 samples/sec Loss 4.7214 LearningRate 0.0110 Epoch: 13 
Global Step: 222860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:25,183-Speed 9580.01 samples/sec Loss 4.7470 LearningRate 0.0110 Epoch: 13 Global Step: 222870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:26,297-Speed 9200.93 samples/sec Loss 4.7325 LearningRate 0.0110 Epoch: 13 Global Step: 222880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:27,356-Speed 9677.81 samples/sec Loss 4.6072 LearningRate 0.0110 Epoch: 13 Global Step: 222890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:28,410-Speed 9719.65 samples/sec Loss 4.6836 LearningRate 0.0110 Epoch: 13 Global Step: 222900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:29,492-Speed 9465.57 samples/sec Loss 4.7232 LearningRate 0.0110 Epoch: 13 Global Step: 222910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:30,602-Speed 9231.57 samples/sec Loss 4.7059 LearningRate 0.0110 Epoch: 13 Global Step: 222920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:31,733-Speed 9063.48 samples/sec Loss 4.6469 LearningRate 0.0110 Epoch: 13 Global Step: 222930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:32,821-Speed 9419.16 samples/sec Loss 4.6663 LearningRate 0.0110 Epoch: 13 Global Step: 222940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:33,904-Speed 9454.48 samples/sec Loss 4.7567 LearningRate 0.0110 Epoch: 13 Global Step: 222950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:35,027-Speed 9125.58 samples/sec Loss 4.6350 LearningRate 0.0110 Epoch: 13 Global Step: 222960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:36,135-Speed 9246.45 samples/sec Loss 4.7420 LearningRate 0.0110 Epoch: 13 Global Step: 222970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:37,198-Speed 9642.07 samples/sec Loss 4.7115 LearningRate 0.0110 Epoch: 13 Global Step: 222980 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:38,292-Speed 9372.21 samples/sec Loss 4.7173 LearningRate 0.0110 Epoch: 13 Global Step: 222990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:39,405-Speed 9204.63 samples/sec Loss 4.6770 LearningRate 0.0110 Epoch: 13 Global Step: 223000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:40,458-Speed 9735.97 samples/sec Loss 4.7223 LearningRate 0.0110 Epoch: 13 Global Step: 223010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:41,539-Speed 9474.73 samples/sec Loss 4.7892 LearningRate 0.0110 Epoch: 13 Global Step: 223020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:43,448-Speed 5367.87 samples/sec Loss 4.7228 LearningRate 0.0110 Epoch: 13 Global Step: 223030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:44,536-Speed 9415.51 samples/sec Loss 4.7557 LearningRate 0.0110 Epoch: 13 Global Step: 223040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:45,642-Speed 9266.88 samples/sec Loss 4.5952 LearningRate 0.0110 Epoch: 13 Global Step: 223050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:46,767-Speed 9107.17 samples/sec Loss 4.8346 LearningRate 0.0110 Epoch: 13 Global Step: 223060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:47,847-Speed 9484.92 samples/sec Loss 4.7509 LearningRate 0.0110 Epoch: 13 Global Step: 223070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:32:48,942-Speed 9361.32 samples/sec Loss 4.7397 LearningRate 0.0110 Epoch: 13 Global Step: 223080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:50,050-Speed 9245.04 samples/sec Loss 4.7949 LearningRate 0.0110 Epoch: 13 Global Step: 223090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:51,133-Speed 9463.30 samples/sec Loss 4.6926 LearningRate 0.0110 Epoch: 13 Global Step: 223100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 20:32:52,248-Speed 9191.51 samples/sec Loss 4.6371 LearningRate 0.0110 Epoch: 13 Global Step: 223110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:53,333-Speed 9437.49 samples/sec Loss 4.6235 LearningRate 0.0110 Epoch: 13 Global Step: 223120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:54,390-Speed 9697.47 samples/sec Loss 4.6944 LearningRate 0.0110 Epoch: 13 Global Step: 223130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:55,448-Speed 9679.70 samples/sec Loss 4.7570 LearningRate 0.0110 Epoch: 13 Global Step: 223140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:56,584-Speed 9022.89 samples/sec Loss 4.7230 LearningRate 0.0110 Epoch: 13 Global Step: 223150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:57,637-Speed 9731.80 samples/sec Loss 4.6474 LearningRate 0.0110 Epoch: 13 Global Step: 223160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:58,751-Speed 9194.53 samples/sec Loss 4.6888 LearningRate 0.0110 Epoch: 13 Global Step: 223170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:32:59,822-Speed 9560.93 samples/sec Loss 4.6842 LearningRate 0.0110 Epoch: 13 Global Step: 223180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:00,914-Speed 9388.94 samples/sec Loss 4.7251 LearningRate 0.0110 Epoch: 13 Global Step: 223190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:02,049-Speed 9026.50 samples/sec Loss 4.6637 LearningRate 0.0110 Epoch: 13 Global Step: 223200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:03,106-Speed 9700.91 samples/sec Loss 4.7002 LearningRate 0.0110 Epoch: 13 Global Step: 223210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:04,170-Speed 9622.71 samples/sec Loss 4.7057 LearningRate 0.0110 Epoch: 13 Global Step: 223220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:05,243-Speed 
9553.65 samples/sec Loss 4.6934 LearningRate 0.0110 Epoch: 13 Global Step: 223230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:06,334-Speed 9385.01 samples/sec Loss 4.7369 LearningRate 0.0110 Epoch: 13 Global Step: 223240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:07,429-Speed 9361.46 samples/sec Loss 4.5658 LearningRate 0.0110 Epoch: 13 Global Step: 223250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:08,509-Speed 9493.01 samples/sec Loss 4.6495 LearningRate 0.0110 Epoch: 13 Global Step: 223260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:09,626-Speed 9165.99 samples/sec Loss 4.6924 LearningRate 0.0110 Epoch: 13 Global Step: 223270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:10,715-Speed 9410.43 samples/sec Loss 4.6116 LearningRate 0.0110 Epoch: 13 Global Step: 223280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:11,819-Speed 9279.97 samples/sec Loss 4.7478 LearningRate 0.0110 Epoch: 13 Global Step: 223290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:12,887-Speed 9591.67 samples/sec Loss 4.7988 LearningRate 0.0110 Epoch: 13 Global Step: 223300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:13,972-Speed 9441.63 samples/sec Loss 4.7197 LearningRate 0.0110 Epoch: 13 Global Step: 223310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:15,096-Speed 9206.23 samples/sec Loss 4.8092 LearningRate 0.0110 Epoch: 13 Global Step: 223320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:33:16,204-Speed 9253.80 samples/sec Loss 4.6616 LearningRate 0.0110 Epoch: 13 Global Step: 223330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:17,281-Speed 9506.64 samples/sec Loss 4.7054 LearningRate 0.0110 Epoch: 13 Global Step: 223340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:18,398-Speed 9172.46 samples/sec Loss 4.7728 
LearningRate 0.0110 Epoch: 13 Global Step: 223350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:19,507-Speed 9242.27 samples/sec Loss 4.7089 LearningRate 0.0109 Epoch: 13 Global Step: 223360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:20,589-Speed 9470.16 samples/sec Loss 4.7293 LearningRate 0.0109 Epoch: 13 Global Step: 223370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:21,663-Speed 9547.01 samples/sec Loss 4.7130 LearningRate 0.0109 Epoch: 13 Global Step: 223380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:22,772-Speed 9237.48 samples/sec Loss 4.7782 LearningRate 0.0109 Epoch: 13 Global Step: 223390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:23,856-Speed 9451.83 samples/sec Loss 4.7514 LearningRate 0.0109 Epoch: 13 Global Step: 223400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:24,979-Speed 9125.82 samples/sec Loss 4.6329 LearningRate 0.0109 Epoch: 13 Global Step: 223410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:26,073-Speed 9361.78 samples/sec Loss 4.7415 LearningRate 0.0109 Epoch: 13 Global Step: 223420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:27,176-Speed 9283.30 samples/sec Loss 4.5718 LearningRate 0.0109 Epoch: 13 Global Step: 223430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:28,256-Speed 9492.01 samples/sec Loss 4.6869 LearningRate 0.0109 Epoch: 13 Global Step: 223440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:29,348-Speed 9378.85 samples/sec Loss 4.7300 LearningRate 0.0109 Epoch: 13 Global Step: 223450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:30,439-Speed 9390.71 samples/sec Loss 4.6089 LearningRate 0.0109 Epoch: 13 Global Step: 223460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:31,522-Speed 9464.88 samples/sec Loss 4.8520 LearningRate 0.0109 Epoch: 13 
Global Step: 223470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:32,655-Speed 9044.33 samples/sec Loss 4.6917 LearningRate 0.0109 Epoch: 13 Global Step: 223480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:33,730-Speed 9535.51 samples/sec Loss 4.6680 LearningRate 0.0109 Epoch: 13 Global Step: 223490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:34,835-Speed 9271.55 samples/sec Loss 4.6985 LearningRate 0.0109 Epoch: 13 Global Step: 223500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:35,928-Speed 9368.56 samples/sec Loss 4.6747 LearningRate 0.0109 Epoch: 13 Global Step: 223510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:37,065-Speed 9012.94 samples/sec Loss 4.7699 LearningRate 0.0109 Epoch: 13 Global Step: 223520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:38,158-Speed 9387.27 samples/sec Loss 4.7062 LearningRate 0.0109 Epoch: 13 Global Step: 223530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:39,287-Speed 9074.20 samples/sec Loss 4.6690 LearningRate 0.0109 Epoch: 13 Global Step: 223540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:40,354-Speed 9605.01 samples/sec Loss 4.6186 LearningRate 0.0109 Epoch: 13 Global Step: 223550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:41,421-Speed 9605.31 samples/sec Loss 4.7026 LearningRate 0.0109 Epoch: 13 Global Step: 223560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:42,574-Speed 8884.79 samples/sec Loss 4.7480 LearningRate 0.0109 Epoch: 13 Global Step: 223570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:43,682-Speed 9248.09 samples/sec Loss 4.6905 LearningRate 0.0109 Epoch: 13 Global Step: 223580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:44,756-Speed 9541.52 samples/sec Loss 4.6053 LearningRate 0.0109 Epoch: 13 Global Step: 223590 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:45,861-Speed 9267.99 samples/sec Loss 4.6256 LearningRate 0.0109 Epoch: 13 Global Step: 223600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:46,973-Speed 9209.69 samples/sec Loss 4.7793 LearningRate 0.0109 Epoch: 13 Global Step: 223610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:48,055-Speed 9470.46 samples/sec Loss 4.6407 LearningRate 0.0109 Epoch: 13 Global Step: 223620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:49,107-Speed 9744.60 samples/sec Loss 4.6965 LearningRate 0.0109 Epoch: 13 Global Step: 223630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:33:50,184-Speed 9510.63 samples/sec Loss 4.6027 LearningRate 0.0109 Epoch: 13 Global Step: 223640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:51,282-Speed 9330.71 samples/sec Loss 4.7430 LearningRate 0.0109 Epoch: 13 Global Step: 223650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:52,368-Speed 9440.00 samples/sec Loss 4.6281 LearningRate 0.0109 Epoch: 13 Global Step: 223660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:53,445-Speed 9508.93 samples/sec Loss 4.6986 LearningRate 0.0109 Epoch: 13 Global Step: 223670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:54,531-Speed 9434.70 samples/sec Loss 4.7618 LearningRate 0.0109 Epoch: 13 Global Step: 223680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:55,597-Speed 9611.77 samples/sec Loss 4.7463 LearningRate 0.0109 Epoch: 13 Global Step: 223690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:56,666-Speed 9584.77 samples/sec Loss 4.6997 LearningRate 0.0109 Epoch: 13 Global Step: 223700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:57,721-Speed 9712.47 samples/sec Loss 4.6826 LearningRate 0.0109 Epoch: 13 Global Step: 223710 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 20:33:58,859-Speed 9003.64 samples/sec Loss 4.8064 LearningRate 0.0109 Epoch: 13 Global Step: 223720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:33:59,983-Speed 9113.86 samples/sec Loss 4.7021 LearningRate 0.0109 Epoch: 13 Global Step: 223730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:01,077-Speed 9372.31 samples/sec Loss 4.7800 LearningRate 0.0109 Epoch: 13 Global Step: 223740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:02,169-Speed 9383.70 samples/sec Loss 4.7830 LearningRate 0.0109 Epoch: 13 Global Step: 223750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:03,301-Speed 9047.63 samples/sec Loss 4.6655 LearningRate 0.0109 Epoch: 13 Global Step: 223760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:05,364-Speed 4965.25 samples/sec Loss 4.7272 LearningRate 0.0109 Epoch: 13 Global Step: 223770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:06,454-Speed 9401.24 samples/sec Loss 4.6710 LearningRate 0.0109 Epoch: 13 Global Step: 223780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:08,394-Speed 5279.61 samples/sec Loss 4.7919 LearningRate 0.0109 Epoch: 13 Global Step: 223790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:09,490-Speed 9355.87 samples/sec Loss 4.7430 LearningRate 0.0109 Epoch: 13 Global Step: 223800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:10,580-Speed 9391.05 samples/sec Loss 4.7353 LearningRate 0.0109 Epoch: 13 Global Step: 223810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:11,656-Speed 9523.89 samples/sec Loss 4.6150 LearningRate 0.0109 Epoch: 13 Global Step: 223820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:12,679-Speed 10020.57 samples/sec Loss 4.7190 LearningRate 0.0109 Epoch: 13 Global Step: 223830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
20:34:13,780-Speed 9303.61 samples/sec Loss 4.6873 LearningRate 0.0109 Epoch: 13 Global Step: 223840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:14,882-Speed 9301.37 samples/sec Loss 4.8202 LearningRate 0.0109 Epoch: 13 Global Step: 223850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:15,968-Speed 9431.21 samples/sec Loss 4.7277 LearningRate 0.0109 Epoch: 13 Global Step: 223860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:17,042-Speed 9544.29 samples/sec Loss 4.6425 LearningRate 0.0108 Epoch: 13 Global Step: 223870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:18,077-Speed 9892.94 samples/sec Loss 4.7858 LearningRate 0.0108 Epoch: 13 Global Step: 223880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:19,165-Speed 9417.31 samples/sec Loss 4.7330 LearningRate 0.0108 Epoch: 13 Global Step: 223890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:20,238-Speed 9554.87 samples/sec Loss 4.7032 LearningRate 0.0108 Epoch: 13 Global Step: 223900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:21,345-Speed 9252.95 samples/sec Loss 4.8242 LearningRate 0.0108 Epoch: 13 Global Step: 223910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:22,442-Speed 9337.45 samples/sec Loss 4.5918 LearningRate 0.0108 Epoch: 13 Global Step: 223920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:23,498-Speed 9705.19 samples/sec Loss 4.7409 LearningRate 0.0108 Epoch: 13 Global Step: 223930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:24,583-Speed 9445.53 samples/sec Loss 4.7551 LearningRate 0.0108 Epoch: 13 Global Step: 223940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:34:25,657-Speed 9536.52 samples/sec Loss 4.7576 LearningRate 0.0108 Epoch: 13 Global Step: 223950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:26,790-Speed 9042.29 
samples/sec Loss 4.6702 LearningRate 0.0108 Epoch: 13 Global Step: 223960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:27,882-Speed 9387.10 samples/sec Loss 4.7463 LearningRate 0.0108 Epoch: 13 Global Step: 223970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:29,008-Speed 9093.49 samples/sec Loss 4.6359 LearningRate 0.0108 Epoch: 13 Global Step: 223980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:30,108-Speed 9314.28 samples/sec Loss 4.6279 LearningRate 0.0108 Epoch: 13 Global Step: 223990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:31,197-Speed 9409.21 samples/sec Loss 4.7833 LearningRate 0.0108 Epoch: 13 Global Step: 224000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:34:53,347-[lfw][224000]XNorm: 7.955453 Training: 2022-04-11 20:34:53,348-[lfw][224000]Accuracy-Flip: 0.99733+-0.00249 Training: 2022-04-11 20:34:53,348-[lfw][224000]Accuracy-Highest: 0.99733 Training: 2022-04-11 20:35:18,935-[cfp_fp][224000]XNorm: 6.844160 Training: 2022-04-11 20:35:18,935-[cfp_fp][224000]Accuracy-Flip: 0.96686+-0.00857 Training: 2022-04-11 20:35:18,936-[cfp_fp][224000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:35:40,962-[agedb_30][224000]XNorm: 7.721459 Training: 2022-04-11 20:35:40,963-[agedb_30][224000]Accuracy-Flip: 0.97033+-0.00865 Training: 2022-04-11 20:35:40,963-[agedb_30][224000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:35:42,073-Speed 144.48 samples/sec Loss 4.7132 LearningRate 0.0108 Epoch: 13 Global Step: 224010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:43,164-Speed 9389.31 samples/sec Loss 4.7341 LearningRate 0.0108 Epoch: 13 Global Step: 224020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:44,261-Speed 9341.29 samples/sec Loss 4.7417 LearningRate 0.0108 Epoch: 13 Global Step: 224030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:45,328-Speed 9598.76 samples/sec Loss 4.7974 
LearningRate 0.0108 Epoch: 13 Global Step: 224040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:46,459-Speed 9061.55 samples/sec Loss 4.6594 LearningRate 0.0108 Epoch: 13 Global Step: 224050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:35:47,614-Speed 8871.16 samples/sec Loss 4.7517 LearningRate 0.0108 Epoch: 13 Global Step: 224060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:35:48,713-Speed 9326.01 samples/sec Loss 4.7299 LearningRate 0.0108 Epoch: 13 Global Step: 224070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:35:49,813-Speed 9310.17 samples/sec Loss 4.6914 LearningRate 0.0108 Epoch: 13 Global Step: 224080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:35:50,878-Speed 9622.13 samples/sec Loss 4.7777 LearningRate 0.0108 Epoch: 13 Global Step: 224090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:35:51,980-Speed 9295.50 samples/sec Loss 4.7948 LearningRate 0.0108 Epoch: 13 Global Step: 224100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:35:53,063-Speed 9468.01 samples/sec Loss 4.7934 LearningRate 0.0108 Epoch: 13 Global Step: 224110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:35:54,179-Speed 9172.95 samples/sec Loss 4.7639 LearningRate 0.0108 Epoch: 13 Global Step: 224120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:55,260-Speed 9484.98 samples/sec Loss 4.6493 LearningRate 0.0108 Epoch: 13 Global Step: 224130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:56,347-Speed 9422.96 samples/sec Loss 4.7634 LearningRate 0.0108 Epoch: 13 Global Step: 224140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:57,467-Speed 9146.13 samples/sec Loss 4.7315 LearningRate 0.0108 Epoch: 13 Global Step: 224150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:58,584-Speed 9175.59 samples/sec Loss 4.8527 LearningRate 0.0108 Epoch: 13 Global 
Step: 224160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:35:59,665-Speed 9475.46 samples/sec Loss 4.6618 LearningRate 0.0108 Epoch: 13 Global Step: 224170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:00,782-Speed 9175.12 samples/sec Loss 4.7576 LearningRate 0.0108 Epoch: 13 Global Step: 224180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:01,844-Speed 9646.15 samples/sec Loss 4.6563 LearningRate 0.0108 Epoch: 13 Global Step: 224190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:03,002-Speed 8846.83 samples/sec Loss 4.8032 LearningRate 0.0108 Epoch: 13 Global Step: 224200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:04,080-Speed 9506.65 samples/sec Loss 4.7026 LearningRate 0.0108 Epoch: 13 Global Step: 224210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:05,173-Speed 9379.18 samples/sec Loss 4.7995 LearningRate 0.0108 Epoch: 13 Global Step: 224220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:06,217-Speed 9814.32 samples/sec Loss 4.6783 LearningRate 0.0108 Epoch: 13 Global Step: 224230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:07,258-Speed 9837.78 samples/sec Loss 4.7422 LearningRate 0.0108 Epoch: 13 Global Step: 224240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:08,409-Speed 8897.94 samples/sec Loss 4.6973 LearningRate 0.0108 Epoch: 13 Global Step: 224250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:09,468-Speed 9681.22 samples/sec Loss 4.6745 LearningRate 0.0108 Epoch: 13 Global Step: 224260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:10,525-Speed 9688.80 samples/sec Loss 4.7056 LearningRate 0.0108 Epoch: 13 Global Step: 224270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:11,605-Speed 9490.25 samples/sec Loss 4.6690 LearningRate 0.0108 Epoch: 13 Global Step: 224280 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 20:36:12,710-Speed 9267.52 samples/sec Loss 4.7528 LearningRate 0.0108 Epoch: 13 Global Step: 224290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:13,803-Speed 9378.16 samples/sec Loss 4.7437 LearningRate 0.0108 Epoch: 13 Global Step: 224300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:14,904-Speed 9300.26 samples/sec Loss 4.6528 LearningRate 0.0108 Epoch: 13 Global Step: 224310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:15,971-Speed 9606.57 samples/sec Loss 4.7053 LearningRate 0.0108 Epoch: 13 Global Step: 224320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:36:17,086-Speed 9195.07 samples/sec Loss 4.7587 LearningRate 0.0108 Epoch: 13 Global Step: 224330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:18,155-Speed 9581.53 samples/sec Loss 4.7514 LearningRate 0.0108 Epoch: 13 Global Step: 224340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:19,253-Speed 9334.65 samples/sec Loss 4.7371 LearningRate 0.0108 Epoch: 13 Global Step: 224350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:20,336-Speed 9463.60 samples/sec Loss 4.7043 LearningRate 0.0108 Epoch: 13 Global Step: 224360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:21,438-Speed 9290.71 samples/sec Loss 4.6837 LearningRate 0.0107 Epoch: 13 Global Step: 224370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:22,588-Speed 8914.36 samples/sec Loss 4.7172 LearningRate 0.0107 Epoch: 13 Global Step: 224380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:23,664-Speed 9521.34 samples/sec Loss 4.7694 LearningRate 0.0107 Epoch: 13 Global Step: 224390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:24,724-Speed 9664.59 samples/sec Loss 4.7066 LearningRate 0.0107 Epoch: 13 Global Step: 224400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 20:36:25,819-Speed 9359.89 samples/sec Loss 4.7327 LearningRate 0.0107 Epoch: 13 Global Step: 224410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:26,876-Speed 9690.77 samples/sec Loss 4.6845 LearningRate 0.0107 Epoch: 13 Global Step: 224420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:27,988-Speed 9214.85 samples/sec Loss 4.8004 LearningRate 0.0107 Epoch: 13 Global Step: 224430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:29,080-Speed 9383.90 samples/sec Loss 4.6722 LearningRate 0.0107 Epoch: 13 Global Step: 224440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:30,168-Speed 9411.08 samples/sec Loss 4.7121 LearningRate 0.0107 Epoch: 13 Global Step: 224450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:31,227-Speed 9674.82 samples/sec Loss 4.6721 LearningRate 0.0107 Epoch: 13 Global Step: 224460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:32,270-Speed 9831.63 samples/sec Loss 4.7743 LearningRate 0.0107 Epoch: 13 Global Step: 224470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:33,364-Speed 9364.11 samples/sec Loss 4.6949 LearningRate 0.0107 Epoch: 13 Global Step: 224480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:34,475-Speed 9221.77 samples/sec Loss 4.6099 LearningRate 0.0107 Epoch: 13 Global Step: 224490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:35,566-Speed 9398.21 samples/sec Loss 4.6869 LearningRate 0.0107 Epoch: 13 Global Step: 224500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:36,656-Speed 9398.07 samples/sec Loss 4.7588 LearningRate 0.0107 Epoch: 13 Global Step: 224510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:37,788-Speed 9049.97 samples/sec Loss 4.8121 LearningRate 0.0107 Epoch: 13 Global Step: 224520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:38,873-Speed 9440.60 
samples/sec Loss 4.7175 LearningRate 0.0107 Epoch: 13 Global Step: 224530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:41,987-Speed 3289.32 samples/sec Loss 4.7960 LearningRate 0.0107 Epoch: 13 Global Step: 224540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:43,044-Speed 9689.22 samples/sec Loss 4.7098 LearningRate 0.0107 Epoch: 13 Global Step: 224550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:44,192-Speed 8926.96 samples/sec Loss 4.6733 LearningRate 0.0107 Epoch: 13 Global Step: 224560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:46,100-Speed 5370.15 samples/sec Loss 4.6955 LearningRate 0.0107 Epoch: 13 Global Step: 224570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:48,143-Speed 5014.52 samples/sec Loss 4.6196 LearningRate 0.0107 Epoch: 13 Global Step: 224580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:49,192-Speed 9762.35 samples/sec Loss 4.7581 LearningRate 0.0107 Epoch: 13 Global Step: 224590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:51,222-Speed 5046.30 samples/sec Loss 4.6276 LearningRate 0.0107 Epoch: 13 Global Step: 224600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:52,354-Speed 9057.57 samples/sec Loss 4.7170 LearningRate 0.0107 Epoch: 13 Global Step: 224610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:36:53,414-Speed 9667.51 samples/sec Loss 4.6817 LearningRate 0.0107 Epoch: 13 Global Step: 224620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:54,473-Speed 9674.40 samples/sec Loss 4.6961 LearningRate 0.0107 Epoch: 13 Global Step: 224630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:55,592-Speed 9157.55 samples/sec Loss 4.8314 LearningRate 0.0107 Epoch: 13 Global Step: 224640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:56,715-Speed 9121.21 samples/sec Loss 4.7274 LearningRate 
0.0107 Epoch: 13 Global Step: 224650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:57,802-Speed 9423.89 samples/sec Loss 4.7093 LearningRate 0.0107 Epoch: 13 Global Step: 224660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:36:58,905-Speed 9295.55 samples/sec Loss 4.7725 LearningRate 0.0107 Epoch: 13 Global Step: 224670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:00,017-Speed 9213.46 samples/sec Loss 4.6529 LearningRate 0.0107 Epoch: 13 Global Step: 224680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:01,117-Speed 9315.35 samples/sec Loss 4.7753 LearningRate 0.0107 Epoch: 13 Global Step: 224690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:02,177-Speed 9659.97 samples/sec Loss 4.8247 LearningRate 0.0107 Epoch: 13 Global Step: 224700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:03,270-Speed 9374.19 samples/sec Loss 4.6418 LearningRate 0.0107 Epoch: 13 Global Step: 224710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:04,408-Speed 9008.86 samples/sec Loss 4.7145 LearningRate 0.0107 Epoch: 13 Global Step: 224720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:05,527-Speed 9154.90 samples/sec Loss 4.6090 LearningRate 0.0107 Epoch: 13 Global Step: 224730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:06,627-Speed 9318.75 samples/sec Loss 4.7766 LearningRate 0.0107 Epoch: 13 Global Step: 224740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:07,725-Speed 9326.60 samples/sec Loss 4.7562 LearningRate 0.0107 Epoch: 13 Global Step: 224750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:08,822-Speed 9338.46 samples/sec Loss 4.6318 LearningRate 0.0107 Epoch: 13 Global Step: 224760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:09,917-Speed 9360.72 samples/sec Loss 4.7987 LearningRate 0.0107 Epoch: 13 Global Step: 224770 Fp16 
Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:11,042-Speed 9105.92 samples/sec Loss 4.7119 LearningRate 0.0107 Epoch: 13 Global Step: 224780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:12,155-Speed 9206.31 samples/sec Loss 4.7031 LearningRate 0.0107 Epoch: 13 Global Step: 224790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:13,249-Speed 9368.21 samples/sec Loss 4.7164 LearningRate 0.0107 Epoch: 13 Global Step: 224800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:37:14,318-Speed 9588.28 samples/sec Loss 4.7739 LearningRate 0.0107 Epoch: 13 Global Step: 224810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:15,401-Speed 9461.89 samples/sec Loss 4.7302 LearningRate 0.0107 Epoch: 13 Global Step: 224820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:16,524-Speed 9124.91 samples/sec Loss 4.6295 LearningRate 0.0107 Epoch: 13 Global Step: 224830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:17,601-Speed 9509.61 samples/sec Loss 4.6673 LearningRate 0.0107 Epoch: 13 Global Step: 224840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:18,678-Speed 9516.27 samples/sec Loss 4.8240 LearningRate 0.0107 Epoch: 13 Global Step: 224850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:19,742-Speed 9627.25 samples/sec Loss 4.7975 LearningRate 0.0107 Epoch: 13 Global Step: 224860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:20,841-Speed 9324.73 samples/sec Loss 4.6581 LearningRate 0.0107 Epoch: 13 Global Step: 224870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:21,902-Speed 9661.23 samples/sec Loss 4.7649 LearningRate 0.0107 Epoch: 13 Global Step: 224880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:23,008-Speed 9265.13 samples/sec Loss 4.6961 LearningRate 0.0106 Epoch: 13 Global Step: 224890 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 20:37:24,108-Speed 9312.41 samples/sec Loss 4.7441 LearningRate 0.0106 Epoch: 13 Global Step: 224900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:37:25,195-Speed 9430.57 samples/sec Loss 4.7571 LearningRate 0.0106 Epoch: 13 Global Step: 224910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:26,301-Speed 9259.50 samples/sec Loss 4.6628 LearningRate 0.0106 Epoch: 13 Global Step: 224920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:27,374-Speed 9550.73 samples/sec Loss 4.7658 LearningRate 0.0106 Epoch: 13 Global Step: 224930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:28,427-Speed 9729.46 samples/sec Loss 4.7176 LearningRate 0.0106 Epoch: 13 Global Step: 224940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:29,601-Speed 8727.32 samples/sec Loss 4.7339 LearningRate 0.0106 Epoch: 13 Global Step: 224950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:30,693-Speed 9384.13 samples/sec Loss 4.7200 LearningRate 0.0106 Epoch: 13 Global Step: 224960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:31,770-Speed 9517.37 samples/sec Loss 4.6078 LearningRate 0.0106 Epoch: 13 Global Step: 224970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:32,887-Speed 9172.86 samples/sec Loss 4.7895 LearningRate 0.0106 Epoch: 13 Global Step: 224980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:34,045-Speed 8854.70 samples/sec Loss 4.7447 LearningRate 0.0106 Epoch: 13 Global Step: 224990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:35,132-Speed 9424.45 samples/sec Loss 4.7559 LearningRate 0.0106 Epoch: 13 Global Step: 225000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:36,220-Speed 9417.52 samples/sec Loss 4.6526 LearningRate 0.0106 Epoch: 13 Global Step: 225010 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 
20:37:37,285-Speed 9623.78 samples/sec Loss 4.6809 LearningRate 0.0106 Epoch: 13 Global Step: 225020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:38,386-Speed 9300.43 samples/sec Loss 4.7749 LearningRate 0.0106 Epoch: 13 Global Step: 225030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:39,484-Speed 9335.99 samples/sec Loss 4.6515 LearningRate 0.0106 Epoch: 13 Global Step: 225040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:40,591-Speed 9255.27 samples/sec Loss 4.7458 LearningRate 0.0106 Epoch: 13 Global Step: 225050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:41,719-Speed 9077.22 samples/sec Loss 4.8076 LearningRate 0.0106 Epoch: 13 Global Step: 225060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:42,823-Speed 9280.82 samples/sec Loss 4.7698 LearningRate 0.0106 Epoch: 13 Global Step: 225070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:43,912-Speed 9407.32 samples/sec Loss 4.7069 LearningRate 0.0106 Epoch: 13 Global Step: 225080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:45,013-Speed 9308.37 samples/sec Loss 4.8078 LearningRate 0.0106 Epoch: 13 Global Step: 225090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:46,090-Speed 9515.27 samples/sec Loss 4.7501 LearningRate 0.0106 Epoch: 13 Global Step: 225100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:47,176-Speed 9435.78 samples/sec Loss 4.7745 LearningRate 0.0106 Epoch: 13 Global Step: 225110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:48,289-Speed 9204.95 samples/sec Loss 4.7592 LearningRate 0.0106 Epoch: 13 Global Step: 225120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:49,393-Speed 9281.92 samples/sec Loss 4.7458 LearningRate 0.0106 Epoch: 13 Global Step: 225130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:50,479-Speed 9437.33 
samples/sec Loss 4.7825 LearningRate 0.0106 Epoch: 13 Global Step: 225140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:51,606-Speed 9092.68 samples/sec Loss 4.7785 LearningRate 0.0106 Epoch: 13 Global Step: 225150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:52,706-Speed 9316.85 samples/sec Loss 4.7815 LearningRate 0.0106 Epoch: 13 Global Step: 225160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:53,790-Speed 9452.67 samples/sec Loss 4.7009 LearningRate 0.0106 Epoch: 13 Global Step: 225170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:54,904-Speed 9197.68 samples/sec Loss 4.7423 LearningRate 0.0106 Epoch: 13 Global Step: 225180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:55,994-Speed 9394.37 samples/sec Loss 4.7410 LearningRate 0.0106 Epoch: 13 Global Step: 225190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:57,098-Speed 9286.69 samples/sec Loss 4.7045 LearningRate 0.0106 Epoch: 13 Global Step: 225200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:58,239-Speed 8980.45 samples/sec Loss 4.7642 LearningRate 0.0106 Epoch: 13 Global Step: 225210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:37:59,355-Speed 9172.77 samples/sec Loss 4.6280 LearningRate 0.0106 Epoch: 13 Global Step: 225220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:00,469-Speed 9199.15 samples/sec Loss 4.6941 LearningRate 0.0106 Epoch: 13 Global Step: 225230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:01,556-Speed 9424.05 samples/sec Loss 4.6824 LearningRate 0.0106 Epoch: 13 Global Step: 225240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:02,665-Speed 9244.42 samples/sec Loss 4.6623 LearningRate 0.0106 Epoch: 13 Global Step: 225250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:03,771-Speed 9256.17 samples/sec Loss 4.6652 
LearningRate 0.0106 Epoch: 13 Global Step: 225260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:04,863-Speed 9391.17 samples/sec Loss 4.6937 LearningRate 0.0106 Epoch: 13 Global Step: 225270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:05,937-Speed 9539.76 samples/sec Loss 4.7307 LearningRate 0.0106 Epoch: 13 Global Step: 225280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:07,014-Speed 9512.96 samples/sec Loss 4.7278 LearningRate 0.0106 Epoch: 13 Global Step: 225290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:08,125-Speed 9227.14 samples/sec Loss 4.7917 LearningRate 0.0106 Epoch: 13 Global Step: 225300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:09,184-Speed 9672.53 samples/sec Loss 4.7949 LearningRate 0.0106 Epoch: 13 Global Step: 225310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:10,255-Speed 9566.52 samples/sec Loss 4.7010 LearningRate 0.0106 Epoch: 13 Global Step: 225320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:11,372-Speed 9173.20 samples/sec Loss 4.7641 LearningRate 0.0106 Epoch: 13 Global Step: 225330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:12,484-Speed 9208.08 samples/sec Loss 4.7560 LearningRate 0.0106 Epoch: 13 Global Step: 225340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:13,588-Speed 9279.51 samples/sec Loss 4.6839 LearningRate 0.0106 Epoch: 13 Global Step: 225350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:14,668-Speed 9490.05 samples/sec Loss 4.7327 LearningRate 0.0106 Epoch: 13 Global Step: 225360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:15,733-Speed 9626.64 samples/sec Loss 4.6867 LearningRate 0.0106 Epoch: 13 Global Step: 225370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:16,850-Speed 9166.73 samples/sec Loss 4.6935 LearningRate 0.0106 Epoch: 13 Global 
Step: 225380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:17,942-Speed 9390.33 samples/sec Loss 4.7344 LearningRate 0.0106 Epoch: 13 Global Step: 225390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:19,076-Speed 9031.36 samples/sec Loss 4.6977 LearningRate 0.0105 Epoch: 13 Global Step: 225400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:20,187-Speed 9217.47 samples/sec Loss 4.8016 LearningRate 0.0105 Epoch: 13 Global Step: 225410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:21,253-Speed 9613.60 samples/sec Loss 4.7524 LearningRate 0.0105 Epoch: 13 Global Step: 225420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:22,341-Speed 9421.36 samples/sec Loss 4.7198 LearningRate 0.0105 Epoch: 13 Global Step: 225430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:23,420-Speed 9500.02 samples/sec Loss 4.7826 LearningRate 0.0105 Epoch: 13 Global Step: 225440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:24,479-Speed 9670.39 samples/sec Loss 4.7322 LearningRate 0.0105 Epoch: 13 Global Step: 225450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:25,527-Speed 9774.78 samples/sec Loss 4.6361 LearningRate 0.0105 Epoch: 13 Global Step: 225460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:26,592-Speed 9629.97 samples/sec Loss 4.7342 LearningRate 0.0105 Epoch: 13 Global Step: 225470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:27,689-Speed 9335.88 samples/sec Loss 4.7882 LearningRate 0.0105 Epoch: 13 Global Step: 225480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:28,766-Speed 9511.79 samples/sec Loss 4.7635 LearningRate 0.0105 Epoch: 13 Global Step: 225490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:29,829-Speed 9638.27 samples/sec Loss 4.8191 LearningRate 0.0105 Epoch: 13 Global Step: 225500 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 20:38:30,877-Speed 9777.62 samples/sec Loss 4.7871 LearningRate 0.0105 Epoch: 13 Global Step: 225510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:31,956-Speed 9497.71 samples/sec Loss 4.7168 LearningRate 0.0105 Epoch: 13 Global Step: 225520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:33,046-Speed 9393.50 samples/sec Loss 4.7219 LearningRate 0.0105 Epoch: 13 Global Step: 225530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:34,133-Speed 9430.02 samples/sec Loss 4.7171 LearningRate 0.0105 Epoch: 13 Global Step: 225540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:35,236-Speed 9292.86 samples/sec Loss 4.7107 LearningRate 0.0105 Epoch: 13 Global Step: 225550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:36,268-Speed 9921.42 samples/sec Loss 4.7411 LearningRate 0.0105 Epoch: 13 Global Step: 225560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:37,362-Speed 9372.21 samples/sec Loss 4.7027 LearningRate 0.0105 Epoch: 13 Global Step: 225570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:38,441-Speed 9489.00 samples/sec Loss 4.8281 LearningRate 0.0105 Epoch: 13 Global Step: 225580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:39,583-Speed 8972.02 samples/sec Loss 4.7316 LearningRate 0.0105 Epoch: 13 Global Step: 225590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:40,661-Speed 9506.44 samples/sec Loss 4.5895 LearningRate 0.0105 Epoch: 13 Global Step: 225600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:41,742-Speed 9481.61 samples/sec Loss 4.7584 LearningRate 0.0105 Epoch: 13 Global Step: 225610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:42,862-Speed 9148.65 samples/sec Loss 4.7554 LearningRate 0.0105 Epoch: 13 Global Step: 225620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 20:38:43,981-Speed 9158.20 samples/sec Loss 4.7813 LearningRate 0.0105 Epoch: 13 Global Step: 225630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:45,058-Speed 9515.04 samples/sec Loss 4.6917 LearningRate 0.0105 Epoch: 13 Global Step: 225640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:46,109-Speed 9750.09 samples/sec Loss 4.6993 LearningRate 0.0105 Epoch: 13 Global Step: 225650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:47,192-Speed 9458.08 samples/sec Loss 4.6170 LearningRate 0.0105 Epoch: 13 Global Step: 225660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:48,268-Speed 9527.03 samples/sec Loss 4.7835 LearningRate 0.0105 Epoch: 13 Global Step: 225670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:49,386-Speed 9158.31 samples/sec Loss 4.6361 LearningRate 0.0105 Epoch: 13 Global Step: 225680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:50,469-Speed 9467.19 samples/sec Loss 4.8166 LearningRate 0.0105 Epoch: 13 Global Step: 225690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:51,551-Speed 9465.56 samples/sec Loss 4.6625 LearningRate 0.0105 Epoch: 13 Global Step: 225700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:38:52,666-Speed 9199.09 samples/sec Loss 4.7085 LearningRate 0.0105 Epoch: 13 Global Step: 225710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:53,781-Speed 9183.60 samples/sec Loss 4.7685 LearningRate 0.0105 Epoch: 13 Global Step: 225720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:54,880-Speed 9324.92 samples/sec Loss 4.7703 LearningRate 0.0105 Epoch: 13 Global Step: 225730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:55,946-Speed 9605.73 samples/sec Loss 4.6870 LearningRate 0.0105 Epoch: 13 Global Step: 225740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:57,023-Speed 9518.15 
samples/sec Loss 4.7383 LearningRate 0.0105 Epoch: 13 Global Step: 225750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:58,125-Speed 9294.75 samples/sec Loss 4.7506 LearningRate 0.0105 Epoch: 13 Global Step: 225760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:38:59,188-Speed 9642.89 samples/sec Loss 4.7822 LearningRate 0.0105 Epoch: 13 Global Step: 225770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:00,309-Speed 9135.29 samples/sec Loss 4.7085 LearningRate 0.0105 Epoch: 13 Global Step: 225780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:01,395-Speed 9438.33 samples/sec Loss 4.7795 LearningRate 0.0105 Epoch: 13 Global Step: 225790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:02,464-Speed 9578.95 samples/sec Loss 4.6735 LearningRate 0.0105 Epoch: 13 Global Step: 225800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:03,562-Speed 9335.28 samples/sec Loss 4.8094 LearningRate 0.0105 Epoch: 13 Global Step: 225810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:04,671-Speed 9242.99 samples/sec Loss 4.6964 LearningRate 0.0105 Epoch: 13 Global Step: 225820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:05,764-Speed 9369.33 samples/sec Loss 4.7987 LearningRate 0.0105 Epoch: 13 Global Step: 225830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:06,865-Speed 9306.46 samples/sec Loss 4.7637 LearningRate 0.0105 Epoch: 13 Global Step: 225840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:07,916-Speed 9754.67 samples/sec Loss 4.8009 LearningRate 0.0105 Epoch: 13 Global Step: 225850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:08,999-Speed 9455.03 samples/sec Loss 4.7401 LearningRate 0.0105 Epoch: 13 Global Step: 225860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:10,119-Speed 9149.81 samples/sec Loss 4.7595 
LearningRate 0.0105 Epoch: 13 Global Step: 225870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:11,207-Speed 9420.95 samples/sec Loss 4.6928 LearningRate 0.0105 Epoch: 13 Global Step: 225880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:12,238-Speed 9930.68 samples/sec Loss 4.7238 LearningRate 0.0105 Epoch: 13 Global Step: 225890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:13,308-Speed 9577.82 samples/sec Loss 4.7953 LearningRate 0.0105 Epoch: 13 Global Step: 225900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:14,404-Speed 9345.24 samples/sec Loss 4.8092 LearningRate 0.0104 Epoch: 13 Global Step: 225910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:15,464-Speed 9675.01 samples/sec Loss 4.6517 LearningRate 0.0104 Epoch: 13 Global Step: 225920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:16,539-Speed 9527.55 samples/sec Loss 4.7908 LearningRate 0.0104 Epoch: 13 Global Step: 225930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:17,616-Speed 9511.52 samples/sec Loss 4.6858 LearningRate 0.0104 Epoch: 13 Global Step: 225940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:39:18,706-Speed 9403.71 samples/sec Loss 4.6808 LearningRate 0.0104 Epoch: 13 Global Step: 225950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:19,789-Speed 9461.08 samples/sec Loss 4.7817 LearningRate 0.0104 Epoch: 13 Global Step: 225960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:20,862-Speed 9542.44 samples/sec Loss 4.7600 LearningRate 0.0104 Epoch: 13 Global Step: 225970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:21,947-Speed 9449.20 samples/sec Loss 4.6986 LearningRate 0.0104 Epoch: 13 Global Step: 225980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:23,040-Speed 9382.57 samples/sec Loss 4.7630 LearningRate 0.0104 Epoch: 13 Global 
Step: 225990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:24,108-Speed 9589.88 samples/sec Loss 4.7866 LearningRate 0.0104 Epoch: 13 Global Step: 226000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:39:46,181-[lfw][226000]XNorm: 7.981218 Training: 2022-04-11 20:39:46,181-[lfw][226000]Accuracy-Flip: 0.99700+-0.00287 Training: 2022-04-11 20:39:46,182-[lfw][226000]Accuracy-Highest: 0.99733 Training: 2022-04-11 20:40:11,357-[cfp_fp][226000]XNorm: 6.844564 Training: 2022-04-11 20:40:11,357-[cfp_fp][226000]Accuracy-Flip: 0.96771+-0.01066 Training: 2022-04-11 20:40:11,357-[cfp_fp][226000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:40:33,057-[agedb_30][226000]XNorm: 7.700978 Training: 2022-04-11 20:40:33,058-[agedb_30][226000]Accuracy-Flip: 0.96900+-0.00920 Training: 2022-04-11 20:40:33,058-[agedb_30][226000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:40:34,139-Speed 146.22 samples/sec Loss 4.7565 LearningRate 0.0104 Epoch: 13 Global Step: 226010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:35,208-Speed 9587.76 samples/sec Loss 4.7886 LearningRate 0.0104 Epoch: 13 Global Step: 226020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:36,270-Speed 9641.48 samples/sec Loss 4.7578 LearningRate 0.0104 Epoch: 13 Global Step: 226030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:37,339-Speed 9584.03 samples/sec Loss 4.6880 LearningRate 0.0104 Epoch: 13 Global Step: 226040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:38,453-Speed 9203.50 samples/sec Loss 4.7353 LearningRate 0.0104 Epoch: 13 Global Step: 226050 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:40:39,534-Speed 9471.42 samples/sec Loss 4.7351 LearningRate 0.0104 Epoch: 13 Global Step: 226060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:40,567-Speed 9927.45 samples/sec Loss 4.8044 LearningRate 0.0104 Epoch: 13 Global Step: 226070 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:41,652-Speed 9439.60 samples/sec Loss 4.6888 LearningRate 0.0104 Epoch: 13 Global Step: 226080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:42,719-Speed 9600.05 samples/sec Loss 4.7141 LearningRate 0.0104 Epoch: 13 Global Step: 226090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:43,810-Speed 9393.65 samples/sec Loss 4.7592 LearningRate 0.0104 Epoch: 13 Global Step: 226100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:44,891-Speed 9478.87 samples/sec Loss 4.7592 LearningRate 0.0104 Epoch: 13 Global Step: 226110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:45,997-Speed 9265.50 samples/sec Loss 4.6525 LearningRate 0.0104 Epoch: 13 Global Step: 226120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:47,082-Speed 9445.92 samples/sec Loss 4.7834 LearningRate 0.0104 Epoch: 13 Global Step: 226130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:48,218-Speed 9017.18 samples/sec Loss 4.7439 LearningRate 0.0104 Epoch: 13 Global Step: 226140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:49,293-Speed 9534.86 samples/sec Loss 4.7345 LearningRate 0.0104 Epoch: 13 Global Step: 226150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:50,372-Speed 9492.99 samples/sec Loss 4.7625 LearningRate 0.0104 Epoch: 13 Global Step: 226160 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:40:51,463-Speed 9392.16 samples/sec Loss 4.7819 LearningRate 0.0104 Epoch: 13 Global Step: 226170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:52,565-Speed 9305.18 samples/sec Loss 4.7532 LearningRate 0.0104 Epoch: 13 Global Step: 226180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:53,657-Speed 9377.21 samples/sec Loss 4.7304 LearningRate 0.0104 Epoch: 13 Global Step: 226190 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-04-11 20:40:54,761-Speed 9283.02 samples/sec Loss 4.6147 LearningRate 0.0104 Epoch: 13 Global Step: 226200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:55,823-Speed 9649.62 samples/sec Loss 4.6309 LearningRate 0.0104 Epoch: 13 Global Step: 226210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:56,886-Speed 9639.45 samples/sec Loss 4.7350 LearningRate 0.0104 Epoch: 13 Global Step: 226220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:57,977-Speed 9387.50 samples/sec Loss 4.7426 LearningRate 0.0104 Epoch: 13 Global Step: 226230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:40:59,039-Speed 9647.46 samples/sec Loss 4.8447 LearningRate 0.0104 Epoch: 13 Global Step: 226240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:00,088-Speed 9764.33 samples/sec Loss 4.7219 LearningRate 0.0104 Epoch: 13 Global Step: 226250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:01,181-Speed 9380.30 samples/sec Loss 4.8541 LearningRate 0.0104 Epoch: 13 Global Step: 226260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:02,297-Speed 9174.08 samples/sec Loss 4.8452 LearningRate 0.0104 Epoch: 13 Global Step: 226270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:03,360-Speed 9643.53 samples/sec Loss 4.7381 LearningRate 0.0104 Epoch: 13 Global Step: 226280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:04,406-Speed 9790.61 samples/sec Loss 4.7358 LearningRate 0.0104 Epoch: 13 Global Step: 226290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:05,508-Speed 9303.92 samples/sec Loss 4.9293 LearningRate 0.0104 Epoch: 13 Global Step: 226300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:06,614-Speed 9261.58 samples/sec Loss 4.7185 LearningRate 0.0104 Epoch: 13 Global Step: 226310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
20:41:07,698-Speed 9449.98 samples/sec Loss 4.7272 LearningRate 0.0104 Epoch: 13 Global Step: 226320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:08,760-Speed 9650.82 samples/sec Loss 4.7353 LearningRate 0.0104 Epoch: 13 Global Step: 226330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:09,838-Speed 9501.63 samples/sec Loss 4.7272 LearningRate 0.0104 Epoch: 13 Global Step: 226340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:10,892-Speed 9719.70 samples/sec Loss 4.6622 LearningRate 0.0104 Epoch: 13 Global Step: 226350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:11,934-Speed 9838.45 samples/sec Loss 4.7780 LearningRate 0.0104 Epoch: 13 Global Step: 226360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:13,016-Speed 9467.16 samples/sec Loss 4.6873 LearningRate 0.0104 Epoch: 13 Global Step: 226370 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:41:14,108-Speed 9390.83 samples/sec Loss 4.7271 LearningRate 0.0104 Epoch: 13 Global Step: 226380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:15,206-Speed 9329.79 samples/sec Loss 4.7475 LearningRate 0.0104 Epoch: 13 Global Step: 226390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:16,311-Speed 9274.92 samples/sec Loss 4.7314 LearningRate 0.0104 Epoch: 13 Global Step: 226400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:17,398-Speed 9422.53 samples/sec Loss 4.6607 LearningRate 0.0104 Epoch: 13 Global Step: 226410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:18,475-Speed 9520.86 samples/sec Loss 4.7265 LearningRate 0.0104 Epoch: 13 Global Step: 226420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:19,568-Speed 9366.78 samples/sec Loss 4.7815 LearningRate 0.0103 Epoch: 13 Global Step: 226430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:20,635-Speed 9605.15 
samples/sec Loss 4.7567 LearningRate 0.0103 Epoch: 13 Global Step: 226440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:21,752-Speed 9168.59 samples/sec Loss 4.6261 LearningRate 0.0103 Epoch: 13 Global Step: 226450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:22,822-Speed 9581.71 samples/sec Loss 4.7930 LearningRate 0.0103 Epoch: 13 Global Step: 226460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:23,916-Speed 9367.13 samples/sec Loss 4.6912 LearningRate 0.0103 Epoch: 13 Global Step: 226470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:25,042-Speed 9099.44 samples/sec Loss 4.7539 LearningRate 0.0103 Epoch: 13 Global Step: 226480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:41:26,151-Speed 9237.72 samples/sec Loss 4.6509 LearningRate 0.0103 Epoch: 13 Global Step: 226490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:27,265-Speed 9199.75 samples/sec Loss 4.6876 LearningRate 0.0103 Epoch: 13 Global Step: 226500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:28,352-Speed 9427.80 samples/sec Loss 4.6882 LearningRate 0.0103 Epoch: 13 Global Step: 226510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:29,415-Speed 9635.61 samples/sec Loss 4.7724 LearningRate 0.0103 Epoch: 13 Global Step: 226520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:30,518-Speed 9286.86 samples/sec Loss 4.6902 LearningRate 0.0103 Epoch: 13 Global Step: 226530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:31,612-Speed 9368.02 samples/sec Loss 4.7585 LearningRate 0.0103 Epoch: 13 Global Step: 226540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:32,700-Speed 9422.67 samples/sec Loss 4.8012 LearningRate 0.0103 Epoch: 13 Global Step: 226550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:33,755-Speed 9705.99 samples/sec Loss 4.6949 LearningRate 
0.0103 Epoch: 13 Global Step: 226560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:34,818-Speed 9649.05 samples/sec Loss 4.6609 LearningRate 0.0103 Epoch: 13 Global Step: 226570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:35,911-Speed 9370.92 samples/sec Loss 4.7915 LearningRate 0.0103 Epoch: 13 Global Step: 226580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:36,969-Speed 9686.15 samples/sec Loss 4.8066 LearningRate 0.0103 Epoch: 13 Global Step: 226590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:38,032-Speed 9637.85 samples/sec Loss 4.7597 LearningRate 0.0103 Epoch: 13 Global Step: 226600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:39,143-Speed 9217.78 samples/sec Loss 4.7477 LearningRate 0.0103 Epoch: 13 Global Step: 226610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:40,210-Speed 9600.73 samples/sec Loss 4.7049 LearningRate 0.0103 Epoch: 13 Global Step: 226620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:41,306-Speed 9354.31 samples/sec Loss 4.7492 LearningRate 0.0103 Epoch: 13 Global Step: 226630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:42,365-Speed 9666.20 samples/sec Loss 4.8192 LearningRate 0.0103 Epoch: 13 Global Step: 226640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:43,439-Speed 9544.50 samples/sec Loss 4.7424 LearningRate 0.0103 Epoch: 13 Global Step: 226650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:44,557-Speed 9159.58 samples/sec Loss 4.7408 LearningRate 0.0103 Epoch: 13 Global Step: 226660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:45,651-Speed 9372.24 samples/sec Loss 4.8030 LearningRate 0.0103 Epoch: 13 Global Step: 226670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:46,714-Speed 9634.20 samples/sec Loss 4.7090 LearningRate 0.0103 Epoch: 13 Global Step: 
226680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:47,829-Speed 9190.46 samples/sec Loss 4.6949 LearningRate 0.0103 Epoch: 13 Global Step: 226690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:41:48,891-Speed 9650.90 samples/sec Loss 4.8679 LearningRate 0.0103 Epoch: 13 Global Step: 226700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:49,972-Speed 9475.87 samples/sec Loss 4.7999 LearningRate 0.0103 Epoch: 13 Global Step: 226710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:51,085-Speed 9211.67 samples/sec Loss 4.7499 LearningRate 0.0103 Epoch: 13 Global Step: 226720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:52,147-Speed 9646.60 samples/sec Loss 4.6523 LearningRate 0.0103 Epoch: 13 Global Step: 226730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:53,181-Speed 9910.44 samples/sec Loss 4.7159 LearningRate 0.0103 Epoch: 13 Global Step: 226740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:54,260-Speed 9499.33 samples/sec Loss 4.6995 LearningRate 0.0103 Epoch: 13 Global Step: 226750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:55,328-Speed 9589.22 samples/sec Loss 4.7139 LearningRate 0.0103 Epoch: 13 Global Step: 226760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:56,381-Speed 9738.38 samples/sec Loss 4.8058 LearningRate 0.0103 Epoch: 13 Global Step: 226770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:57,468-Speed 9423.24 samples/sec Loss 4.7231 LearningRate 0.0103 Epoch: 13 Global Step: 226780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:58,579-Speed 9223.17 samples/sec Loss 4.7769 LearningRate 0.0103 Epoch: 13 Global Step: 226790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:41:59,650-Speed 9565.48 samples/sec Loss 4.7578 LearningRate 0.0103 Epoch: 13 Global Step: 226800 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 20:42:00,764-Speed 9194.29 samples/sec Loss 4.7622 LearningRate 0.0103 Epoch: 13 Global Step: 226810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:01,797-Speed 9926.17 samples/sec Loss 4.8364 LearningRate 0.0103 Epoch: 13 Global Step: 226820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:02,912-Speed 9186.92 samples/sec Loss 4.7518 LearningRate 0.0103 Epoch: 13 Global Step: 226830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:03,973-Speed 9659.06 samples/sec Loss 4.8018 LearningRate 0.0103 Epoch: 13 Global Step: 226840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:05,021-Speed 9777.07 samples/sec Loss 4.6722 LearningRate 0.0103 Epoch: 13 Global Step: 226850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:06,096-Speed 9524.38 samples/sec Loss 4.7708 LearningRate 0.0103 Epoch: 13 Global Step: 226860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:07,175-Speed 9494.31 samples/sec Loss 4.6602 LearningRate 0.0103 Epoch: 13 Global Step: 226870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:08,265-Speed 9407.20 samples/sec Loss 4.7120 LearningRate 0.0103 Epoch: 13 Global Step: 226880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:09,334-Speed 9580.97 samples/sec Loss 4.7293 LearningRate 0.0103 Epoch: 13 Global Step: 226890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:10,427-Speed 9383.73 samples/sec Loss 4.7149 LearningRate 0.0103 Epoch: 13 Global Step: 226900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:11,547-Speed 9149.40 samples/sec Loss 4.7799 LearningRate 0.0103 Epoch: 13 Global Step: 226910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:12,618-Speed 9564.25 samples/sec Loss 4.7296 LearningRate 0.0103 Epoch: 13 Global Step: 226920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 20:42:13,674-Speed 9704.31 samples/sec Loss 4.7358 LearningRate 0.0103 Epoch: 13 Global Step: 226930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:14,769-Speed 9348.89 samples/sec Loss 4.6641 LearningRate 0.0103 Epoch: 13 Global Step: 226940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:15,852-Speed 9461.62 samples/sec Loss 4.8641 LearningRate 0.0102 Epoch: 13 Global Step: 226950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:16,928-Speed 9524.07 samples/sec Loss 4.7258 LearningRate 0.0102 Epoch: 13 Global Step: 226960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:18,008-Speed 9490.10 samples/sec Loss 4.7794 LearningRate 0.0102 Epoch: 13 Global Step: 226970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:19,088-Speed 9487.89 samples/sec Loss 4.7641 LearningRate 0.0102 Epoch: 13 Global Step: 226980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:20,155-Speed 9599.12 samples/sec Loss 4.8699 LearningRate 0.0102 Epoch: 13 Global Step: 226990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:21,233-Speed 9505.94 samples/sec Loss 4.7165 LearningRate 0.0102 Epoch: 13 Global Step: 227000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:22,285-Speed 9742.77 samples/sec Loss 4.7794 LearningRate 0.0102 Epoch: 13 Global Step: 227010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:23,368-Speed 9460.44 samples/sec Loss 4.6980 LearningRate 0.0102 Epoch: 13 Global Step: 227020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:24,465-Speed 9340.19 samples/sec Loss 4.7780 LearningRate 0.0102 Epoch: 13 Global Step: 227030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:25,544-Speed 9493.60 samples/sec Loss 4.7392 LearningRate 0.0102 Epoch: 13 Global Step: 227040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:26,651-Speed 
9257.80 samples/sec Loss 4.7912 LearningRate 0.0102 Epoch: 13 Global Step: 227050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:27,734-Speed 9470.69 samples/sec Loss 4.7383 LearningRate 0.0102 Epoch: 13 Global Step: 227060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:28,827-Speed 9373.88 samples/sec Loss 4.7339 LearningRate 0.0102 Epoch: 13 Global Step: 227070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:29,883-Speed 9701.00 samples/sec Loss 4.6700 LearningRate 0.0102 Epoch: 13 Global Step: 227080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:30,929-Speed 9795.64 samples/sec Loss 4.7784 LearningRate 0.0102 Epoch: 13 Global Step: 227090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:32,018-Speed 9410.42 samples/sec Loss 4.6847 LearningRate 0.0102 Epoch: 13 Global Step: 227100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:33,115-Speed 9337.10 samples/sec Loss 4.7662 LearningRate 0.0102 Epoch: 13 Global Step: 227110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:42:34,184-Speed 9581.66 samples/sec Loss 4.7018 LearningRate 0.0102 Epoch: 13 Global Step: 227120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:35,274-Speed 9400.39 samples/sec Loss 4.8421 LearningRate 0.0102 Epoch: 13 Global Step: 227130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:36,370-Speed 9354.26 samples/sec Loss 4.7835 LearningRate 0.0102 Epoch: 13 Global Step: 227140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:37,522-Speed 8886.99 samples/sec Loss 4.7908 LearningRate 0.0102 Epoch: 13 Global Step: 227150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:38,678-Speed 8861.85 samples/sec Loss 4.7505 LearningRate 0.0102 Epoch: 13 Global Step: 227160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:39,759-Speed 9482.86 samples/sec Loss 4.7875 
LearningRate 0.0102 Epoch: 13 Global Step: 227170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:40,823-Speed 9629.07 samples/sec Loss 4.7592 LearningRate 0.0102 Epoch: 13 Global Step: 227180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:41,922-Speed 9325.70 samples/sec Loss 4.7477 LearningRate 0.0102 Epoch: 13 Global Step: 227190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:42,972-Speed 9755.58 samples/sec Loss 4.7022 LearningRate 0.0102 Epoch: 13 Global Step: 227200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:44,044-Speed 9560.12 samples/sec Loss 4.6974 LearningRate 0.0102 Epoch: 13 Global Step: 227210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:45,165-Speed 9136.13 samples/sec Loss 4.7254 LearningRate 0.0102 Epoch: 13 Global Step: 227220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:46,249-Speed 9455.27 samples/sec Loss 4.8188 LearningRate 0.0102 Epoch: 13 Global Step: 227230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:47,358-Speed 9241.67 samples/sec Loss 4.7091 LearningRate 0.0102 Epoch: 13 Global Step: 227240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:48,440-Speed 9470.93 samples/sec Loss 4.7148 LearningRate 0.0102 Epoch: 13 Global Step: 227250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:49,513-Speed 9549.26 samples/sec Loss 4.7729 LearningRate 0.0102 Epoch: 13 Global Step: 227260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:50,595-Speed 9464.93 samples/sec Loss 4.7520 LearningRate 0.0102 Epoch: 13 Global Step: 227270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:51,674-Speed 9504.00 samples/sec Loss 4.7910 LearningRate 0.0102 Epoch: 13 Global Step: 227280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:52,742-Speed 9591.06 samples/sec Loss 4.7713 LearningRate 0.0102 Epoch: 13 
Global Step: 227290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:53,802-Speed 9664.53 samples/sec Loss 4.7039 LearningRate 0.0102 Epoch: 13 Global Step: 227300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:54,875-Speed 9548.32 samples/sec Loss 4.7641 LearningRate 0.0102 Epoch: 13 Global Step: 227310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:55,936-Speed 9651.39 samples/sec Loss 4.7657 LearningRate 0.0102 Epoch: 13 Global Step: 227320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:57,034-Speed 9334.13 samples/sec Loss 4.7401 LearningRate 0.0102 Epoch: 13 Global Step: 227330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:58,079-Speed 9808.90 samples/sec Loss 4.7465 LearningRate 0.0102 Epoch: 13 Global Step: 227340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:42:59,157-Speed 9505.14 samples/sec Loss 4.6523 LearningRate 0.0102 Epoch: 13 Global Step: 227350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:00,254-Speed 9338.18 samples/sec Loss 4.7261 LearningRate 0.0102 Epoch: 13 Global Step: 227360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:01,325-Speed 9570.83 samples/sec Loss 4.6140 LearningRate 0.0102 Epoch: 13 Global Step: 227370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:02,393-Speed 9592.75 samples/sec Loss 4.7106 LearningRate 0.0102 Epoch: 13 Global Step: 227380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:03,456-Speed 9636.53 samples/sec Loss 4.6978 LearningRate 0.0102 Epoch: 13 Global Step: 227390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:04,528-Speed 9560.67 samples/sec Loss 4.7685 LearningRate 0.0102 Epoch: 13 Global Step: 227400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:05,594-Speed 9612.40 samples/sec Loss 4.7750 LearningRate 0.0102 Epoch: 13 Global Step: 227410 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:06,702-Speed 9250.97 samples/sec Loss 4.8191 LearningRate 0.0102 Epoch: 13 Global Step: 227420 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:43:07,776-Speed 9536.08 samples/sec Loss 4.7012 LearningRate 0.0102 Epoch: 13 Global Step: 227430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:08,848-Speed 9557.75 samples/sec Loss 4.6806 LearningRate 0.0102 Epoch: 13 Global Step: 227440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:09,890-Speed 9830.48 samples/sec Loss 4.7773 LearningRate 0.0102 Epoch: 13 Global Step: 227450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:10,983-Speed 9377.67 samples/sec Loss 4.7474 LearningRate 0.0102 Epoch: 13 Global Step: 227460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:12,095-Speed 9218.91 samples/sec Loss 4.6619 LearningRate 0.0101 Epoch: 13 Global Step: 227470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:13,191-Speed 9345.61 samples/sec Loss 4.6793 LearningRate 0.0101 Epoch: 13 Global Step: 227480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:14,275-Speed 9451.51 samples/sec Loss 4.7702 LearningRate 0.0101 Epoch: 13 Global Step: 227490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:15,397-Speed 9131.76 samples/sec Loss 4.7295 LearningRate 0.0101 Epoch: 13 Global Step: 227500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:16,468-Speed 9567.99 samples/sec Loss 4.7064 LearningRate 0.0101 Epoch: 13 Global Step: 227510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:17,580-Speed 9215.87 samples/sec Loss 4.6530 LearningRate 0.0101 Epoch: 13 Global Step: 227520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:18,723-Speed 8962.69 samples/sec Loss 4.8098 LearningRate 0.0101 Epoch: 13 Global Step: 227530 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 20:43:19,855-Speed 9049.76 samples/sec Loss 4.7345 LearningRate 0.0101 Epoch: 13 Global Step: 227540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:20,995-Speed 8985.65 samples/sec Loss 4.7278 LearningRate 0.0101 Epoch: 13 Global Step: 227550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:22,097-Speed 9300.93 samples/sec Loss 4.8098 LearningRate 0.0101 Epoch: 13 Global Step: 227560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:23,192-Speed 9363.30 samples/sec Loss 4.7817 LearningRate 0.0101 Epoch: 13 Global Step: 227570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:24,273-Speed 9475.28 samples/sec Loss 4.7474 LearningRate 0.0101 Epoch: 13 Global Step: 227580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:25,354-Speed 9481.47 samples/sec Loss 4.7329 LearningRate 0.0101 Epoch: 13 Global Step: 227590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:26,447-Speed 9369.54 samples/sec Loss 4.7222 LearningRate 0.0101 Epoch: 13 Global Step: 227600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:27,511-Speed 9632.20 samples/sec Loss 4.7395 LearningRate 0.0101 Epoch: 13 Global Step: 227610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:28,616-Speed 9273.07 samples/sec Loss 4.7808 LearningRate 0.0101 Epoch: 13 Global Step: 227620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:29,748-Speed 9049.68 samples/sec Loss 4.8210 LearningRate 0.0101 Epoch: 13 Global Step: 227630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:30,837-Speed 9414.43 samples/sec Loss 4.7174 LearningRate 0.0101 Epoch: 13 Global Step: 227640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:31,892-Speed 9710.25 samples/sec Loss 4.8473 LearningRate 0.0101 Epoch: 13 Global Step: 227650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
20:43:32,988-Speed 9346.41 samples/sec Loss 4.7449 LearningRate 0.0101 Epoch: 13 Global Step: 227660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:34,109-Speed 9142.12 samples/sec Loss 4.6935 LearningRate 0.0101 Epoch: 13 Global Step: 227670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:35,177-Speed 9585.10 samples/sec Loss 4.7044 LearningRate 0.0101 Epoch: 13 Global Step: 227680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:36,265-Speed 9423.26 samples/sec Loss 4.7852 LearningRate 0.0101 Epoch: 13 Global Step: 227690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:37,355-Speed 9396.32 samples/sec Loss 4.7804 LearningRate 0.0101 Epoch: 13 Global Step: 227700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:38,469-Speed 9201.16 samples/sec Loss 4.6472 LearningRate 0.0101 Epoch: 13 Global Step: 227710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:39,530-Speed 9661.93 samples/sec Loss 4.7845 LearningRate 0.0101 Epoch: 13 Global Step: 227720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:40,611-Speed 9476.93 samples/sec Loss 4.7297 LearningRate 0.0101 Epoch: 13 Global Step: 227730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:41,732-Speed 9139.77 samples/sec Loss 4.7862 LearningRate 0.0101 Epoch: 13 Global Step: 227740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:42,832-Speed 9310.66 samples/sec Loss 4.7124 LearningRate 0.0101 Epoch: 13 Global Step: 227750 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:43:43,888-Speed 9704.94 samples/sec Loss 4.6929 LearningRate 0.0101 Epoch: 13 Global Step: 227760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:44,979-Speed 9392.02 samples/sec Loss 4.8063 LearningRate 0.0101 Epoch: 13 Global Step: 227770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:46,075-Speed 9348.29 
samples/sec Loss 4.8183 LearningRate 0.0101 Epoch: 13 Global Step: 227780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:47,177-Speed 9296.91 samples/sec Loss 4.7236 LearningRate 0.0101 Epoch: 13 Global Step: 227790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:48,267-Speed 9397.41 samples/sec Loss 4.8288 LearningRate 0.0101 Epoch: 13 Global Step: 227800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:49,379-Speed 9213.19 samples/sec Loss 4.8553 LearningRate 0.0101 Epoch: 13 Global Step: 227810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:50,446-Speed 9604.38 samples/sec Loss 4.8442 LearningRate 0.0101 Epoch: 13 Global Step: 227820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:51,503-Speed 9698.86 samples/sec Loss 4.6812 LearningRate 0.0101 Epoch: 13 Global Step: 227830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:52,560-Speed 9695.26 samples/sec Loss 4.8007 LearningRate 0.0101 Epoch: 13 Global Step: 227840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:53,669-Speed 9233.96 samples/sec Loss 4.8090 LearningRate 0.0101 Epoch: 13 Global Step: 227850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:54,754-Speed 9442.40 samples/sec Loss 4.6918 LearningRate 0.0101 Epoch: 13 Global Step: 227860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:43:55,868-Speed 9200.18 samples/sec Loss 4.6523 LearningRate 0.0101 Epoch: 13 Global Step: 227870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:56,933-Speed 9617.95 samples/sec Loss 4.8214 LearningRate 0.0101 Epoch: 13 Global Step: 227880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:58,014-Speed 9480.00 samples/sec Loss 4.7216 LearningRate 0.0101 Epoch: 13 Global Step: 227890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:43:59,135-Speed 9139.07 samples/sec Loss 4.8194 LearningRate 
0.0101 Epoch: 13 Global Step: 227900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:00,276-Speed 8980.20 samples/sec Loss 4.7534 LearningRate 0.0101 Epoch: 13 Global Step: 227910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:01,378-Speed 9299.71 samples/sec Loss 4.7936 LearningRate 0.0101 Epoch: 13 Global Step: 227920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:02,452-Speed 9538.84 samples/sec Loss 4.7484 LearningRate 0.0101 Epoch: 13 Global Step: 227930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:03,520-Speed 9599.01 samples/sec Loss 4.7759 LearningRate 0.0101 Epoch: 13 Global Step: 227940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:04,645-Speed 9104.74 samples/sec Loss 4.7352 LearningRate 0.0101 Epoch: 13 Global Step: 227950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:05,725-Speed 9486.87 samples/sec Loss 4.7544 LearningRate 0.0101 Epoch: 13 Global Step: 227960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:06,793-Speed 9596.56 samples/sec Loss 4.6500 LearningRate 0.0101 Epoch: 13 Global Step: 227970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:07,839-Speed 9797.96 samples/sec Loss 4.6951 LearningRate 0.0101 Epoch: 13 Global Step: 227980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:08,953-Speed 9198.70 samples/sec Loss 4.8212 LearningRate 0.0101 Epoch: 13 Global Step: 227990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:10,026-Speed 9549.12 samples/sec Loss 4.7416 LearningRate 0.0100 Epoch: 13 Global Step: 228000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:44:32,085-[lfw][228000]XNorm: 7.984275 Training: 2022-04-11 20:44:32,086-[lfw][228000]Accuracy-Flip: 0.99667+-0.00307 Training: 2022-04-11 20:44:32,086-[lfw][228000]Accuracy-Highest: 0.99733 Training: 2022-04-11 20:44:57,605-[cfp_fp][228000]XNorm: 6.898904 
Training: 2022-04-11 20:44:57,606-[cfp_fp][228000]Accuracy-Flip: 0.96586+-0.01056 Training: 2022-04-11 20:44:57,606-[cfp_fp][228000]Accuracy-Highest: 0.96771 Training: 2022-04-11 20:45:19,656-[agedb_30][228000]XNorm: 7.723843 Training: 2022-04-11 20:45:19,656-[agedb_30][228000]Accuracy-Flip: 0.96867+-0.01059 Training: 2022-04-11 20:45:19,656-[agedb_30][228000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:45:20,770-Speed 144.75 samples/sec Loss 4.7532 LearningRate 0.0100 Epoch: 13 Global Step: 228010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:21,847-Speed 9520.86 samples/sec Loss 4.7082 LearningRate 0.0100 Epoch: 13 Global Step: 228020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:22,930-Speed 9461.39 samples/sec Loss 4.7694 LearningRate 0.0100 Epoch: 13 Global Step: 228030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:24,050-Speed 9146.13 samples/sec Loss 4.8589 LearningRate 0.0100 Epoch: 13 Global Step: 228040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:25,166-Speed 9180.62 samples/sec Loss 4.7603 LearningRate 0.0100 Epoch: 13 Global Step: 228050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:26,250-Speed 9453.17 samples/sec Loss 4.8578 LearningRate 0.0100 Epoch: 13 Global Step: 228060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:27,326-Speed 9523.68 samples/sec Loss 4.7554 LearningRate 0.0100 Epoch: 13 Global Step: 228070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:28,402-Speed 9520.36 samples/sec Loss 4.7167 LearningRate 0.0100 Epoch: 13 Global Step: 228080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:29,483-Speed 9476.21 samples/sec Loss 4.8312 LearningRate 0.0100 Epoch: 13 Global Step: 228090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:30,562-Speed 9501.24 samples/sec Loss 4.8005 LearningRate 0.0100 Epoch: 13 Global Step: 228100 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:31,663-Speed 9302.19 samples/sec Loss 4.7114 LearningRate 0.0100 Epoch: 13 Global Step: 228110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:32,761-Speed 9332.40 samples/sec Loss 4.8250 LearningRate 0.0100 Epoch: 13 Global Step: 228120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:33,827-Speed 9610.40 samples/sec Loss 4.7991 LearningRate 0.0100 Epoch: 13 Global Step: 228130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:34,903-Speed 9520.11 samples/sec Loss 4.7402 LearningRate 0.0100 Epoch: 13 Global Step: 228140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:36,011-Speed 9249.68 samples/sec Loss 4.8598 LearningRate 0.0100 Epoch: 13 Global Step: 228150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:37,116-Speed 9273.14 samples/sec Loss 4.7173 LearningRate 0.0100 Epoch: 13 Global Step: 228160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:38,224-Speed 9247.61 samples/sec Loss 4.7295 LearningRate 0.0100 Epoch: 13 Global Step: 228170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:39,342-Speed 9164.15 samples/sec Loss 4.7636 LearningRate 0.0100 Epoch: 13 Global Step: 228180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:40,433-Speed 9389.72 samples/sec Loss 4.7694 LearningRate 0.0100 Epoch: 13 Global Step: 228190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:41,523-Speed 9406.00 samples/sec Loss 4.8462 LearningRate 0.0100 Epoch: 13 Global Step: 228200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:42,618-Speed 9353.54 samples/sec Loss 4.8371 LearningRate 0.0100 Epoch: 13 Global Step: 228210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:43,672-Speed 9721.67 samples/sec Loss 4.6229 LearningRate 0.0100 Epoch: 13 Global Step: 228220 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 20:45:44,759-Speed 9430.97 samples/sec Loss 4.8764 LearningRate 0.0100 Epoch: 13 Global Step: 228230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:45,809-Speed 9755.31 samples/sec Loss 4.7267 LearningRate 0.0100 Epoch: 13 Global Step: 228240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:46,882-Speed 9553.06 samples/sec Loss 4.7823 LearningRate 0.0100 Epoch: 13 Global Step: 228250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:48,008-Speed 9095.21 samples/sec Loss 4.8108 LearningRate 0.0100 Epoch: 13 Global Step: 228260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:49,103-Speed 9357.64 samples/sec Loss 4.7828 LearningRate 0.0100 Epoch: 13 Global Step: 228270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:50,181-Speed 9507.53 samples/sec Loss 4.7313 LearningRate 0.0100 Epoch: 13 Global Step: 228280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:51,270-Speed 9408.98 samples/sec Loss 4.6813 LearningRate 0.0100 Epoch: 13 Global Step: 228290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:52,352-Speed 9471.26 samples/sec Loss 4.6828 LearningRate 0.0100 Epoch: 13 Global Step: 228300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:53,402-Speed 9759.45 samples/sec Loss 4.6830 LearningRate 0.0100 Epoch: 13 Global Step: 228310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:45:54,495-Speed 9369.16 samples/sec Loss 4.7945 LearningRate 0.0100 Epoch: 13 Global Step: 228320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:55,560-Speed 9624.75 samples/sec Loss 4.8282 LearningRate 0.0100 Epoch: 13 Global Step: 228330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:56,628-Speed 9596.91 samples/sec Loss 4.8057 LearningRate 0.0100 Epoch: 13 Global Step: 228340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:57,730-Speed 
9297.06 samples/sec Loss 4.7135 LearningRate 0.0100 Epoch: 13 Global Step: 228350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:58,828-Speed 9335.21 samples/sec Loss 4.6848 LearningRate 0.0100 Epoch: 13 Global Step: 228360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:45:59,934-Speed 9262.32 samples/sec Loss 4.7236 LearningRate 0.0100 Epoch: 13 Global Step: 228370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:01,026-Speed 9386.56 samples/sec Loss 4.6499 LearningRate 0.0100 Epoch: 13 Global Step: 228380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:02,106-Speed 9485.35 samples/sec Loss 4.8026 LearningRate 0.0100 Epoch: 13 Global Step: 228390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:03,181-Speed 9532.64 samples/sec Loss 4.7030 LearningRate 0.0100 Epoch: 13 Global Step: 228400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:04,288-Speed 9255.86 samples/sec Loss 4.7149 LearningRate 0.0100 Epoch: 13 Global Step: 228410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:05,371-Speed 9456.47 samples/sec Loss 4.8157 LearningRate 0.0100 Epoch: 13 Global Step: 228420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:06,512-Speed 8981.32 samples/sec Loss 4.7177 LearningRate 0.0100 Epoch: 13 Global Step: 228430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:07,600-Speed 9411.68 samples/sec Loss 4.7761 LearningRate 0.0100 Epoch: 13 Global Step: 228440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:08,673-Speed 9559.80 samples/sec Loss 4.6723 LearningRate 0.0100 Epoch: 13 Global Step: 228450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:09,777-Speed 9277.16 samples/sec Loss 4.6552 LearningRate 0.0100 Epoch: 13 Global Step: 228460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:10,857-Speed 9488.31 samples/sec Loss 4.7720 
LearningRate 0.0100 Epoch: 13 Global Step: 228470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:11,951-Speed 9368.38 samples/sec Loss 4.8655 LearningRate 0.0100 Epoch: 13 Global Step: 228480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:13,015-Speed 9627.72 samples/sec Loss 4.8182 LearningRate 0.0100 Epoch: 13 Global Step: 228490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:14,077-Speed 9649.11 samples/sec Loss 4.7581 LearningRate 0.0100 Epoch: 13 Global Step: 228500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:15,130-Speed 9727.55 samples/sec Loss 4.6274 LearningRate 0.0100 Epoch: 13 Global Step: 228510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:16,223-Speed 9371.16 samples/sec Loss 4.8411 LearningRate 0.0100 Epoch: 13 Global Step: 228520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:17,256-Speed 9926.22 samples/sec Loss 4.7366 LearningRate 0.0099 Epoch: 13 Global Step: 228530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:18,346-Speed 9396.44 samples/sec Loss 4.6860 LearningRate 0.0099 Epoch: 13 Global Step: 228540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:19,459-Speed 9206.93 samples/sec Loss 4.6953 LearningRate 0.0099 Epoch: 13 Global Step: 228550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:20,522-Speed 9634.23 samples/sec Loss 4.7310 LearningRate 0.0099 Epoch: 13 Global Step: 228560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:21,631-Speed 9243.91 samples/sec Loss 4.7476 LearningRate 0.0099 Epoch: 13 Global Step: 228570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:22,729-Speed 9332.76 samples/sec Loss 4.7526 LearningRate 0.0099 Epoch: 13 Global Step: 228580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:23,826-Speed 9340.31 samples/sec Loss 4.6873 LearningRate 0.0099 Epoch: 13 
Global Step: 228590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:24,917-Speed 9391.32 samples/sec Loss 4.7455 LearningRate 0.0099 Epoch: 13 Global Step: 228600 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:46:26,000-Speed 9460.28 samples/sec Loss 4.7411 LearningRate 0.0099 Epoch: 13 Global Step: 228610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:27,048-Speed 9773.81 samples/sec Loss 4.8110 LearningRate 0.0099 Epoch: 13 Global Step: 228620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:28,144-Speed 9350.34 samples/sec Loss 4.7224 LearningRate 0.0099 Epoch: 13 Global Step: 228630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:29,193-Speed 9777.46 samples/sec Loss 4.7782 LearningRate 0.0099 Epoch: 13 Global Step: 228640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:30,251-Speed 9682.07 samples/sec Loss 4.6558 LearningRate 0.0099 Epoch: 13 Global Step: 228650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:31,334-Speed 9455.88 samples/sec Loss 4.7369 LearningRate 0.0099 Epoch: 13 Global Step: 228660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:32,397-Speed 9644.15 samples/sec Loss 4.7039 LearningRate 0.0099 Epoch: 13 Global Step: 228670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:33,502-Speed 9273.04 samples/sec Loss 4.6194 LearningRate 0.0099 Epoch: 13 Global Step: 228680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:34,574-Speed 9557.48 samples/sec Loss 4.7271 LearningRate 0.0099 Epoch: 13 Global Step: 228690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:35,674-Speed 9313.07 samples/sec Loss 4.7503 LearningRate 0.0099 Epoch: 13 Global Step: 228700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:36,748-Speed 9536.20 samples/sec Loss 4.6908 LearningRate 0.0099 Epoch: 13 Global Step: 228710 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-04-11 20:46:37,836-Speed 9420.17 samples/sec Loss 4.7403 LearningRate 0.0099 Epoch: 13 Global Step: 228720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:38,905-Speed 9579.22 samples/sec Loss 4.7530 LearningRate 0.0099 Epoch: 13 Global Step: 228730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:39,992-Speed 9430.53 samples/sec Loss 4.7582 LearningRate 0.0099 Epoch: 13 Global Step: 228740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:41,042-Speed 9757.60 samples/sec Loss 4.8404 LearningRate 0.0099 Epoch: 13 Global Step: 228750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:42,133-Speed 9391.08 samples/sec Loss 4.7606 LearningRate 0.0099 Epoch: 13 Global Step: 228760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:43,234-Speed 9307.50 samples/sec Loss 4.7854 LearningRate 0.0099 Epoch: 13 Global Step: 228770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:44,300-Speed 9613.82 samples/sec Loss 4.6864 LearningRate 0.0099 Epoch: 13 Global Step: 228780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:45,367-Speed 9597.22 samples/sec Loss 4.7426 LearningRate 0.0099 Epoch: 13 Global Step: 228790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:46,472-Speed 9271.27 samples/sec Loss 4.7872 LearningRate 0.0099 Epoch: 13 Global Step: 228800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:47,587-Speed 9196.03 samples/sec Loss 4.8111 LearningRate 0.0099 Epoch: 13 Global Step: 228810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:48,686-Speed 9329.57 samples/sec Loss 4.7510 LearningRate 0.0099 Epoch: 13 Global Step: 228820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:49,778-Speed 9378.15 samples/sec Loss 4.6383 LearningRate 0.0099 Epoch: 13 Global Step: 228830 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 20:46:50,915-Speed 9008.17 samples/sec Loss 4.7676 LearningRate 0.0099 Epoch: 13 Global Step: 228840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:52,013-Speed 9340.02 samples/sec Loss 4.6816 LearningRate 0.0099 Epoch: 13 Global Step: 228850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:46:53,096-Speed 9465.02 samples/sec Loss 4.8656 LearningRate 0.0099 Epoch: 13 Global Step: 228860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:54,143-Speed 9783.24 samples/sec Loss 4.7468 LearningRate 0.0099 Epoch: 13 Global Step: 228870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:55,239-Speed 9350.92 samples/sec Loss 4.7818 LearningRate 0.0099 Epoch: 13 Global Step: 228880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:56,326-Speed 9422.99 samples/sec Loss 4.7441 LearningRate 0.0099 Epoch: 13 Global Step: 228890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:57,374-Speed 9774.06 samples/sec Loss 4.6429 LearningRate 0.0099 Epoch: 13 Global Step: 228900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:58,438-Speed 9635.24 samples/sec Loss 4.7427 LearningRate 0.0099 Epoch: 13 Global Step: 228910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:46:59,557-Speed 9157.16 samples/sec Loss 4.7310 LearningRate 0.0099 Epoch: 13 Global Step: 228920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:00,656-Speed 9324.14 samples/sec Loss 4.7325 LearningRate 0.0099 Epoch: 13 Global Step: 228930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:01,734-Speed 9498.03 samples/sec Loss 4.7329 LearningRate 0.0099 Epoch: 13 Global Step: 228940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:02,793-Speed 9673.85 samples/sec Loss 4.7384 LearningRate 0.0099 Epoch: 13 Global Step: 228950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:03,837-Speed 
9814.31 samples/sec Loss 4.7160 LearningRate 0.0099 Epoch: 13 Global Step: 228960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:04,924-Speed 9425.40 samples/sec Loss 4.7566 LearningRate 0.0099 Epoch: 13 Global Step: 228970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:05,982-Speed 9693.31 samples/sec Loss 4.7489 LearningRate 0.0099 Epoch: 13 Global Step: 228980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:07,050-Speed 9594.24 samples/sec Loss 4.8390 LearningRate 0.0099 Epoch: 13 Global Step: 228990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:08,101-Speed 9747.52 samples/sec Loss 4.6623 LearningRate 0.0099 Epoch: 13 Global Step: 229000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:09,165-Speed 9633.23 samples/sec Loss 4.6880 LearningRate 0.0099 Epoch: 13 Global Step: 229010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:10,256-Speed 9390.20 samples/sec Loss 4.7540 LearningRate 0.0099 Epoch: 13 Global Step: 229020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:11,345-Speed 9413.71 samples/sec Loss 4.7327 LearningRate 0.0099 Epoch: 13 Global Step: 229030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:12,457-Speed 9213.93 samples/sec Loss 4.7066 LearningRate 0.0099 Epoch: 13 Global Step: 229040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:13,530-Speed 9542.45 samples/sec Loss 4.7067 LearningRate 0.0099 Epoch: 13 Global Step: 229050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:14,606-Speed 9525.44 samples/sec Loss 4.8090 LearningRate 0.0098 Epoch: 13 Global Step: 229060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:15,718-Speed 9210.27 samples/sec Loss 4.7178 LearningRate 0.0098 Epoch: 13 Global Step: 229070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:16,782-Speed 9630.17 samples/sec Loss 4.8108 
LearningRate 0.0098 Epoch: 13 Global Step: 229080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:17,908-Speed 9098.54 samples/sec Loss 4.7461 LearningRate 0.0098 Epoch: 13 Global Step: 229090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:19,027-Speed 9156.75 samples/sec Loss 4.7160 LearningRate 0.0098 Epoch: 13 Global Step: 229100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:20,155-Speed 9084.60 samples/sec Loss 4.8129 LearningRate 0.0098 Epoch: 13 Global Step: 229110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:21,242-Speed 9429.28 samples/sec Loss 4.7960 LearningRate 0.0098 Epoch: 13 Global Step: 229120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:22,358-Speed 9176.34 samples/sec Loss 4.8145 LearningRate 0.0098 Epoch: 13 Global Step: 229130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:23,468-Speed 9234.39 samples/sec Loss 4.7545 LearningRate 0.0098 Epoch: 13 Global Step: 229140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:24,572-Speed 9281.53 samples/sec Loss 4.7198 LearningRate 0.0098 Epoch: 13 Global Step: 229150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:25,665-Speed 9377.81 samples/sec Loss 4.7081 LearningRate 0.0098 Epoch: 13 Global Step: 229160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:26,683-Speed 10066.11 samples/sec Loss 4.8402 LearningRate 0.0098 Epoch: 13 Global Step: 229170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:27,777-Speed 9365.63 samples/sec Loss 4.6767 LearningRate 0.0098 Epoch: 13 Global Step: 229180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:28,848-Speed 9563.32 samples/sec Loss 4.7508 LearningRate 0.0098 Epoch: 13 Global Step: 229190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:29,944-Speed 9350.97 samples/sec Loss 4.7049 LearningRate 0.0098 Epoch: 13 
Global Step: 229200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:31,010-Speed 9609.07 samples/sec Loss 4.7630 LearningRate 0.0098 Epoch: 13 Global Step: 229210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:32,079-Speed 9590.77 samples/sec Loss 4.7412 LearningRate 0.0098 Epoch: 13 Global Step: 229220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:33,198-Speed 9154.34 samples/sec Loss 4.7499 LearningRate 0.0098 Epoch: 13 Global Step: 229230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:34,267-Speed 9582.04 samples/sec Loss 4.6657 LearningRate 0.0098 Epoch: 13 Global Step: 229240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:35,299-Speed 9928.87 samples/sec Loss 4.7366 LearningRate 0.0098 Epoch: 13 Global Step: 229250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:36,356-Speed 9691.35 samples/sec Loss 4.7626 LearningRate 0.0098 Epoch: 13 Global Step: 229260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:37,435-Speed 9493.21 samples/sec Loss 4.7236 LearningRate 0.0098 Epoch: 13 Global Step: 229270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:38,521-Speed 9438.59 samples/sec Loss 4.8221 LearningRate 0.0098 Epoch: 13 Global Step: 229280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:39,631-Speed 9229.81 samples/sec Loss 4.7388 LearningRate 0.0098 Epoch: 13 Global Step: 229290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:40,688-Speed 9697.58 samples/sec Loss 4.6944 LearningRate 0.0098 Epoch: 13 Global Step: 229300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:41,777-Speed 9407.95 samples/sec Loss 4.6746 LearningRate 0.0098 Epoch: 13 Global Step: 229310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:42,853-Speed 9523.47 samples/sec Loss 4.7826 LearningRate 0.0098 Epoch: 13 Global Step: 229320 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:43,940-Speed 9423.23 samples/sec Loss 4.6801 LearningRate 0.0098 Epoch: 13 Global Step: 229330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:45,008-Speed 9592.53 samples/sec Loss 4.8171 LearningRate 0.0098 Epoch: 13 Global Step: 229340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:46,043-Speed 9906.37 samples/sec Loss 4.8292 LearningRate 0.0098 Epoch: 13 Global Step: 229350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:47,087-Speed 9811.60 samples/sec Loss 4.7233 LearningRate 0.0098 Epoch: 13 Global Step: 229360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:48,157-Speed 9571.35 samples/sec Loss 4.7102 LearningRate 0.0098 Epoch: 13 Global Step: 229370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:49,242-Speed 9448.78 samples/sec Loss 4.7777 LearningRate 0.0098 Epoch: 13 Global Step: 229380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:50,336-Speed 9360.33 samples/sec Loss 4.7141 LearningRate 0.0098 Epoch: 13 Global Step: 229390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:51,427-Speed 9396.76 samples/sec Loss 4.8511 LearningRate 0.0098 Epoch: 13 Global Step: 229400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:52,486-Speed 9674.90 samples/sec Loss 4.7407 LearningRate 0.0098 Epoch: 13 Global Step: 229410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:53,557-Speed 9561.58 samples/sec Loss 4.6529 LearningRate 0.0098 Epoch: 13 Global Step: 229420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:47:54,679-Speed 9134.50 samples/sec Loss 4.7370 LearningRate 0.0098 Epoch: 13 Global Step: 229430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:55,753-Speed 9538.63 samples/sec Loss 4.7729 LearningRate 0.0098 Epoch: 13 Global Step: 229440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 20:47:56,841-Speed 9423.98 samples/sec Loss 4.6939 LearningRate 0.0098 Epoch: 13 Global Step: 229450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:57,895-Speed 9725.00 samples/sec Loss 4.7493 LearningRate 0.0098 Epoch: 13 Global Step: 229460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:47:58,968-Speed 9545.77 samples/sec Loss 4.7447 LearningRate 0.0098 Epoch: 13 Global Step: 229470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:00,025-Speed 9692.47 samples/sec Loss 4.8285 LearningRate 0.0098 Epoch: 13 Global Step: 229480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:01,101-Speed 9526.76 samples/sec Loss 4.7489 LearningRate 0.0098 Epoch: 13 Global Step: 229490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:02,189-Speed 9418.73 samples/sec Loss 4.7452 LearningRate 0.0098 Epoch: 13 Global Step: 229500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:03,259-Speed 9577.12 samples/sec Loss 4.8357 LearningRate 0.0098 Epoch: 13 Global Step: 229510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:04,291-Speed 9924.25 samples/sec Loss 4.7478 LearningRate 0.0098 Epoch: 13 Global Step: 229520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:05,373-Speed 9471.94 samples/sec Loss 4.7340 LearningRate 0.0098 Epoch: 13 Global Step: 229530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:06,444-Speed 9562.97 samples/sec Loss 4.8034 LearningRate 0.0098 Epoch: 13 Global Step: 229540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:07,535-Speed 9389.40 samples/sec Loss 4.7839 LearningRate 0.0098 Epoch: 13 Global Step: 229550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:08,622-Speed 9428.89 samples/sec Loss 4.7569 LearningRate 0.0098 Epoch: 13 Global Step: 229560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:09,709-Speed 9424.56 
samples/sec Loss 4.7972 LearningRate 0.0098 Epoch: 13 Global Step: 229570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:10,837-Speed 9087.99 samples/sec Loss 4.7068 LearningRate 0.0098 Epoch: 13 Global Step: 229580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:11,900-Speed 9636.13 samples/sec Loss 4.7235 LearningRate 0.0097 Epoch: 13 Global Step: 229590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:12,986-Speed 9430.78 samples/sec Loss 4.7116 LearningRate 0.0097 Epoch: 13 Global Step: 229600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:14,039-Speed 9730.18 samples/sec Loss 4.7555 LearningRate 0.0097 Epoch: 13 Global Step: 229610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:15,139-Speed 9316.66 samples/sec Loss 4.7543 LearningRate 0.0097 Epoch: 13 Global Step: 229620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:16,202-Speed 9635.50 samples/sec Loss 4.6645 LearningRate 0.0097 Epoch: 13 Global Step: 229630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:17,284-Speed 9472.06 samples/sec Loss 4.7150 LearningRate 0.0097 Epoch: 13 Global Step: 229640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:18,424-Speed 8983.39 samples/sec Loss 4.7398 LearningRate 0.0097 Epoch: 13 Global Step: 229650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:19,506-Speed 9470.65 samples/sec Loss 4.7573 LearningRate 0.0097 Epoch: 13 Global Step: 229660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:20,579-Speed 9552.97 samples/sec Loss 4.8490 LearningRate 0.0097 Epoch: 13 Global Step: 229670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:21,644-Speed 9628.13 samples/sec Loss 4.6756 LearningRate 0.0097 Epoch: 13 Global Step: 229680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:22,768-Speed 9119.32 samples/sec Loss 4.8136 LearningRate 
0.0097 Epoch: 13 Global Step: 229690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:23,870-Speed 9297.35 samples/sec Loss 4.7187 LearningRate 0.0097 Epoch: 13 Global Step: 229700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:24,926-Speed 9697.35 samples/sec Loss 4.6798 LearningRate 0.0097 Epoch: 13 Global Step: 229710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:26,016-Speed 9405.44 samples/sec Loss 4.7595 LearningRate 0.0097 Epoch: 13 Global Step: 229720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:27,090-Speed 9538.56 samples/sec Loss 4.8223 LearningRate 0.0097 Epoch: 13 Global Step: 229730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:28,183-Speed 9383.51 samples/sec Loss 4.8520 LearningRate 0.0097 Epoch: 13 Global Step: 229740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:29,281-Speed 9334.68 samples/sec Loss 4.7189 LearningRate 0.0097 Epoch: 13 Global Step: 229750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:30,352-Speed 9566.70 samples/sec Loss 4.7981 LearningRate 0.0097 Epoch: 13 Global Step: 229760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:31,410-Speed 9682.05 samples/sec Loss 4.8703 LearningRate 0.0097 Epoch: 13 Global Step: 229770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:32,491-Speed 9479.40 samples/sec Loss 4.7483 LearningRate 0.0097 Epoch: 13 Global Step: 229780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:33,555-Speed 9634.61 samples/sec Loss 4.7550 LearningRate 0.0097 Epoch: 13 Global Step: 229790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:34,595-Speed 9847.35 samples/sec Loss 4.7628 LearningRate 0.0097 Epoch: 13 Global Step: 229800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:35,671-Speed 9526.03 samples/sec Loss 4.7391 LearningRate 0.0097 Epoch: 13 Global Step: 
229810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:36,752-Speed 9475.01 samples/sec Loss 4.6883 LearningRate 0.0097 Epoch: 13 Global Step: 229820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:37,814-Speed 9644.77 samples/sec Loss 4.7504 LearningRate 0.0097 Epoch: 13 Global Step: 229830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:38,926-Speed 9220.51 samples/sec Loss 4.6585 LearningRate 0.0097 Epoch: 13 Global Step: 229840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:39,991-Speed 9617.85 samples/sec Loss 4.7650 LearningRate 0.0097 Epoch: 13 Global Step: 229850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:41,082-Speed 9398.96 samples/sec Loss 4.6653 LearningRate 0.0097 Epoch: 13 Global Step: 229860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:42,160-Speed 9496.21 samples/sec Loss 4.7702 LearningRate 0.0097 Epoch: 13 Global Step: 229870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:43,213-Speed 9737.51 samples/sec Loss 4.7692 LearningRate 0.0097 Epoch: 13 Global Step: 229880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:44,246-Speed 9916.59 samples/sec Loss 4.7146 LearningRate 0.0097 Epoch: 13 Global Step: 229890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:45,284-Speed 9871.93 samples/sec Loss 4.7184 LearningRate 0.0097 Epoch: 13 Global Step: 229900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:46,366-Speed 9467.86 samples/sec Loss 4.7491 LearningRate 0.0097 Epoch: 13 Global Step: 229910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:48:47,435-Speed 9581.48 samples/sec Loss 4.7853 LearningRate 0.0097 Epoch: 13 Global Step: 229920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:48,463-Speed 9968.29 samples/sec Loss 4.7684 LearningRate 0.0097 Epoch: 13 Global Step: 229930 Fp16 Grad Scale: 131072 Required: 
4 hours Training: 2022-04-11 20:48:49,538-Speed 9532.96 samples/sec Loss 4.7206 LearningRate 0.0097 Epoch: 13 Global Step: 229940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:50,611-Speed 9545.64 samples/sec Loss 4.6864 LearningRate 0.0097 Epoch: 13 Global Step: 229950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:51,692-Speed 9480.01 samples/sec Loss 4.7463 LearningRate 0.0097 Epoch: 13 Global Step: 229960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:52,791-Speed 9326.73 samples/sec Loss 4.7105 LearningRate 0.0097 Epoch: 13 Global Step: 229970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:53,896-Speed 9264.77 samples/sec Loss 4.7526 LearningRate 0.0097 Epoch: 13 Global Step: 229980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:55,025-Speed 9080.50 samples/sec Loss 4.7512 LearningRate 0.0097 Epoch: 13 Global Step: 229990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:48:56,072-Speed 9778.86 samples/sec Loss 4.7257 LearningRate 0.0097 Epoch: 13 Global Step: 230000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:49:17,929-[lfw][230000]XNorm: 7.864795 Training: 2022-04-11 20:49:17,930-[lfw][230000]Accuracy-Flip: 0.99617+-0.00236 Training: 2022-04-11 20:49:17,930-[lfw][230000]Accuracy-Highest: 0.99733 Training: 2022-04-11 20:49:43,158-[cfp_fp][230000]XNorm: 6.765681 Training: 2022-04-11 20:49:43,159-[cfp_fp][230000]Accuracy-Flip: 0.96914+-0.00767 Training: 2022-04-11 20:49:43,160-[cfp_fp][230000]Accuracy-Highest: 0.96914 Training: 2022-04-11 20:50:04,909-[agedb_30][230000]XNorm: 7.607323 Training: 2022-04-11 20:50:04,910-[agedb_30][230000]Accuracy-Flip: 0.96900+-0.00955 Training: 2022-04-11 20:50:04,910-[agedb_30][230000]Accuracy-Highest: 0.97033 Training: 2022-04-11 20:50:06,018-Speed 146.40 samples/sec Loss 4.6567 LearningRate 0.0097 Epoch: 13 Global Step: 230010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 20:50:07,141-Speed 9124.00 samples/sec Loss 4.7554 LearningRate 0.0097 Epoch: 13 Global Step: 230020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:08,246-Speed 9275.40 samples/sec Loss 4.8190 LearningRate 0.0097 Epoch: 13 Global Step: 230030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:09,337-Speed 9393.31 samples/sec Loss 4.6925 LearningRate 0.0097 Epoch: 13 Global Step: 230040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:10,410-Speed 9546.79 samples/sec Loss 4.7112 LearningRate 0.0097 Epoch: 13 Global Step: 230050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:11,479-Speed 9584.60 samples/sec Loss 4.8027 LearningRate 0.0097 Epoch: 13 Global Step: 230060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:12,585-Speed 9261.88 samples/sec Loss 4.6949 LearningRate 0.0097 Epoch: 13 Global Step: 230070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:13,692-Speed 9255.29 samples/sec Loss 4.6824 LearningRate 0.0097 Epoch: 13 Global Step: 230080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:14,756-Speed 9631.34 samples/sec Loss 4.8113 LearningRate 0.0097 Epoch: 13 Global Step: 230090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:15,783-Speed 9980.32 samples/sec Loss 4.6157 LearningRate 0.0097 Epoch: 13 Global Step: 230100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:16,851-Speed 9592.94 samples/sec Loss 4.6817 LearningRate 0.0097 Epoch: 13 Global Step: 230110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:17,927-Speed 9516.86 samples/sec Loss 4.7273 LearningRate 0.0097 Epoch: 13 Global Step: 230120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:18,974-Speed 9788.73 samples/sec Loss 4.6843 LearningRate 0.0096 Epoch: 13 Global Step: 230130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:20,028-Speed 9720.21 
samples/sec Loss 4.8367 LearningRate 0.0096 Epoch: 13 Global Step: 230140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:21,080-Speed 9737.21 samples/sec Loss 4.7656 LearningRate 0.0096 Epoch: 13 Global Step: 230150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:22,157-Speed 9518.79 samples/sec Loss 4.7337 LearningRate 0.0096 Epoch: 13 Global Step: 230160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:23,276-Speed 9152.65 samples/sec Loss 4.7326 LearningRate 0.0096 Epoch: 13 Global Step: 230170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:24,376-Speed 9322.88 samples/sec Loss 4.8235 LearningRate 0.0096 Epoch: 13 Global Step: 230180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:25,470-Speed 9358.92 samples/sec Loss 4.7614 LearningRate 0.0096 Epoch: 13 Global Step: 230190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:26,568-Speed 9336.66 samples/sec Loss 4.7101 LearningRate 0.0096 Epoch: 13 Global Step: 230200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:27,664-Speed 9345.23 samples/sec Loss 4.6346 LearningRate 0.0096 Epoch: 13 Global Step: 230210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:28,725-Speed 9659.38 samples/sec Loss 4.7660 LearningRate 0.0096 Epoch: 13 Global Step: 230220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:29,764-Speed 9865.65 samples/sec Loss 4.6806 LearningRate 0.0096 Epoch: 13 Global Step: 230230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:30,853-Speed 9409.39 samples/sec Loss 4.8306 LearningRate 0.0096 Epoch: 13 Global Step: 230240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:31,921-Speed 9588.68 samples/sec Loss 4.8035 LearningRate 0.0096 Epoch: 13 Global Step: 230250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:33,001-Speed 9486.86 samples/sec Loss 4.8218 
LearningRate 0.0096 Epoch: 13 Global Step: 230260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:34,110-Speed 9239.70 samples/sec Loss 4.6840 LearningRate 0.0096 Epoch: 13 Global Step: 230270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:35,214-Speed 9277.55 samples/sec Loss 4.8002 LearningRate 0.0096 Epoch: 13 Global Step: 230280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:36,321-Speed 9255.81 samples/sec Loss 4.6970 LearningRate 0.0096 Epoch: 13 Global Step: 230290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:37,395-Speed 9547.87 samples/sec Loss 4.7826 LearningRate 0.0096 Epoch: 13 Global Step: 230300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:38,451-Speed 9700.63 samples/sec Loss 4.7190 LearningRate 0.0096 Epoch: 13 Global Step: 230310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:39,552-Speed 9304.12 samples/sec Loss 4.6748 LearningRate 0.0096 Epoch: 13 Global Step: 230320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:50:40,642-Speed 9401.39 samples/sec Loss 4.8019 LearningRate 0.0096 Epoch: 13 Global Step: 230330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:41,737-Speed 9363.74 samples/sec Loss 4.7195 LearningRate 0.0096 Epoch: 13 Global Step: 230340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:42,806-Speed 9582.52 samples/sec Loss 4.7731 LearningRate 0.0096 Epoch: 13 Global Step: 230350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:43,873-Speed 9601.44 samples/sec Loss 4.7510 LearningRate 0.0096 Epoch: 13 Global Step: 230360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:44,946-Speed 9547.09 samples/sec Loss 4.7554 LearningRate 0.0096 Epoch: 13 Global Step: 230370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:46,043-Speed 9348.84 samples/sec Loss 4.7685 LearningRate 0.0096 Epoch: 13 Global 
Step: 230380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:47,132-Speed 9404.58 samples/sec Loss 4.7198 LearningRate 0.0096 Epoch: 13 Global Step: 230390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:48,206-Speed 9538.84 samples/sec Loss 4.6683 LearningRate 0.0096 Epoch: 13 Global Step: 230400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:49,364-Speed 8847.62 samples/sec Loss 4.6873 LearningRate 0.0096 Epoch: 13 Global Step: 230410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:50,436-Speed 9556.48 samples/sec Loss 4.6919 LearningRate 0.0096 Epoch: 13 Global Step: 230420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:51,516-Speed 9496.51 samples/sec Loss 4.7268 LearningRate 0.0096 Epoch: 13 Global Step: 230430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:52,549-Speed 9920.45 samples/sec Loss 4.7413 LearningRate 0.0096 Epoch: 13 Global Step: 230440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:53,645-Speed 9347.39 samples/sec Loss 4.8011 LearningRate 0.0096 Epoch: 13 Global Step: 230450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:54,746-Speed 9300.37 samples/sec Loss 4.8112 LearningRate 0.0096 Epoch: 13 Global Step: 230460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:55,776-Speed 9948.85 samples/sec Loss 4.8025 LearningRate 0.0096 Epoch: 13 Global Step: 230470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:56,861-Speed 9446.44 samples/sec Loss 4.7383 LearningRate 0.0096 Epoch: 13 Global Step: 230480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:57,971-Speed 9226.49 samples/sec Loss 4.7095 LearningRate 0.0096 Epoch: 13 Global Step: 230490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:50:59,046-Speed 9533.80 samples/sec Loss 4.6814 LearningRate 0.0096 Epoch: 13 Global Step: 230500 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 20:51:00,092-Speed 9798.40 samples/sec Loss 4.7870 LearningRate 0.0096 Epoch: 13 Global Step: 230510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:01,157-Speed 9618.27 samples/sec Loss 4.8458 LearningRate 0.0096 Epoch: 13 Global Step: 230520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:02,232-Speed 9530.04 samples/sec Loss 4.7438 LearningRate 0.0096 Epoch: 13 Global Step: 230530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:03,335-Speed 9286.33 samples/sec Loss 4.6752 LearningRate 0.0096 Epoch: 13 Global Step: 230540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:04,468-Speed 9047.33 samples/sec Loss 4.7001 LearningRate 0.0096 Epoch: 13 Global Step: 230550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:05,551-Speed 9458.69 samples/sec Loss 4.6855 LearningRate 0.0096 Epoch: 13 Global Step: 230560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:06,677-Speed 9100.12 samples/sec Loss 4.7807 LearningRate 0.0096 Epoch: 13 Global Step: 230570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:07,805-Speed 9082.53 samples/sec Loss 4.7856 LearningRate 0.0096 Epoch: 13 Global Step: 230580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:08,932-Speed 9095.49 samples/sec Loss 4.7882 LearningRate 0.0096 Epoch: 13 Global Step: 230590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:09,995-Speed 9644.46 samples/sec Loss 4.7600 LearningRate 0.0096 Epoch: 13 Global Step: 230600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:11,052-Speed 9699.34 samples/sec Loss 4.7763 LearningRate 0.0096 Epoch: 13 Global Step: 230610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:12,133-Speed 9472.31 samples/sec Loss 4.6347 LearningRate 0.0096 Epoch: 13 Global Step: 230620 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 20:51:13,211-Speed 9503.88 samples/sec Loss 4.6714 LearningRate 0.0096 Epoch: 13 Global Step: 230630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:14,343-Speed 9051.59 samples/sec Loss 4.7835 LearningRate 0.0096 Epoch: 13 Global Step: 230640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:15,428-Speed 9446.29 samples/sec Loss 4.7182 LearningRate 0.0096 Epoch: 13 Global Step: 230650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:16,497-Speed 9583.66 samples/sec Loss 4.7089 LearningRate 0.0095 Epoch: 13 Global Step: 230660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:17,615-Speed 9167.95 samples/sec Loss 4.8119 LearningRate 0.0095 Epoch: 13 Global Step: 230670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:18,715-Speed 9307.06 samples/sec Loss 4.7366 LearningRate 0.0095 Epoch: 13 Global Step: 230680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:19,816-Speed 9305.62 samples/sec Loss 4.7875 LearningRate 0.0095 Epoch: 13 Global Step: 230690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:20,928-Speed 9215.52 samples/sec Loss 4.7192 LearningRate 0.0095 Epoch: 13 Global Step: 230700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:22,002-Speed 9546.78 samples/sec Loss 4.7540 LearningRate 0.0095 Epoch: 13 Global Step: 230710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:23,097-Speed 9358.70 samples/sec Loss 4.8298 LearningRate 0.0095 Epoch: 13 Global Step: 230720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:24,193-Speed 9354.35 samples/sec Loss 4.7747 LearningRate 0.0095 Epoch: 13 Global Step: 230730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:25,274-Speed 9479.36 samples/sec Loss 4.7401 LearningRate 0.0095 Epoch: 13 Global Step: 230740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
20:51:26,303-Speed 9954.30 samples/sec Loss 4.7763 LearningRate 0.0095 Epoch: 13 Global Step: 230750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:27,383-Speed 9488.31 samples/sec Loss 4.6754 LearningRate 0.0095 Epoch: 13 Global Step: 230760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:28,483-Speed 9313.22 samples/sec Loss 4.8248 LearningRate 0.0095 Epoch: 13 Global Step: 230770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:29,588-Speed 9273.84 samples/sec Loss 4.8142 LearningRate 0.0095 Epoch: 13 Global Step: 230780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:30,697-Speed 9240.20 samples/sec Loss 4.8517 LearningRate 0.0095 Epoch: 13 Global Step: 230790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:31,791-Speed 9359.32 samples/sec Loss 4.7660 LearningRate 0.0095 Epoch: 13 Global Step: 230800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:32,861-Speed 9581.20 samples/sec Loss 4.8021 LearningRate 0.0095 Epoch: 13 Global Step: 230810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:33,914-Speed 9729.05 samples/sec Loss 4.7484 LearningRate 0.0095 Epoch: 13 Global Step: 230820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:35,036-Speed 9127.38 samples/sec Loss 4.7953 LearningRate 0.0095 Epoch: 13 Global Step: 230830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:36,140-Speed 9282.78 samples/sec Loss 4.7389 LearningRate 0.0095 Epoch: 13 Global Step: 230840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:37,237-Speed 9339.28 samples/sec Loss 4.7148 LearningRate 0.0095 Epoch: 13 Global Step: 230850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:38,405-Speed 8875.77 samples/sec Loss 4.7514 LearningRate 0.0095 Epoch: 13 Global Step: 230860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:39,495-Speed 9395.69 samples/sec 
Loss 4.7910 LearningRate 0.0095 Epoch: 13 Global Step: 230870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:40,589-Speed 9369.36 samples/sec Loss 4.8077 LearningRate 0.0095 Epoch: 13 Global Step: 230880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:41,713-Speed 9119.47 samples/sec Loss 4.7335 LearningRate 0.0095 Epoch: 13 Global Step: 230890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:42,801-Speed 9417.83 samples/sec Loss 4.7212 LearningRate 0.0095 Epoch: 13 Global Step: 230900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:43,852-Speed 9752.55 samples/sec Loss 4.8247 LearningRate 0.0095 Epoch: 13 Global Step: 230910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:44,987-Speed 9023.54 samples/sec Loss 4.7223 LearningRate 0.0095 Epoch: 13 Global Step: 230920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:46,057-Speed 9579.24 samples/sec Loss 4.7966 LearningRate 0.0095 Epoch: 13 Global Step: 230930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:47,165-Speed 9241.63 samples/sec Loss 4.8558 LearningRate 0.0095 Epoch: 13 Global Step: 230940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:48,254-Speed 9414.83 samples/sec Loss 4.7117 LearningRate 0.0095 Epoch: 13 Global Step: 230950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:49,335-Speed 9476.86 samples/sec Loss 4.8310 LearningRate 0.0095 Epoch: 13 Global Step: 230960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:50,399-Speed 9627.14 samples/sec Loss 4.7264 LearningRate 0.0095 Epoch: 13 Global Step: 230970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:51,456-Speed 9690.99 samples/sec Loss 4.6901 LearningRate 0.0095 Epoch: 13 Global Step: 230980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:52,550-Speed 9371.77 samples/sec Loss 4.7438 LearningRate 0.0095 
Epoch: 13 Global Step: 230990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:53,649-Speed 9322.32 samples/sec Loss 4.7711 LearningRate 0.0095 Epoch: 13 Global Step: 231000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:54,735-Speed 9430.11 samples/sec Loss 4.7643 LearningRate 0.0095 Epoch: 13 Global Step: 231010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:51:55,820-Speed 9441.96 samples/sec Loss 4.8102 LearningRate 0.0095 Epoch: 13 Global Step: 231020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:51:56,909-Speed 9413.47 samples/sec Loss 4.7160 LearningRate 0.0095 Epoch: 13 Global Step: 231030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:51:57,975-Speed 9611.52 samples/sec Loss 4.7338 LearningRate 0.0095 Epoch: 13 Global Step: 231040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:51:59,093-Speed 9169.21 samples/sec Loss 4.6466 LearningRate 0.0095 Epoch: 13 Global Step: 231050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:00,193-Speed 9318.07 samples/sec Loss 4.7721 LearningRate 0.0095 Epoch: 13 Global Step: 231060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:01,256-Speed 9632.76 samples/sec Loss 4.7447 LearningRate 0.0095 Epoch: 13 Global Step: 231070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:02,334-Speed 9504.31 samples/sec Loss 4.6996 LearningRate 0.0095 Epoch: 13 Global Step: 231080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:03,430-Speed 9353.79 samples/sec Loss 4.7345 LearningRate 0.0095 Epoch: 13 Global Step: 231090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:04,561-Speed 9052.32 samples/sec Loss 4.6927 LearningRate 0.0095 Epoch: 13 Global Step: 231100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:05,611-Speed 9765.64 samples/sec Loss 4.8582 LearningRate 0.0095 Epoch: 13 Global Step: 231110 Fp16 Grad 
Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:06,688-Speed 9510.85 samples/sec Loss 4.8696 LearningRate 0.0095 Epoch: 13 Global Step: 231120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:07,743-Speed 9711.44 samples/sec Loss 4.6904 LearningRate 0.0095 Epoch: 13 Global Step: 231130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:08,800-Speed 9696.75 samples/sec Loss 4.6480 LearningRate 0.0095 Epoch: 13 Global Step: 231140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:09,853-Speed 9727.27 samples/sec Loss 4.8141 LearningRate 0.0095 Epoch: 13 Global Step: 231150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:10,932-Speed 9492.18 samples/sec Loss 4.6530 LearningRate 0.0095 Epoch: 13 Global Step: 231160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:12,039-Speed 9261.12 samples/sec Loss 4.7721 LearningRate 0.0095 Epoch: 13 Global Step: 231170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:13,134-Speed 9354.59 samples/sec Loss 4.7022 LearningRate 0.0095 Epoch: 13 Global Step: 231180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:14,230-Speed 9347.35 samples/sec Loss 4.7583 LearningRate 0.0095 Epoch: 13 Global Step: 231190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:15,327-Speed 9337.01 samples/sec Loss 4.7112 LearningRate 0.0095 Epoch: 13 Global Step: 231200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:16,448-Speed 9142.63 samples/sec Loss 4.7159 LearningRate 0.0094 Epoch: 13 Global Step: 231210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:17,553-Speed 9272.08 samples/sec Loss 4.6033 LearningRate 0.0094 Epoch: 13 Global Step: 231220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:18,625-Speed 9556.32 samples/sec Loss 4.7151 LearningRate 0.0094 Epoch: 13 Global Step: 231230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 20:52:19,718-Speed 9381.47 samples/sec Loss 4.7327 LearningRate 0.0094 Epoch: 13 Global Step: 231240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:20,773-Speed 9704.27 samples/sec Loss 4.7293 LearningRate 0.0094 Epoch: 13 Global Step: 231250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:21,855-Speed 9476.28 samples/sec Loss 4.7596 LearningRate 0.0094 Epoch: 13 Global Step: 231260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:22,924-Speed 9589.31 samples/sec Loss 4.7369 LearningRate 0.0094 Epoch: 13 Global Step: 231270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:24,013-Speed 9406.25 samples/sec Loss 4.8462 LearningRate 0.0094 Epoch: 13 Global Step: 231280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:25,116-Speed 9284.94 samples/sec Loss 4.7619 LearningRate 0.0094 Epoch: 13 Global Step: 231290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:26,208-Speed 9379.22 samples/sec Loss 4.8537 LearningRate 0.0094 Epoch: 13 Global Step: 231300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:27,319-Speed 9227.99 samples/sec Loss 4.6926 LearningRate 0.0094 Epoch: 13 Global Step: 231310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:28,411-Speed 9398.62 samples/sec Loss 4.6486 LearningRate 0.0094 Epoch: 13 Global Step: 231320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:29,508-Speed 9335.86 samples/sec Loss 4.7066 LearningRate 0.0094 Epoch: 13 Global Step: 231330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:30,593-Speed 9445.11 samples/sec Loss 4.6720 LearningRate 0.0094 Epoch: 13 Global Step: 231340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:31,722-Speed 9078.37 samples/sec Loss 4.6779 LearningRate 0.0094 Epoch: 13 Global Step: 231350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:32,810-Speed 9410.89 
samples/sec Loss 4.6992 LearningRate 0.0094 Epoch: 13 Global Step: 231360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:33,890-Speed 9490.64 samples/sec Loss 4.7142 LearningRate 0.0094 Epoch: 13 Global Step: 231370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:34,962-Speed 9559.08 samples/sec Loss 4.7472 LearningRate 0.0094 Epoch: 13 Global Step: 231380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:36,056-Speed 9362.10 samples/sec Loss 4.7674 LearningRate 0.0094 Epoch: 13 Global Step: 231390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:37,153-Speed 9339.13 samples/sec Loss 4.6558 LearningRate 0.0094 Epoch: 13 Global Step: 231400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:52:38,265-Speed 9211.37 samples/sec Loss 4.7505 LearningRate 0.0094 Epoch: 13 Global Step: 231410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:39,386-Speed 9143.49 samples/sec Loss 4.7778 LearningRate 0.0094 Epoch: 13 Global Step: 231420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:40,520-Speed 9039.65 samples/sec Loss 4.6706 LearningRate 0.0094 Epoch: 13 Global Step: 231430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:41,618-Speed 9335.10 samples/sec Loss 4.6793 LearningRate 0.0094 Epoch: 13 Global Step: 231440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:42,683-Speed 9618.37 samples/sec Loss 4.6645 LearningRate 0.0094 Epoch: 13 Global Step: 231450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:43,777-Speed 9363.18 samples/sec Loss 4.8098 LearningRate 0.0094 Epoch: 13 Global Step: 231460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:44,826-Speed 9768.30 samples/sec Loss 4.7614 LearningRate 0.0094 Epoch: 13 Global Step: 231470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:45,925-Speed 9326.80 samples/sec Loss 4.6377 LearningRate 0.0094 
Epoch: 13 Global Step: 231480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:46,999-Speed 9540.20 samples/sec Loss 4.7752 LearningRate 0.0094 Epoch: 13 Global Step: 231490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:48,082-Speed 9460.36 samples/sec Loss 4.6934 LearningRate 0.0094 Epoch: 13 Global Step: 231500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:49,161-Speed 9488.53 samples/sec Loss 4.7000 LearningRate 0.0094 Epoch: 13 Global Step: 231510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:52:50,209-Speed 9779.51 samples/sec Loss 4.7780 LearningRate 0.0094 Epoch: 13 Global Step: 231520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:51,289-Speed 9490.95 samples/sec Loss 4.7764 LearningRate 0.0094 Epoch: 13 Global Step: 231530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:52,399-Speed 9226.86 samples/sec Loss 4.7432 LearningRate 0.0094 Epoch: 13 Global Step: 231540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:53,518-Speed 9157.90 samples/sec Loss 4.6744 LearningRate 0.0094 Epoch: 13 Global Step: 231550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:54,554-Speed 9886.82 samples/sec Loss 4.7010 LearningRate 0.0094 Epoch: 13 Global Step: 231560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:55,686-Speed 9054.91 samples/sec Loss 4.7312 LearningRate 0.0094 Epoch: 13 Global Step: 231570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:56,746-Speed 9661.19 samples/sec Loss 4.6631 LearningRate 0.0094 Epoch: 13 Global Step: 231580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:57,858-Speed 9217.83 samples/sec Loss 4.7200 LearningRate 0.0094 Epoch: 13 Global Step: 231590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:52:58,963-Speed 9275.13 samples/sec Loss 4.6471 LearningRate 0.0094 Epoch: 13 Global Step: 231600 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:00,021-Speed 9685.25 samples/sec Loss 4.8828 LearningRate 0.0094 Epoch: 13 Global Step: 231610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:01,116-Speed 9357.34 samples/sec Loss 4.6721 LearningRate 0.0094 Epoch: 13 Global Step: 231620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:02,201-Speed 9443.61 samples/sec Loss 4.7151 LearningRate 0.0094 Epoch: 13 Global Step: 231630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:03,277-Speed 9526.59 samples/sec Loss 4.7090 LearningRate 0.0094 Epoch: 13 Global Step: 231640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:04,338-Speed 9652.58 samples/sec Loss 4.8411 LearningRate 0.0094 Epoch: 13 Global Step: 231650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:05,427-Speed 9407.42 samples/sec Loss 4.7008 LearningRate 0.0094 Epoch: 13 Global Step: 231660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:06,500-Speed 9550.53 samples/sec Loss 4.6984 LearningRate 0.0094 Epoch: 13 Global Step: 231670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:07,591-Speed 9393.20 samples/sec Loss 4.7032 LearningRate 0.0094 Epoch: 13 Global Step: 231680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:08,718-Speed 9086.05 samples/sec Loss 4.8549 LearningRate 0.0094 Epoch: 13 Global Step: 231690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:09,791-Speed 9548.89 samples/sec Loss 4.7308 LearningRate 0.0094 Epoch: 13 Global Step: 231700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:10,875-Speed 9458.97 samples/sec Loss 4.7025 LearningRate 0.0094 Epoch: 13 Global Step: 231710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:11,951-Speed 9526.90 samples/sec Loss 4.7092 LearningRate 0.0094 Epoch: 13 Global Step: 231720 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 20:53:13,045-Speed 9365.80 samples/sec Loss 4.8040 LearningRate 0.0094 Epoch: 13 Global Step: 231730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:14,149-Speed 9279.63 samples/sec Loss 4.7005 LearningRate 0.0094 Epoch: 13 Global Step: 231740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:15,227-Speed 9500.73 samples/sec Loss 4.7466 LearningRate 0.0093 Epoch: 13 Global Step: 231750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:16,321-Speed 9375.23 samples/sec Loss 4.7609 LearningRate 0.0093 Epoch: 13 Global Step: 231760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:17,439-Speed 9167.43 samples/sec Loss 4.7633 LearningRate 0.0093 Epoch: 13 Global Step: 231770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:18,518-Speed 9494.75 samples/sec Loss 4.7431 LearningRate 0.0093 Epoch: 13 Global Step: 231780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:19,636-Speed 9163.26 samples/sec Loss 4.7564 LearningRate 0.0093 Epoch: 13 Global Step: 231790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:20,755-Speed 9152.20 samples/sec Loss 4.7728 LearningRate 0.0093 Epoch: 13 Global Step: 231800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:21,873-Speed 9174.70 samples/sec Loss 4.7425 LearningRate 0.0093 Epoch: 13 Global Step: 231810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:22,936-Speed 9637.90 samples/sec Loss 4.7733 LearningRate 0.0093 Epoch: 13 Global Step: 231820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:24,024-Speed 9412.85 samples/sec Loss 4.7592 LearningRate 0.0093 Epoch: 13 Global Step: 231830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:25,145-Speed 9141.71 samples/sec Loss 4.7372 LearningRate 0.0093 Epoch: 13 Global Step: 231840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:26,248-Speed 
9289.79 samples/sec Loss 4.7023 LearningRate 0.0093 Epoch: 13 Global Step: 231850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:27,343-Speed 9356.79 samples/sec Loss 4.7290 LearningRate 0.0093 Epoch: 13 Global Step: 231860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:28,439-Speed 9352.23 samples/sec Loss 4.6751 LearningRate 0.0093 Epoch: 13 Global Step: 231870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:29,557-Speed 9164.62 samples/sec Loss 4.8847 LearningRate 0.0093 Epoch: 13 Global Step: 231880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:30,642-Speed 9438.61 samples/sec Loss 4.7933 LearningRate 0.0093 Epoch: 13 Global Step: 231890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:31,748-Speed 9264.59 samples/sec Loss 4.6827 LearningRate 0.0093 Epoch: 13 Global Step: 231900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:32,824-Speed 9525.30 samples/sec Loss 4.7739 LearningRate 0.0093 Epoch: 13 Global Step: 231910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:33,934-Speed 9228.87 samples/sec Loss 4.7038 LearningRate 0.0093 Epoch: 13 Global Step: 231920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:34,999-Speed 9624.82 samples/sec Loss 4.7413 LearningRate 0.0093 Epoch: 13 Global Step: 231930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:36,069-Speed 9573.85 samples/sec Loss 4.6884 LearningRate 0.0093 Epoch: 13 Global Step: 231940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:37,136-Speed 9597.16 samples/sec Loss 4.7464 LearningRate 0.0093 Epoch: 13 Global Step: 231950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:38,213-Speed 9516.56 samples/sec Loss 4.7304 LearningRate 0.0093 Epoch: 13 Global Step: 231960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:39,280-Speed 9601.28 samples/sec Loss 4.6514 
LearningRate 0.0093 Epoch: 13 Global Step: 231970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:53:40,344-Speed 9628.38 samples/sec Loss 4.8185 LearningRate 0.0093 Epoch: 13 Global Step: 231980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:41,445-Speed 9315.59 samples/sec Loss 4.7340 LearningRate 0.0093 Epoch: 13 Global Step: 231990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:53:42,507-Speed 9645.29 samples/sec Loss 4.7864 LearningRate 0.0093 Epoch: 13 Global Step: 232000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:54:04,422-[lfw][232000]XNorm: 7.786779 Training: 2022-04-11 20:54:04,423-[lfw][232000]Accuracy-Flip: 0.99617+-0.00248 Training: 2022-04-11 20:54:04,423-[lfw][232000]Accuracy-Highest: 0.99733 Training: 2022-04-11 20:54:29,695-[cfp_fp][232000]XNorm: 6.727420 Training: 2022-04-11 20:54:29,696-[cfp_fp][232000]Accuracy-Flip: 0.96700+-0.00920 Training: 2022-04-11 20:54:29,696-[cfp_fp][232000]Accuracy-Highest: 0.96914 Training: 2022-04-11 20:54:51,491-[agedb_30][232000]XNorm: 7.541200 Training: 2022-04-11 20:54:51,491-[agedb_30][232000]Accuracy-Flip: 0.97250+-0.00817 Training: 2022-04-11 20:54:51,492-[agedb_30][232000]Accuracy-Highest: 0.97250 Training: 2022-04-11 20:54:52,613-Speed 146.07 samples/sec Loss 4.7096 LearningRate 0.0093 Epoch: 13 Global Step: 232010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:54:53,679-Speed 9611.58 samples/sec Loss 4.7657 LearningRate 0.0093 Epoch: 13 Global Step: 232020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:54:54,752-Speed 9551.96 samples/sec Loss 4.8248 LearningRate 0.0093 Epoch: 13 Global Step: 232030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:54:55,830-Speed 9508.58 samples/sec Loss 4.8786 LearningRate 0.0093 Epoch: 13 Global Step: 232040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:54:56,900-Speed 9571.66 samples/sec Loss 4.7440 LearningRate 0.0093 
Epoch: 13 Global Step: 232050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:54:57,985-Speed 9448.82 samples/sec Loss 4.7345 LearningRate 0.0093 Epoch: 13 Global Step: 232060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:54:59,064-Speed 9499.38 samples/sec Loss 4.6987 LearningRate 0.0093 Epoch: 13 Global Step: 232070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:00,171-Speed 9251.45 samples/sec Loss 4.6545 LearningRate 0.0093 Epoch: 13 Global Step: 232080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:01,267-Speed 9347.22 samples/sec Loss 4.7711 LearningRate 0.0093 Epoch: 13 Global Step: 232090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:02,352-Speed 9446.49 samples/sec Loss 4.6853 LearningRate 0.0093 Epoch: 13 Global Step: 232100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:03,431-Speed 9493.52 samples/sec Loss 4.7579 LearningRate 0.0093 Epoch: 13 Global Step: 232110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:04,503-Speed 9559.97 samples/sec Loss 4.7666 LearningRate 0.0093 Epoch: 13 Global Step: 232120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:05,596-Speed 9375.20 samples/sec Loss 4.8746 LearningRate 0.0093 Epoch: 13 Global Step: 232130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:06,680-Speed 9446.72 samples/sec Loss 4.7505 LearningRate 0.0093 Epoch: 13 Global Step: 232140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:07,782-Speed 9298.66 samples/sec Loss 4.7111 LearningRate 0.0093 Epoch: 13 Global Step: 232150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:08,872-Speed 9406.12 samples/sec Loss 4.6942 LearningRate 0.0093 Epoch: 13 Global Step: 232160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:09,937-Speed 9620.54 samples/sec Loss 4.7784 LearningRate 0.0093 Epoch: 13 Global Step: 232170 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:11,004-Speed 9598.09 samples/sec Loss 4.7640 LearningRate 0.0093 Epoch: 13 Global Step: 232180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:12,081-Speed 9516.25 samples/sec Loss 4.8174 LearningRate 0.0093 Epoch: 13 Global Step: 232190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:13,162-Speed 9479.11 samples/sec Loss 4.6905 LearningRate 0.0093 Epoch: 13 Global Step: 232200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:14,195-Speed 9920.02 samples/sec Loss 4.6584 LearningRate 0.0093 Epoch: 13 Global Step: 232210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:15,240-Speed 9803.97 samples/sec Loss 4.7374 LearningRate 0.0093 Epoch: 13 Global Step: 232220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:16,340-Speed 9314.01 samples/sec Loss 4.7630 LearningRate 0.0093 Epoch: 13 Global Step: 232230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:17,422-Speed 9469.95 samples/sec Loss 4.7036 LearningRate 0.0093 Epoch: 13 Global Step: 232240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:18,542-Speed 9153.33 samples/sec Loss 4.6760 LearningRate 0.0093 Epoch: 13 Global Step: 232250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:19,646-Speed 9279.07 samples/sec Loss 4.7296 LearningRate 0.0093 Epoch: 13 Global Step: 232260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:20,717-Speed 9563.67 samples/sec Loss 4.7030 LearningRate 0.0093 Epoch: 13 Global Step: 232270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:21,847-Speed 9069.81 samples/sec Loss 4.6899 LearningRate 0.0093 Epoch: 13 Global Step: 232280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:22,922-Speed 9534.52 samples/sec Loss 4.7495 LearningRate 0.0093 Epoch: 13 Global Step: 232290 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 20:55:24,004-Speed 9469.65 samples/sec Loss 4.8230 LearningRate 0.0092 Epoch: 13 Global Step: 232300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:25,086-Speed 9464.18 samples/sec Loss 4.7918 LearningRate 0.0092 Epoch: 13 Global Step: 232310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:26,188-Speed 9295.82 samples/sec Loss 4.7133 LearningRate 0.0092 Epoch: 13 Global Step: 232320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:27,295-Speed 9256.85 samples/sec Loss 4.6930 LearningRate 0.0092 Epoch: 13 Global Step: 232330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:28,407-Speed 9220.81 samples/sec Loss 4.7129 LearningRate 0.0092 Epoch: 13 Global Step: 232340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:29,530-Speed 9119.05 samples/sec Loss 4.7600 LearningRate 0.0092 Epoch: 13 Global Step: 232350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:30,638-Speed 9251.52 samples/sec Loss 4.7846 LearningRate 0.0092 Epoch: 13 Global Step: 232360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:31,729-Speed 9394.02 samples/sec Loss 4.6291 LearningRate 0.0092 Epoch: 13 Global Step: 232370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:32,806-Speed 9509.97 samples/sec Loss 4.7582 LearningRate 0.0092 Epoch: 13 Global Step: 232380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:33,865-Speed 9673.77 samples/sec Loss 4.8012 LearningRate 0.0092 Epoch: 13 Global Step: 232390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:34,945-Speed 9485.48 samples/sec Loss 4.7924 LearningRate 0.0092 Epoch: 13 Global Step: 232400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:36,078-Speed 9040.96 samples/sec Loss 4.8435 LearningRate 0.0092 Epoch: 13 Global Step: 232410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
20:55:37,143-Speed 9625.14 samples/sec Loss 4.9045 LearningRate 0.0092 Epoch: 13 Global Step: 232420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:38,222-Speed 9496.81 samples/sec Loss 4.7833 LearningRate 0.0092 Epoch: 13 Global Step: 232430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:39,344-Speed 9136.35 samples/sec Loss 4.7562 LearningRate 0.0092 Epoch: 13 Global Step: 232440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:40,448-Speed 9275.52 samples/sec Loss 4.7188 LearningRate 0.0092 Epoch: 13 Global Step: 232450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:41,523-Speed 9534.12 samples/sec Loss 4.7311 LearningRate 0.0092 Epoch: 13 Global Step: 232460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:42,652-Speed 9077.41 samples/sec Loss 4.7738 LearningRate 0.0092 Epoch: 13 Global Step: 232470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:43,761-Speed 9240.07 samples/sec Loss 4.8157 LearningRate 0.0092 Epoch: 13 Global Step: 232480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:44,828-Speed 9600.99 samples/sec Loss 4.7286 LearningRate 0.0092 Epoch: 13 Global Step: 232490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:45,904-Speed 9528.04 samples/sec Loss 4.7653 LearningRate 0.0092 Epoch: 13 Global Step: 232500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:46,988-Speed 9446.97 samples/sec Loss 4.6753 LearningRate 0.0092 Epoch: 13 Global Step: 232510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:48,085-Speed 9343.17 samples/sec Loss 4.6463 LearningRate 0.0092 Epoch: 13 Global Step: 232520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:49,184-Speed 9322.80 samples/sec Loss 4.7237 LearningRate 0.0092 Epoch: 13 Global Step: 232530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:55:50,269-Speed 9442.46 samples/sec 
Loss 4.7411 LearningRate 0.0092 Epoch: 13 Global Step: 232540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:51,366-Speed 9335.26 samples/sec Loss 4.7894 LearningRate 0.0092 Epoch: 13 Global Step: 232550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:52,541-Speed 8718.40 samples/sec Loss 4.7307 LearningRate 0.0092 Epoch: 13 Global Step: 232560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:53,604-Speed 9639.23 samples/sec Loss 4.6559 LearningRate 0.0092 Epoch: 13 Global Step: 232570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:54,689-Speed 9445.16 samples/sec Loss 4.7227 LearningRate 0.0092 Epoch: 13 Global Step: 232580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:55,792-Speed 9289.41 samples/sec Loss 4.7193 LearningRate 0.0092 Epoch: 13 Global Step: 232590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:56,856-Speed 9623.45 samples/sec Loss 4.7857 LearningRate 0.0092 Epoch: 13 Global Step: 232600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:57,974-Speed 9173.23 samples/sec Loss 4.6847 LearningRate 0.0092 Epoch: 13 Global Step: 232610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:55:59,052-Speed 9500.86 samples/sec Loss 4.7249 LearningRate 0.0092 Epoch: 13 Global Step: 232620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:00,146-Speed 9365.51 samples/sec Loss 4.7355 LearningRate 0.0092 Epoch: 13 Global Step: 232630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:01,203-Speed 9702.83 samples/sec Loss 4.7671 LearningRate 0.0092 Epoch: 13 Global Step: 232640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:02,304-Speed 9306.19 samples/sec Loss 4.7495 LearningRate 0.0092 Epoch: 13 Global Step: 232650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:03,387-Speed 9460.22 samples/sec Loss 4.7789 LearningRate 0.0092 
Epoch: 13 Global Step: 232660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:04,470-Speed 9455.71 samples/sec Loss 4.6984 LearningRate 0.0092 Epoch: 13 Global Step: 232670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:05,531-Speed 9662.61 samples/sec Loss 4.7318 LearningRate 0.0092 Epoch: 13 Global Step: 232680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:06,620-Speed 9406.94 samples/sec Loss 4.7344 LearningRate 0.0092 Epoch: 13 Global Step: 232690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:07,658-Speed 9866.15 samples/sec Loss 4.7710 LearningRate 0.0092 Epoch: 13 Global Step: 232700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:08,769-Speed 9227.36 samples/sec Loss 4.7415 LearningRate 0.0092 Epoch: 13 Global Step: 232710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:09,891-Speed 9133.25 samples/sec Loss 4.6977 LearningRate 0.0092 Epoch: 13 Global Step: 232720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:10,986-Speed 9351.89 samples/sec Loss 4.7819 LearningRate 0.0092 Epoch: 13 Global Step: 232730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:12,055-Speed 9585.41 samples/sec Loss 4.7279 LearningRate 0.0092 Epoch: 13 Global Step: 232740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:13,139-Speed 9454.12 samples/sec Loss 4.8167 LearningRate 0.0092 Epoch: 13 Global Step: 232750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:14,233-Speed 9363.33 samples/sec Loss 4.7240 LearningRate 0.0092 Epoch: 13 Global Step: 232760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:15,335-Speed 9303.81 samples/sec Loss 4.6838 LearningRate 0.0092 Epoch: 13 Global Step: 232770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:16,438-Speed 9284.94 samples/sec Loss 4.8024 LearningRate 0.0092 Epoch: 13 Global Step: 232780 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:17,536-Speed 9331.07 samples/sec Loss 4.7591 LearningRate 0.0092 Epoch: 13 Global Step: 232790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:18,628-Speed 9385.22 samples/sec Loss 4.7486 LearningRate 0.0092 Epoch: 13 Global Step: 232800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:19,711-Speed 9461.48 samples/sec Loss 4.7584 LearningRate 0.0092 Epoch: 13 Global Step: 232810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:20,808-Speed 9342.67 samples/sec Loss 4.7365 LearningRate 0.0092 Epoch: 13 Global Step: 232820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:21,908-Speed 9310.47 samples/sec Loss 4.6714 LearningRate 0.0092 Epoch: 13 Global Step: 232830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:23,024-Speed 9185.37 samples/sec Loss 4.7706 LearningRate 0.0092 Epoch: 13 Global Step: 232840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:24,135-Speed 9216.90 samples/sec Loss 4.8235 LearningRate 0.0091 Epoch: 13 Global Step: 232850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:25,233-Speed 9337.79 samples/sec Loss 4.7272 LearningRate 0.0091 Epoch: 13 Global Step: 232860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:26,315-Speed 9464.86 samples/sec Loss 4.7024 LearningRate 0.0091 Epoch: 13 Global Step: 232870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:27,447-Speed 9053.53 samples/sec Loss 4.7328 LearningRate 0.0091 Epoch: 13 Global Step: 232880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:28,531-Speed 9461.62 samples/sec Loss 4.7185 LearningRate 0.0091 Epoch: 13 Global Step: 232890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:29,596-Speed 9612.33 samples/sec Loss 4.7645 LearningRate 0.0091 Epoch: 13 Global Step: 232900 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-04-11 20:56:30,699-Speed 9296.20 samples/sec Loss 4.7850 LearningRate 0.0091 Epoch: 13 Global Step: 232910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:31,816-Speed 9171.03 samples/sec Loss 4.7438 LearningRate 0.0091 Epoch: 13 Global Step: 232920 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 20:56:32,886-Speed 9574.48 samples/sec Loss 4.7132 LearningRate 0.0091 Epoch: 13 Global Step: 232930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:34,001-Speed 9189.30 samples/sec Loss 4.7061 LearningRate 0.0091 Epoch: 13 Global Step: 232940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:35,085-Speed 9454.41 samples/sec Loss 4.6173 LearningRate 0.0091 Epoch: 13 Global Step: 232950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:36,178-Speed 9372.22 samples/sec Loss 4.7174 LearningRate 0.0091 Epoch: 13 Global Step: 232960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:37,258-Speed 9485.62 samples/sec Loss 4.7322 LearningRate 0.0091 Epoch: 13 Global Step: 232970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:38,309-Speed 9757.51 samples/sec Loss 4.6866 LearningRate 0.0091 Epoch: 13 Global Step: 232980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:39,352-Speed 9818.85 samples/sec Loss 4.7516 LearningRate 0.0091 Epoch: 13 Global Step: 232990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:40,428-Speed 9523.95 samples/sec Loss 4.7059 LearningRate 0.0091 Epoch: 13 Global Step: 233000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:41,560-Speed 9052.10 samples/sec Loss 4.6988 LearningRate 0.0091 Epoch: 13 Global Step: 233010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:42,669-Speed 9240.62 samples/sec Loss 4.6854 LearningRate 0.0091 Epoch: 13 Global Step: 233020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
20:56:43,768-Speed 9335.31 samples/sec Loss 4.8318 LearningRate 0.0091 Epoch: 13 Global Step: 233030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:44,854-Speed 9431.95 samples/sec Loss 4.6811 LearningRate 0.0091 Epoch: 13 Global Step: 233040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:45,949-Speed 9359.14 samples/sec Loss 4.7758 LearningRate 0.0091 Epoch: 13 Global Step: 233050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:47,021-Speed 9555.18 samples/sec Loss 4.7892 LearningRate 0.0091 Epoch: 13 Global Step: 233060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:56:48,087-Speed 9608.54 samples/sec Loss 4.7473 LearningRate 0.0091 Epoch: 13 Global Step: 233070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:49,131-Speed 9815.22 samples/sec Loss 4.7443 LearningRate 0.0091 Epoch: 13 Global Step: 233080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:50,215-Speed 9458.16 samples/sec Loss 4.6863 LearningRate 0.0091 Epoch: 13 Global Step: 233090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:51,284-Speed 9584.89 samples/sec Loss 4.8152 LearningRate 0.0091 Epoch: 13 Global Step: 233100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:52,343-Speed 9675.20 samples/sec Loss 4.7753 LearningRate 0.0091 Epoch: 13 Global Step: 233110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:53,472-Speed 9069.13 samples/sec Loss 4.7445 LearningRate 0.0091 Epoch: 13 Global Step: 233120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:54,575-Speed 9290.06 samples/sec Loss 4.7563 LearningRate 0.0091 Epoch: 13 Global Step: 233130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:55,717-Speed 8970.62 samples/sec Loss 4.7663 LearningRate 0.0091 Epoch: 13 Global Step: 233140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:56,772-Speed 9713.05 
samples/sec Loss 4.6792 LearningRate 0.0091 Epoch: 13 Global Step: 233150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:57,820-Speed 9782.60 samples/sec Loss 4.7082 LearningRate 0.0091 Epoch: 13 Global Step: 233160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:56:58,887-Speed 9595.69 samples/sec Loss 4.7867 LearningRate 0.0091 Epoch: 13 Global Step: 233170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:00,002-Speed 9196.69 samples/sec Loss 4.7396 LearningRate 0.0091 Epoch: 13 Global Step: 233180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:01,089-Speed 9425.00 samples/sec Loss 4.6046 LearningRate 0.0091 Epoch: 13 Global Step: 233190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:02,211-Speed 9124.51 samples/sec Loss 4.7201 LearningRate 0.0091 Epoch: 13 Global Step: 233200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:03,335-Speed 9124.60 samples/sec Loss 4.7911 LearningRate 0.0091 Epoch: 13 Global Step: 233210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:04,430-Speed 9351.24 samples/sec Loss 4.7579 LearningRate 0.0091 Epoch: 13 Global Step: 233220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:05,520-Speed 9403.31 samples/sec Loss 4.6937 LearningRate 0.0091 Epoch: 13 Global Step: 233230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:06,594-Speed 9542.16 samples/sec Loss 4.8045 LearningRate 0.0091 Epoch: 13 Global Step: 233240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:07,678-Speed 9445.01 samples/sec Loss 4.6646 LearningRate 0.0091 Epoch: 13 Global Step: 233250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:08,748-Speed 9577.94 samples/sec Loss 4.6975 LearningRate 0.0091 Epoch: 13 Global Step: 233260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:09,836-Speed 9413.41 samples/sec Loss 4.6202 LearningRate 
0.0091 Epoch: 13 Global Step: 233270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:10,921-Speed 9444.61 samples/sec Loss 4.7679 LearningRate 0.0091 Epoch: 13 Global Step: 233280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:11,992-Speed 9564.50 samples/sec Loss 4.6967 LearningRate 0.0091 Epoch: 13 Global Step: 233290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:13,063-Speed 9569.62 samples/sec Loss 4.7784 LearningRate 0.0091 Epoch: 13 Global Step: 233300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:14,149-Speed 9442.74 samples/sec Loss 4.7747 LearningRate 0.0091 Epoch: 13 Global Step: 233310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:15,228-Speed 9496.68 samples/sec Loss 4.7406 LearningRate 0.0091 Epoch: 13 Global Step: 233320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:16,323-Speed 9353.25 samples/sec Loss 4.7007 LearningRate 0.0091 Epoch: 13 Global Step: 233330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:17,442-Speed 9157.22 samples/sec Loss 4.6916 LearningRate 0.0091 Epoch: 13 Global Step: 233340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:18,529-Speed 9424.06 samples/sec Loss 4.7056 LearningRate 0.0091 Epoch: 13 Global Step: 233350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:19,588-Speed 9679.04 samples/sec Loss 4.7117 LearningRate 0.0091 Epoch: 13 Global Step: 233360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:20,651-Speed 9639.82 samples/sec Loss 4.7255 LearningRate 0.0091 Epoch: 13 Global Step: 233370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:21,737-Speed 9439.13 samples/sec Loss 4.6507 LearningRate 0.0091 Epoch: 13 Global Step: 233380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:22,816-Speed 9491.05 samples/sec Loss 4.7453 LearningRate 0.0091 Epoch: 13 Global Step: 233390 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:23,914-Speed 9332.13 samples/sec Loss 4.6941 LearningRate 0.0090 Epoch: 13 Global Step: 233400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:25,038-Speed 9121.54 samples/sec Loss 4.8071 LearningRate 0.0090 Epoch: 13 Global Step: 233410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:26,151-Speed 9202.97 samples/sec Loss 4.7554 LearningRate 0.0090 Epoch: 13 Global Step: 233420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:27,226-Speed 9526.95 samples/sec Loss 4.7838 LearningRate 0.0090 Epoch: 13 Global Step: 233430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:28,301-Speed 9536.90 samples/sec Loss 4.7076 LearningRate 0.0090 Epoch: 13 Global Step: 233440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:29,429-Speed 9082.42 samples/sec Loss 4.7949 LearningRate 0.0090 Epoch: 13 Global Step: 233450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:30,546-Speed 9165.68 samples/sec Loss 4.6672 LearningRate 0.0090 Epoch: 13 Global Step: 233460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:31,647-Speed 9305.94 samples/sec Loss 4.6838 LearningRate 0.0090 Epoch: 13 Global Step: 233470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:32,716-Speed 9596.59 samples/sec Loss 4.7846 LearningRate 0.0090 Epoch: 13 Global Step: 233480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:33,831-Speed 9190.10 samples/sec Loss 4.6907 LearningRate 0.0090 Epoch: 13 Global Step: 233490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:34,914-Speed 9462.36 samples/sec Loss 4.7739 LearningRate 0.0090 Epoch: 13 Global Step: 233500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:35,995-Speed 9477.61 samples/sec Loss 4.7665 LearningRate 0.0090 Epoch: 13 Global Step: 233510 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 20:57:37,073-Speed 9506.45 samples/sec Loss 4.6982 LearningRate 0.0090 Epoch: 13 Global Step: 233520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:38,133-Speed 9661.37 samples/sec Loss 4.7250 LearningRate 0.0090 Epoch: 13 Global Step: 233530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:39,272-Speed 8996.73 samples/sec Loss 4.7071 LearningRate 0.0090 Epoch: 13 Global Step: 233540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:40,367-Speed 9358.60 samples/sec Loss 4.7473 LearningRate 0.0090 Epoch: 13 Global Step: 233550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:41,480-Speed 9208.90 samples/sec Loss 4.7338 LearningRate 0.0090 Epoch: 13 Global Step: 233560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:42,551-Speed 9562.59 samples/sec Loss 4.6907 LearningRate 0.0090 Epoch: 13 Global Step: 233570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:43,641-Speed 9395.78 samples/sec Loss 4.6935 LearningRate 0.0090 Epoch: 13 Global Step: 233580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:44,732-Speed 9398.43 samples/sec Loss 4.7216 LearningRate 0.0090 Epoch: 13 Global Step: 233590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:57:45,814-Speed 9473.13 samples/sec Loss 4.8107 LearningRate 0.0090 Epoch: 13 Global Step: 233600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:46,929-Speed 9182.08 samples/sec Loss 4.7602 LearningRate 0.0090 Epoch: 13 Global Step: 233610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:48,043-Speed 9200.66 samples/sec Loss 4.6805 LearningRate 0.0090 Epoch: 13 Global Step: 233620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:49,151-Speed 9244.92 samples/sec Loss 4.7152 LearningRate 0.0090 Epoch: 13 Global Step: 233630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 20:57:50,259-Speed 9247.63 samples/sec Loss 4.6513 LearningRate 0.0090 Epoch: 13 Global Step: 233640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:51,348-Speed 9414.82 samples/sec Loss 4.6725 LearningRate 0.0090 Epoch: 13 Global Step: 233650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:52,446-Speed 9332.37 samples/sec Loss 4.7278 LearningRate 0.0090 Epoch: 13 Global Step: 233660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:57:53,994-Speed 6615.96 samples/sec Loss 4.7626 LearningRate 0.0090 Epoch: 13 Global Step: 233670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:37,135-Speed 237.37 samples/sec Loss 4.3075 LearningRate 0.0090 Epoch: 14 Global Step: 233680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:38,787-Speed 6207.45 samples/sec Loss 4.0919 LearningRate 0.0090 Epoch: 14 Global Step: 233690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:39,866-Speed 9497.80 samples/sec Loss 3.9814 LearningRate 0.0090 Epoch: 14 Global Step: 233700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:40,947-Speed 9476.81 samples/sec Loss 4.0864 LearningRate 0.0090 Epoch: 14 Global Step: 233710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:42,656-Speed 5991.48 samples/sec Loss 4.0313 LearningRate 0.0090 Epoch: 14 Global Step: 233720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:43,938-Speed 7996.05 samples/sec Loss 4.0485 LearningRate 0.0090 Epoch: 14 Global Step: 233730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:45,042-Speed 9284.75 samples/sec Loss 4.0234 LearningRate 0.0090 Epoch: 14 Global Step: 233740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:46,131-Speed 9408.45 samples/sec Loss 4.0563 LearningRate 0.0090 Epoch: 14 Global Step: 233750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:47,392-Speed 8122.69 
samples/sec Loss 3.9804 LearningRate 0.0090 Epoch: 14 Global Step: 233760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:48,491-Speed 9320.59 samples/sec Loss 4.1438 LearningRate 0.0090 Epoch: 14 Global Step: 233770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:58:49,587-Speed 9346.04 samples/sec Loss 4.0933 LearningRate 0.0090 Epoch: 14 Global Step: 233780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:50,696-Speed 9238.46 samples/sec Loss 4.1275 LearningRate 0.0090 Epoch: 14 Global Step: 233790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:51,837-Speed 8987.64 samples/sec Loss 3.9810 LearningRate 0.0090 Epoch: 14 Global Step: 233800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:53,023-Speed 8641.11 samples/sec Loss 4.0523 LearningRate 0.0090 Epoch: 14 Global Step: 233810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:54,116-Speed 9367.77 samples/sec Loss 4.0375 LearningRate 0.0090 Epoch: 14 Global Step: 233820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:55,230-Speed 9198.82 samples/sec Loss 4.0902 LearningRate 0.0090 Epoch: 14 Global Step: 233830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:56,325-Speed 9362.27 samples/sec Loss 4.0923 LearningRate 0.0090 Epoch: 14 Global Step: 233840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:57,426-Speed 9307.21 samples/sec Loss 4.1626 LearningRate 0.0090 Epoch: 14 Global Step: 233850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:58,519-Speed 9374.06 samples/sec Loss 4.0813 LearningRate 0.0090 Epoch: 14 Global Step: 233860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:58:59,668-Speed 8918.74 samples/sec Loss 4.0485 LearningRate 0.0090 Epoch: 14 Global Step: 233870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 20:59:00,704-Speed 9883.97 samples/sec Loss 4.1670 LearningRate 0.0090 
Epoch: 14 Global Step: 233880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:01,809-Speed 9273.44 samples/sec Loss 4.0877 LearningRate 0.0090 Epoch: 14 Global Step: 233890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:02,882-Speed 9544.37 samples/sec Loss 4.1224 LearningRate 0.0090 Epoch: 14 Global Step: 233900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:03,943-Speed 9667.22 samples/sec Loss 4.1663 LearningRate 0.0090 Epoch: 14 Global Step: 233910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:04,988-Speed 9798.54 samples/sec Loss 4.0805 LearningRate 0.0090 Epoch: 14 Global Step: 233920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:06,041-Speed 9735.12 samples/sec Loss 4.1432 LearningRate 0.0090 Epoch: 14 Global Step: 233930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:07,119-Speed 9500.89 samples/sec Loss 4.0464 LearningRate 0.0090 Epoch: 14 Global Step: 233940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:08,225-Speed 9263.09 samples/sec Loss 4.1079 LearningRate 0.0090 Epoch: 14 Global Step: 233950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:09,339-Speed 9195.19 samples/sec Loss 4.1860 LearningRate 0.0089 Epoch: 14 Global Step: 233960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:10,440-Speed 9313.00 samples/sec Loss 4.1275 LearningRate 0.0089 Epoch: 14 Global Step: 233970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 20:59:11,505-Speed 9616.13 samples/sec Loss 4.1991 LearningRate 0.0089 Epoch: 14 Global Step: 233980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:59:12,604-Speed 9320.58 samples/sec Loss 4.1407 LearningRate 0.0089 Epoch: 14 Global Step: 233990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 20:59:13,677-Speed 9551.57 samples/sec Loss 4.1166 LearningRate 0.0089 Epoch: 14 Global Step: 234000 Fp16 Grad 
Scale: 131072 Required: 4 hours
Training: 2022-04-11 20:59:35,439-[lfw][234000]XNorm: 7.682958
Training: 2022-04-11 20:59:35,440-[lfw][234000]Accuracy-Flip: 0.99533+-0.00287
Training: 2022-04-11 20:59:35,440-[lfw][234000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:00:00,604-[cfp_fp][234000]XNorm: 6.673557
Training: 2022-04-11 21:00:00,604-[cfp_fp][234000]Accuracy-Flip: 0.97014+-0.01015
Training: 2022-04-11 21:00:00,605-[cfp_fp][234000]Accuracy-Highest: 0.97014
Training: 2022-04-11 21:00:22,303-[agedb_30][234000]XNorm: 7.464890
Training: 2022-04-11 21:00:22,304-[agedb_30][234000]Accuracy-Flip: 0.97083+-0.00946
Training: 2022-04-11 21:00:22,304-[agedb_30][234000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:00:23,404-Speed 146.86 samples/sec Loss 4.1683 LearningRate 0.0089 Epoch: 14 Global Step: 234010 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:00:24,451-Speed 9782.74 samples/sec Loss 4.0868 LearningRate 0.0089 Epoch: 14 Global Step: 234020 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:00:25,513-Speed 9645.01 samples/sec Loss 4.1081 LearningRate 0.0089 Epoch: 14 Global Step: 234030 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:00:26,718-Speed 8505.69 samples/sec Loss 4.0329 LearningRate 0.0089 Epoch: 14 Global Step: 234040 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:00:27,961-Speed 8241.14 samples/sec Loss 4.0111 LearningRate 0.0089 Epoch: 14 Global Step: 234050 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:00:29,066-Speed 9272.20 samples/sec Loss 4.1344 LearningRate 0.0089 Epoch: 14 Global Step: 234060 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:00:30,134-Speed 9592.85 samples/sec Loss 4.0457 LearningRate 0.0089 Epoch: 14 Global Step: 234070 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:00:31,198-Speed 9632.70 samples/sec Loss 4.0812 LearningRate 0.0089 Epoch: 14 Global Step: 234080 Fp16 Grad Scale: 131072
Required: 4 hours Training: 2022-04-11 21:00:32,284-Speed 9434.35 samples/sec Loss 4.1727 LearningRate 0.0089 Epoch: 14 Global Step: 234090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:33,377-Speed 9370.07 samples/sec Loss 4.0981 LearningRate 0.0089 Epoch: 14 Global Step: 234100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:34,464-Speed 9425.94 samples/sec Loss 4.1739 LearningRate 0.0089 Epoch: 14 Global Step: 234110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:35,512-Speed 9776.13 samples/sec Loss 4.1463 LearningRate 0.0089 Epoch: 14 Global Step: 234120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:36,611-Speed 9325.04 samples/sec Loss 4.0867 LearningRate 0.0089 Epoch: 14 Global Step: 234130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:37,676-Speed 9623.63 samples/sec Loss 4.0758 LearningRate 0.0089 Epoch: 14 Global Step: 234140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:38,756-Speed 9488.28 samples/sec Loss 4.0861 LearningRate 0.0089 Epoch: 14 Global Step: 234150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:39,841-Speed 9440.23 samples/sec Loss 4.0990 LearningRate 0.0089 Epoch: 14 Global Step: 234160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:40,950-Speed 9233.43 samples/sec Loss 4.1359 LearningRate 0.0089 Epoch: 14 Global Step: 234170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:42,088-Speed 9004.29 samples/sec Loss 4.0943 LearningRate 0.0089 Epoch: 14 Global Step: 234180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:43,194-Speed 9269.90 samples/sec Loss 4.1615 LearningRate 0.0089 Epoch: 14 Global Step: 234190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:44,263-Speed 9587.39 samples/sec Loss 4.0268 LearningRate 0.0089 Epoch: 14 Global Step: 234200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 21:00:45,348-Speed 9443.47 samples/sec Loss 4.0323 LearningRate 0.0089 Epoch: 14 Global Step: 234210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:46,419-Speed 9566.68 samples/sec Loss 4.1613 LearningRate 0.0089 Epoch: 14 Global Step: 234220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:47,475-Speed 9702.22 samples/sec Loss 4.1685 LearningRate 0.0089 Epoch: 14 Global Step: 234230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:48,547-Speed 9560.19 samples/sec Loss 4.1253 LearningRate 0.0089 Epoch: 14 Global Step: 234240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:49,603-Speed 9698.67 samples/sec Loss 4.1277 LearningRate 0.0089 Epoch: 14 Global Step: 234250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:50,682-Speed 9502.56 samples/sec Loss 4.0977 LearningRate 0.0089 Epoch: 14 Global Step: 234260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:51,780-Speed 9330.78 samples/sec Loss 4.0992 LearningRate 0.0089 Epoch: 14 Global Step: 234270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:00:52,812-Speed 9929.56 samples/sec Loss 4.1863 LearningRate 0.0089 Epoch: 14 Global Step: 234280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:00:53,918-Speed 9264.10 samples/sec Loss 4.1577 LearningRate 0.0089 Epoch: 14 Global Step: 234290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:00:55,015-Speed 9333.59 samples/sec Loss 4.2415 LearningRate 0.0089 Epoch: 14 Global Step: 234300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:00:56,072-Speed 9699.57 samples/sec Loss 4.0502 LearningRate 0.0089 Epoch: 14 Global Step: 234310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:00:57,129-Speed 9691.31 samples/sec Loss 4.2413 LearningRate 0.0089 Epoch: 14 Global Step: 234320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:00:58,193-Speed 
9629.94 samples/sec Loss 4.0628 LearningRate 0.0089 Epoch: 14 Global Step: 234330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:00:59,243-Speed 9755.81 samples/sec Loss 4.0820 LearningRate 0.0089 Epoch: 14 Global Step: 234340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:00,308-Speed 9622.20 samples/sec Loss 4.1041 LearningRate 0.0089 Epoch: 14 Global Step: 234350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:01,390-Speed 9464.26 samples/sec Loss 4.1641 LearningRate 0.0089 Epoch: 14 Global Step: 234360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:02,663-Speed 8054.37 samples/sec Loss 4.2113 LearningRate 0.0089 Epoch: 14 Global Step: 234370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:03,752-Speed 9408.13 samples/sec Loss 4.1636 LearningRate 0.0089 Epoch: 14 Global Step: 234380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:04,834-Speed 9472.35 samples/sec Loss 4.1387 LearningRate 0.0089 Epoch: 14 Global Step: 234390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:05,887-Speed 9726.42 samples/sec Loss 4.0697 LearningRate 0.0089 Epoch: 14 Global Step: 234400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:06,997-Speed 9228.89 samples/sec Loss 4.0967 LearningRate 0.0089 Epoch: 14 Global Step: 234410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:08,050-Speed 9737.78 samples/sec Loss 4.1182 LearningRate 0.0089 Epoch: 14 Global Step: 234420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:09,109-Speed 9670.61 samples/sec Loss 4.1512 LearningRate 0.0089 Epoch: 14 Global Step: 234430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:10,159-Speed 9755.79 samples/sec Loss 4.1987 LearningRate 0.0089 Epoch: 14 Global Step: 234440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:11,247-Speed 9415.99 samples/sec Loss 4.0999 
LearningRate 0.0089 Epoch: 14 Global Step: 234450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:12,369-Speed 9139.41 samples/sec Loss 4.1206 LearningRate 0.0089 Epoch: 14 Global Step: 234460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:13,429-Speed 9661.14 samples/sec Loss 4.2103 LearningRate 0.0089 Epoch: 14 Global Step: 234470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:14,534-Speed 9280.92 samples/sec Loss 4.2162 LearningRate 0.0089 Epoch: 14 Global Step: 234480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 21:01:15,631-Speed 9341.63 samples/sec Loss 4.1625 LearningRate 0.0089 Epoch: 14 Global Step: 234490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:16,704-Speed 9542.40 samples/sec Loss 4.0836 LearningRate 0.0089 Epoch: 14 Global Step: 234500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:17,799-Speed 9356.14 samples/sec Loss 4.1270 LearningRate 0.0089 Epoch: 14 Global Step: 234510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:18,887-Speed 9418.30 samples/sec Loss 4.1216 LearningRate 0.0088 Epoch: 14 Global Step: 234520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:19,943-Speed 9707.87 samples/sec Loss 4.1183 LearningRate 0.0088 Epoch: 14 Global Step: 234530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:21,046-Speed 9293.27 samples/sec Loss 4.2369 LearningRate 0.0088 Epoch: 14 Global Step: 234540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:22,159-Speed 9204.76 samples/sec Loss 4.1596 LearningRate 0.0088 Epoch: 14 Global Step: 234550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:23,261-Speed 9296.55 samples/sec Loss 4.0696 LearningRate 0.0088 Epoch: 14 Global Step: 234560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:24,356-Speed 9354.52 samples/sec Loss 4.1200 LearningRate 0.0088 Epoch: 14 
Global Step: 234570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:25,426-Speed 9578.85 samples/sec Loss 4.0541 LearningRate 0.0088 Epoch: 14 Global Step: 234580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:26,514-Speed 9421.25 samples/sec Loss 4.1073 LearningRate 0.0088 Epoch: 14 Global Step: 234590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:27,629-Speed 9185.19 samples/sec Loss 4.1745 LearningRate 0.0088 Epoch: 14 Global Step: 234600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:28,683-Speed 9721.96 samples/sec Loss 4.1201 LearningRate 0.0088 Epoch: 14 Global Step: 234610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:29,757-Speed 9539.57 samples/sec Loss 4.0447 LearningRate 0.0088 Epoch: 14 Global Step: 234620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:30,829-Speed 9554.44 samples/sec Loss 4.1374 LearningRate 0.0088 Epoch: 14 Global Step: 234630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:31,891-Speed 9651.18 samples/sec Loss 4.0753 LearningRate 0.0088 Epoch: 14 Global Step: 234640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:32,956-Speed 9615.64 samples/sec Loss 4.1504 LearningRate 0.0088 Epoch: 14 Global Step: 234650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:34,023-Speed 9608.70 samples/sec Loss 4.1304 LearningRate 0.0088 Epoch: 14 Global Step: 234660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:35,094-Speed 9563.31 samples/sec Loss 4.1044 LearningRate 0.0088 Epoch: 14 Global Step: 234670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:36,214-Speed 9149.24 samples/sec Loss 4.0877 LearningRate 0.0088 Epoch: 14 Global Step: 234680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:37,285-Speed 9561.60 samples/sec Loss 4.1235 LearningRate 0.0088 Epoch: 14 Global Step: 234690 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 21:01:38,339-Speed 9726.79 samples/sec Loss 4.2077 LearningRate 0.0088 Epoch: 14 Global Step: 234700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:01:39,424-Speed 9440.03 samples/sec Loss 4.1496 LearningRate 0.0088 Epoch: 14 Global Step: 234710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:40,497-Speed 9555.40 samples/sec Loss 4.1940 LearningRate 0.0088 Epoch: 14 Global Step: 234720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:41,595-Speed 9332.51 samples/sec Loss 4.1077 LearningRate 0.0088 Epoch: 14 Global Step: 234730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:42,682-Speed 9427.14 samples/sec Loss 4.2265 LearningRate 0.0088 Epoch: 14 Global Step: 234740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:43,786-Speed 9281.76 samples/sec Loss 4.1438 LearningRate 0.0088 Epoch: 14 Global Step: 234750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:44,899-Speed 9203.73 samples/sec Loss 4.2519 LearningRate 0.0088 Epoch: 14 Global Step: 234760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:45,950-Speed 9749.55 samples/sec Loss 4.1612 LearningRate 0.0088 Epoch: 14 Global Step: 234770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:47,028-Speed 9497.85 samples/sec Loss 4.1307 LearningRate 0.0088 Epoch: 14 Global Step: 234780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:48,081-Speed 9735.38 samples/sec Loss 4.2104 LearningRate 0.0088 Epoch: 14 Global Step: 234790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:49,162-Speed 9473.12 samples/sec Loss 4.2189 LearningRate 0.0088 Epoch: 14 Global Step: 234800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:50,195-Speed 9923.05 samples/sec Loss 4.2500 LearningRate 0.0088 Epoch: 14 Global Step: 234810 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 21:01:51,262-Speed 9602.71 samples/sec Loss 4.3467 LearningRate 0.0088 Epoch: 14 Global Step: 234820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:52,345-Speed 9463.93 samples/sec Loss 4.1369 LearningRate 0.0088 Epoch: 14 Global Step: 234830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:53,424-Speed 9496.99 samples/sec Loss 4.1983 LearningRate 0.0088 Epoch: 14 Global Step: 234840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:54,543-Speed 9156.59 samples/sec Loss 4.2008 LearningRate 0.0088 Epoch: 14 Global Step: 234850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:55,616-Speed 9548.94 samples/sec Loss 4.1419 LearningRate 0.0088 Epoch: 14 Global Step: 234860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:56,702-Speed 9439.16 samples/sec Loss 4.2345 LearningRate 0.0088 Epoch: 14 Global Step: 234870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:57,833-Speed 9052.47 samples/sec Loss 4.1897 LearningRate 0.0088 Epoch: 14 Global Step: 234880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:58,931-Speed 9330.82 samples/sec Loss 4.1487 LearningRate 0.0088 Epoch: 14 Global Step: 234890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:01:59,982-Speed 9754.87 samples/sec Loss 4.1650 LearningRate 0.0088 Epoch: 14 Global Step: 234900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:01,070-Speed 9417.16 samples/sec Loss 4.1309 LearningRate 0.0088 Epoch: 14 Global Step: 234910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 21:02:02,164-Speed 9362.69 samples/sec Loss 4.1352 LearningRate 0.0088 Epoch: 14 Global Step: 234920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:03,273-Speed 9244.43 samples/sec Loss 4.1083 LearningRate 0.0088 Epoch: 14 Global Step: 234930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
21:02:04,370-Speed 9340.58 samples/sec Loss 4.1336 LearningRate 0.0088 Epoch: 14 Global Step: 234940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:05,445-Speed 9526.32 samples/sec Loss 4.1701 LearningRate 0.0088 Epoch: 14 Global Step: 234950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:06,524-Speed 9492.22 samples/sec Loss 4.2334 LearningRate 0.0088 Epoch: 14 Global Step: 234960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:07,605-Speed 9485.55 samples/sec Loss 4.2244 LearningRate 0.0088 Epoch: 14 Global Step: 234970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:08,636-Speed 9928.57 samples/sec Loss 4.1620 LearningRate 0.0088 Epoch: 14 Global Step: 234980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:09,751-Speed 9198.74 samples/sec Loss 4.1820 LearningRate 0.0088 Epoch: 14 Global Step: 234990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:10,819-Speed 9594.13 samples/sec Loss 4.1728 LearningRate 0.0088 Epoch: 14 Global Step: 235000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:11,930-Speed 9216.18 samples/sec Loss 4.1715 LearningRate 0.0088 Epoch: 14 Global Step: 235010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:12,975-Speed 9806.96 samples/sec Loss 4.2300 LearningRate 0.0088 Epoch: 14 Global Step: 235020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:14,038-Speed 9635.57 samples/sec Loss 4.2566 LearningRate 0.0088 Epoch: 14 Global Step: 235030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:15,094-Speed 9706.43 samples/sec Loss 4.1772 LearningRate 0.0088 Epoch: 14 Global Step: 235040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:16,186-Speed 9385.76 samples/sec Loss 4.2105 LearningRate 0.0088 Epoch: 14 Global Step: 235050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:17,287-Speed 9301.26 
samples/sec Loss 4.0865 LearningRate 0.0088 Epoch: 14 Global Step: 235060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:18,366-Speed 9493.13 samples/sec Loss 4.2100 LearningRate 0.0088 Epoch: 14 Global Step: 235070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:19,430-Speed 9640.47 samples/sec Loss 4.0809 LearningRate 0.0087 Epoch: 14 Global Step: 235080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:20,501-Speed 9564.33 samples/sec Loss 4.2081 LearningRate 0.0087 Epoch: 14 Global Step: 235090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:21,579-Speed 9507.39 samples/sec Loss 4.1628 LearningRate 0.0087 Epoch: 14 Global Step: 235100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:22,673-Speed 9361.36 samples/sec Loss 4.1529 LearningRate 0.0087 Epoch: 14 Global Step: 235110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:23,776-Speed 9288.25 samples/sec Loss 4.2362 LearningRate 0.0087 Epoch: 14 Global Step: 235120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:24,849-Speed 9549.15 samples/sec Loss 4.3138 LearningRate 0.0087 Epoch: 14 Global Step: 235130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:25,920-Speed 9568.78 samples/sec Loss 4.1922 LearningRate 0.0087 Epoch: 14 Global Step: 235140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:27,034-Speed 9203.00 samples/sec Loss 4.2051 LearningRate 0.0087 Epoch: 14 Global Step: 235150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:28,119-Speed 9438.75 samples/sec Loss 4.2163 LearningRate 0.0087 Epoch: 14 Global Step: 235160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:29,252-Speed 9046.49 samples/sec Loss 4.1619 LearningRate 0.0087 Epoch: 14 Global Step: 235170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:30,350-Speed 9326.78 samples/sec Loss 4.1114 
LearningRate 0.0087 Epoch: 14 Global Step: 235180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:31,428-Speed 9501.81 samples/sec Loss 4.1329 LearningRate 0.0087 Epoch: 14 Global Step: 235190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:32,481-Speed 9733.06 samples/sec Loss 4.1023 LearningRate 0.0087 Epoch: 14 Global Step: 235200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:33,550-Speed 9582.02 samples/sec Loss 4.1734 LearningRate 0.0087 Epoch: 14 Global Step: 235210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:34,594-Speed 9814.91 samples/sec Loss 4.1604 LearningRate 0.0087 Epoch: 14 Global Step: 235220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:35,688-Speed 9366.94 samples/sec Loss 4.1692 LearningRate 0.0087 Epoch: 14 Global Step: 235230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:36,756-Speed 9589.32 samples/sec Loss 4.1919 LearningRate 0.0087 Epoch: 14 Global Step: 235240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:37,856-Speed 9324.68 samples/sec Loss 4.3314 LearningRate 0.0087 Epoch: 14 Global Step: 235250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:38,956-Speed 9316.66 samples/sec Loss 4.2335 LearningRate 0.0087 Epoch: 14 Global Step: 235260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:40,042-Speed 9437.50 samples/sec Loss 4.2665 LearningRate 0.0087 Epoch: 14 Global Step: 235270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:41,104-Speed 9644.21 samples/sec Loss 4.3095 LearningRate 0.0087 Epoch: 14 Global Step: 235280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:42,158-Speed 9721.75 samples/sec Loss 4.1862 LearningRate 0.0087 Epoch: 14 Global Step: 235290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:43,270-Speed 9217.94 samples/sec Loss 4.2517 LearningRate 0.0087 Epoch: 14 
Global Step: 235300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:44,349-Speed 9493.37 samples/sec Loss 4.2072 LearningRate 0.0087 Epoch: 14 Global Step: 235310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:45,400-Speed 9749.20 samples/sec Loss 4.2345 LearningRate 0.0087 Epoch: 14 Global Step: 235320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 21:02:46,458-Speed 9686.85 samples/sec Loss 4.2853 LearningRate 0.0087 Epoch: 14 Global Step: 235330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:47,545-Speed 9425.29 samples/sec Loss 4.2500 LearningRate 0.0087 Epoch: 14 Global Step: 235340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:48,681-Speed 9014.65 samples/sec Loss 4.2210 LearningRate 0.0087 Epoch: 14 Global Step: 235350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:49,792-Speed 9229.49 samples/sec Loss 4.2225 LearningRate 0.0087 Epoch: 14 Global Step: 235360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:50,858-Speed 9610.62 samples/sec Loss 4.2730 LearningRate 0.0087 Epoch: 14 Global Step: 235370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:51,959-Speed 9299.98 samples/sec Loss 4.1985 LearningRate 0.0087 Epoch: 14 Global Step: 235380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:53,011-Speed 9742.27 samples/sec Loss 4.1755 LearningRate 0.0087 Epoch: 14 Global Step: 235390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:54,102-Speed 9394.66 samples/sec Loss 4.2055 LearningRate 0.0087 Epoch: 14 Global Step: 235400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:55,176-Speed 9535.72 samples/sec Loss 4.2763 LearningRate 0.0087 Epoch: 14 Global Step: 235410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:56,349-Speed 8743.03 samples/sec Loss 4.1977 LearningRate 0.0087 Epoch: 14 Global Step: 235420 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:57,444-Speed 9358.02 samples/sec Loss 4.2858 LearningRate 0.0087 Epoch: 14 Global Step: 235430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:58,562-Speed 9161.81 samples/sec Loss 4.2306 LearningRate 0.0087 Epoch: 14 Global Step: 235440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:02:59,660-Speed 9327.42 samples/sec Loss 4.2426 LearningRate 0.0087 Epoch: 14 Global Step: 235450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:00,711-Speed 9752.15 samples/sec Loss 4.1906 LearningRate 0.0087 Epoch: 14 Global Step: 235460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:01,795-Speed 9450.31 samples/sec Loss 4.2033 LearningRate 0.0087 Epoch: 14 Global Step: 235470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:02,834-Speed 9864.35 samples/sec Loss 4.3301 LearningRate 0.0087 Epoch: 14 Global Step: 235480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:03,877-Speed 9816.29 samples/sec Loss 4.1926 LearningRate 0.0087 Epoch: 14 Global Step: 235490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:04,935-Speed 9692.26 samples/sec Loss 4.1807 LearningRate 0.0087 Epoch: 14 Global Step: 235500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:06,014-Speed 9493.85 samples/sec Loss 4.2301 LearningRate 0.0087 Epoch: 14 Global Step: 235510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:07,129-Speed 9184.58 samples/sec Loss 4.2473 LearningRate 0.0087 Epoch: 14 Global Step: 235520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:08,225-Speed 9349.29 samples/sec Loss 4.2311 LearningRate 0.0087 Epoch: 14 Global Step: 235530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:09,368-Speed 8964.19 samples/sec Loss 4.2699 LearningRate 0.0087 Epoch: 14 Global Step: 235540 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 21:03:10,464-Speed 9346.69 samples/sec Loss 4.1886 LearningRate 0.0087 Epoch: 14 Global Step: 235550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:11,555-Speed 9391.09 samples/sec Loss 4.2360 LearningRate 0.0087 Epoch: 14 Global Step: 235560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:12,631-Speed 9522.49 samples/sec Loss 4.1807 LearningRate 0.0087 Epoch: 14 Global Step: 235570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:13,688-Speed 9697.90 samples/sec Loss 4.3161 LearningRate 0.0087 Epoch: 14 Global Step: 235580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:14,753-Speed 9619.28 samples/sec Loss 4.1838 LearningRate 0.0087 Epoch: 14 Global Step: 235590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:15,834-Speed 9476.62 samples/sec Loss 4.2950 LearningRate 0.0087 Epoch: 14 Global Step: 235600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:16,923-Speed 9413.10 samples/sec Loss 4.2791 LearningRate 0.0087 Epoch: 14 Global Step: 235610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:18,006-Speed 9460.22 samples/sec Loss 4.2958 LearningRate 0.0087 Epoch: 14 Global Step: 235620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:19,142-Speed 9013.70 samples/sec Loss 4.1810 LearningRate 0.0087 Epoch: 14 Global Step: 235630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 21:03:20,211-Speed 9591.84 samples/sec Loss 4.2144 LearningRate 0.0087 Epoch: 14 Global Step: 235640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:21,301-Speed 9397.79 samples/sec Loss 4.2243 LearningRate 0.0086 Epoch: 14 Global Step: 235650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:22,357-Speed 9703.57 samples/sec Loss 4.2629 LearningRate 0.0086 Epoch: 14 Global Step: 235660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
21:03:23,443-Speed 9432.50 samples/sec Loss 4.2868 LearningRate 0.0086 Epoch: 14 Global Step: 235670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:24,557-Speed 9196.39 samples/sec Loss 4.2541 LearningRate 0.0086 Epoch: 14 Global Step: 235680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:25,661-Speed 9284.69 samples/sec Loss 4.2677 LearningRate 0.0086 Epoch: 14 Global Step: 235690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:26,750-Speed 9412.74 samples/sec Loss 4.3631 LearningRate 0.0086 Epoch: 14 Global Step: 235700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:27,827-Speed 9508.29 samples/sec Loss 4.2405 LearningRate 0.0086 Epoch: 14 Global Step: 235710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:28,881-Speed 9725.16 samples/sec Loss 4.2660 LearningRate 0.0086 Epoch: 14 Global Step: 235720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:29,988-Speed 9256.73 samples/sec Loss 4.2288 LearningRate 0.0086 Epoch: 14 Global Step: 235730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:31,055-Speed 9598.25 samples/sec Loss 4.2732 LearningRate 0.0086 Epoch: 14 Global Step: 235740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:32,177-Speed 9133.89 samples/sec Loss 4.2329 LearningRate 0.0086 Epoch: 14 Global Step: 235750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:33,260-Speed 9460.78 samples/sec Loss 4.2906 LearningRate 0.0086 Epoch: 14 Global Step: 235760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:34,310-Speed 9762.63 samples/sec Loss 4.2496 LearningRate 0.0086 Epoch: 14 Global Step: 235770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:35,449-Speed 8995.66 samples/sec Loss 4.3224 LearningRate 0.0086 Epoch: 14 Global Step: 235780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:36,539-Speed 9402.56 
samples/sec Loss 4.3769 LearningRate 0.0086 Epoch: 14 Global Step: 235790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:37,615-Speed 9514.43 samples/sec Loss 4.2387 LearningRate 0.0086 Epoch: 14 Global Step: 235800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:38,679-Speed 9634.04 samples/sec Loss 4.2730 LearningRate 0.0086 Epoch: 14 Global Step: 235810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:39,756-Speed 9517.01 samples/sec Loss 4.2281 LearningRate 0.0086 Epoch: 14 Global Step: 235820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:40,845-Speed 9407.95 samples/sec Loss 4.2387 LearningRate 0.0086 Epoch: 14 Global Step: 235830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:41,939-Speed 9360.14 samples/sec Loss 4.2008 LearningRate 0.0086 Epoch: 14 Global Step: 235840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:42,983-Speed 9817.72 samples/sec Loss 4.1965 LearningRate 0.0086 Epoch: 14 Global Step: 235850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:44,086-Speed 9291.29 samples/sec Loss 4.2937 LearningRate 0.0086 Epoch: 14 Global Step: 235860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:45,165-Speed 9493.34 samples/sec Loss 4.1940 LearningRate 0.0086 Epoch: 14 Global Step: 235870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:46,269-Speed 9276.51 samples/sec Loss 4.1677 LearningRate 0.0086 Epoch: 14 Global Step: 235880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:47,310-Speed 9848.99 samples/sec Loss 4.3647 LearningRate 0.0086 Epoch: 14 Global Step: 235890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:48,404-Speed 9366.38 samples/sec Loss 4.2532 LearningRate 0.0086 Epoch: 14 Global Step: 235900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:49,462-Speed 9684.18 samples/sec Loss 4.4075 LearningRate 
0.0086 Epoch: 14 Global Step: 235910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:50,551-Speed 9409.01 samples/sec Loss 4.2428 LearningRate 0.0086 Epoch: 14 Global Step: 235920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:51,630-Speed 9497.10 samples/sec Loss 4.1760 LearningRate 0.0086 Epoch: 14 Global Step: 235930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:52,757-Speed 9089.40 samples/sec Loss 4.2266 LearningRate 0.0086 Epoch: 14 Global Step: 235940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:53,852-Speed 9356.83 samples/sec Loss 4.1637 LearningRate 0.0086 Epoch: 14 Global Step: 235950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:03:54,939-Speed 9424.65 samples/sec Loss 4.2310 LearningRate 0.0086 Epoch: 14 Global Step: 235960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:55,993-Speed 9726.19 samples/sec Loss 4.2350 LearningRate 0.0086 Epoch: 14 Global Step: 235970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:57,025-Speed 9925.74 samples/sec Loss 4.2539 LearningRate 0.0086 Epoch: 14 Global Step: 235980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:58,093-Speed 9597.43 samples/sec Loss 4.2909 LearningRate 0.0086 Epoch: 14 Global Step: 235990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:03:59,161-Speed 9589.69 samples/sec Loss 4.2874 LearningRate 0.0086 Epoch: 14 Global Step: 236000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:04:21,666-[lfw][236000]XNorm: 7.708942 Training: 2022-04-11 21:04:21,667-[lfw][236000]Accuracy-Flip: 0.99667+-0.00269 Training: 2022-04-11 21:04:21,667-[lfw][236000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:04:47,978-[cfp_fp][236000]XNorm: 6.633025 Training: 2022-04-11 21:04:47,979-[cfp_fp][236000]Accuracy-Flip: 0.97143+-0.00780 Training: 2022-04-11 21:04:47,979-[cfp_fp][236000]Accuracy-Highest: 0.97143 Training: 
2022-04-11 21:05:10,337-[agedb_30][236000]XNorm: 7.491549 Training: 2022-04-11 21:05:10,338-[agedb_30][236000]Accuracy-Flip: 0.96933+-0.00952 Training: 2022-04-11 21:05:10,339-[agedb_30][236000]Accuracy-Highest: 0.97250 Training: 2022-04-11 21:05:11,430-Speed 141.70 samples/sec Loss 4.2550 LearningRate 0.0086 Epoch: 14 Global Step: 236010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:12,502-Speed 9549.93 samples/sec Loss 4.2933 LearningRate 0.0086 Epoch: 14 Global Step: 236020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:13,549-Speed 9786.20 samples/sec Loss 4.3304 LearningRate 0.0086 Epoch: 14 Global Step: 236030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:14,626-Speed 9517.47 samples/sec Loss 4.1797 LearningRate 0.0086 Epoch: 14 Global Step: 236040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:15,697-Speed 9561.42 samples/sec Loss 4.3039 LearningRate 0.0086 Epoch: 14 Global Step: 236050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:16,736-Speed 9869.09 samples/sec Loss 4.2664 LearningRate 0.0086 Epoch: 14 Global Step: 236060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:17,782-Speed 9789.70 samples/sec Loss 4.2642 LearningRate 0.0086 Epoch: 14 Global Step: 236070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:18,881-Speed 9327.21 samples/sec Loss 4.2090 LearningRate 0.0086 Epoch: 14 Global Step: 236080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:19,952-Speed 9571.22 samples/sec Loss 4.2884 LearningRate 0.0086 Epoch: 14 Global Step: 236090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:21,050-Speed 9330.17 samples/sec Loss 4.1626 LearningRate 0.0086 Epoch: 14 Global Step: 236100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:22,136-Speed 9432.78 samples/sec Loss 4.4476 LearningRate 0.0086 Epoch: 14 Global Step: 236110 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 21:05:23,175-Speed 9864.89 samples/sec Loss 4.2564 LearningRate 0.0086 Epoch: 14 Global Step: 236120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:24,267-Speed 9375.02 samples/sec Loss 4.2454 LearningRate 0.0086 Epoch: 14 Global Step: 236130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:25,343-Speed 9526.86 samples/sec Loss 4.2522 LearningRate 0.0086 Epoch: 14 Global Step: 236140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:26,432-Speed 9412.68 samples/sec Loss 4.3129 LearningRate 0.0086 Epoch: 14 Global Step: 236150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:27,533-Speed 9303.44 samples/sec Loss 4.3435 LearningRate 0.0086 Epoch: 14 Global Step: 236160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:28,578-Speed 9803.57 samples/sec Loss 4.2613 LearningRate 0.0086 Epoch: 14 Global Step: 236170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:29,691-Speed 9205.66 samples/sec Loss 4.2204 LearningRate 0.0086 Epoch: 14 Global Step: 236180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:30,764-Speed 9551.86 samples/sec Loss 4.3413 LearningRate 0.0086 Epoch: 14 Global Step: 236190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:31,854-Speed 9398.32 samples/sec Loss 4.2807 LearningRate 0.0086 Epoch: 14 Global Step: 236200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:32,952-Speed 9328.44 samples/sec Loss 4.2891 LearningRate 0.0085 Epoch: 14 Global Step: 236210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:34,044-Speed 9386.55 samples/sec Loss 4.2981 LearningRate 0.0085 Epoch: 14 Global Step: 236220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:35,138-Speed 9369.74 samples/sec Loss 4.2547 LearningRate 0.0085 Epoch: 14 Global Step: 236230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
21:05:36,260-Speed 9126.17 samples/sec Loss 4.3726 LearningRate 0.0085 Epoch: 14 Global Step: 236240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:37,339-Speed 9498.32 samples/sec Loss 4.3124 LearningRate 0.0085 Epoch: 14 Global Step: 236250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:38,419-Speed 9489.52 samples/sec Loss 4.2927 LearningRate 0.0085 Epoch: 14 Global Step: 236260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:39,493-Speed 9547.85 samples/sec Loss 4.1901 LearningRate 0.0085 Epoch: 14 Global Step: 236270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:40,584-Speed 9386.41 samples/sec Loss 4.2656 LearningRate 0.0085 Epoch: 14 Global Step: 236280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:41,657-Speed 9546.98 samples/sec Loss 4.2458 LearningRate 0.0085 Epoch: 14 Global Step: 236290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:42,763-Speed 9264.80 samples/sec Loss 4.3296 LearningRate 0.0085 Epoch: 14 Global Step: 236300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:43,862-Speed 9326.44 samples/sec Loss 4.2694 LearningRate 0.0085 Epoch: 14 Global Step: 236310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:44,967-Speed 9265.43 samples/sec Loss 4.2658 LearningRate 0.0085 Epoch: 14 Global Step: 236320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:46,066-Speed 9323.83 samples/sec Loss 4.2764 LearningRate 0.0085 Epoch: 14 Global Step: 236330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:47,140-Speed 9542.93 samples/sec Loss 4.2269 LearningRate 0.0085 Epoch: 14 Global Step: 236340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:48,238-Speed 9329.13 samples/sec Loss 4.3282 LearningRate 0.0085 Epoch: 14 Global Step: 236350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:49,367-Speed 9075.17 
samples/sec Loss 4.2428 LearningRate 0.0085 Epoch: 14 Global Step: 236360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:50,470-Speed 9290.57 samples/sec Loss 4.2885 LearningRate 0.0085 Epoch: 14 Global Step: 236370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:51,540-Speed 9575.28 samples/sec Loss 4.3058 LearningRate 0.0085 Epoch: 14 Global Step: 236380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:52,579-Speed 9860.11 samples/sec Loss 4.2543 LearningRate 0.0085 Epoch: 14 Global Step: 236390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:53,689-Speed 9230.85 samples/sec Loss 4.2965 LearningRate 0.0085 Epoch: 14 Global Step: 236400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:54,783-Speed 9367.43 samples/sec Loss 4.3497 LearningRate 0.0085 Epoch: 14 Global Step: 236410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:05:55,868-Speed 9446.25 samples/sec Loss 4.2534 LearningRate 0.0085 Epoch: 14 Global Step: 236420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:56,976-Speed 9249.21 samples/sec Loss 4.1808 LearningRate 0.0085 Epoch: 14 Global Step: 236430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:58,047-Speed 9570.20 samples/sec Loss 4.1679 LearningRate 0.0085 Epoch: 14 Global Step: 236440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:05:59,124-Speed 9514.77 samples/sec Loss 4.2509 LearningRate 0.0085 Epoch: 14 Global Step: 236450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:00,182-Speed 9680.32 samples/sec Loss 4.2760 LearningRate 0.0085 Epoch: 14 Global Step: 236460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:01,231-Speed 9767.23 samples/sec Loss 4.2916 LearningRate 0.0085 Epoch: 14 Global Step: 236470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:02,342-Speed 9220.81 samples/sec Loss 4.2498 LearningRate 
0.0085 Epoch: 14 Global Step: 236480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:03,406-Speed 9633.66 samples/sec Loss 4.2488 LearningRate 0.0085 Epoch: 14 Global Step: 236490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:04,454-Speed 9774.23 samples/sec Loss 4.2474 LearningRate 0.0085 Epoch: 14 Global Step: 236500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:05,529-Speed 9528.28 samples/sec Loss 4.2602 LearningRate 0.0085 Epoch: 14 Global Step: 236510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:06,596-Speed 9608.50 samples/sec Loss 4.3179 LearningRate 0.0085 Epoch: 14 Global Step: 236520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:07,666-Speed 9570.35 samples/sec Loss 4.3320 LearningRate 0.0085 Epoch: 14 Global Step: 236530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:08,760-Speed 9365.94 samples/sec Loss 4.2117 LearningRate 0.0085 Epoch: 14 Global Step: 236540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:09,869-Speed 9245.55 samples/sec Loss 4.2791 LearningRate 0.0085 Epoch: 14 Global Step: 236550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:10,968-Speed 9321.88 samples/sec Loss 4.3213 LearningRate 0.0085 Epoch: 14 Global Step: 236560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:12,052-Speed 9452.20 samples/sec Loss 4.1998 LearningRate 0.0085 Epoch: 14 Global Step: 236570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:13,131-Speed 9494.38 samples/sec Loss 4.2395 LearningRate 0.0085 Epoch: 14 Global Step: 236580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:14,195-Speed 9626.74 samples/sec Loss 4.2627 LearningRate 0.0085 Epoch: 14 Global Step: 236590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:15,311-Speed 9186.33 samples/sec Loss 4.3194 LearningRate 0.0085 Epoch: 14 Global Step: 
236600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:16,414-Speed 9283.34 samples/sec Loss 4.3059 LearningRate 0.0085 Epoch: 14 Global Step: 236610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:17,524-Speed 9232.53 samples/sec Loss 4.2975 LearningRate 0.0085 Epoch: 14 Global Step: 236620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:18,615-Speed 9404.22 samples/sec Loss 4.2417 LearningRate 0.0085 Epoch: 14 Global Step: 236630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:19,749-Speed 9041.54 samples/sec Loss 4.2816 LearningRate 0.0085 Epoch: 14 Global Step: 236640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:20,833-Speed 9451.84 samples/sec Loss 4.2523 LearningRate 0.0085 Epoch: 14 Global Step: 236650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:21,950-Speed 9171.96 samples/sec Loss 4.3101 LearningRate 0.0085 Epoch: 14 Global Step: 236660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:23,027-Speed 9515.39 samples/sec Loss 4.3475 LearningRate 0.0085 Epoch: 14 Global Step: 236670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:24,088-Speed 9656.56 samples/sec Loss 4.2909 LearningRate 0.0085 Epoch: 14 Global Step: 236680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:25,149-Speed 9655.40 samples/sec Loss 4.2134 LearningRate 0.0085 Epoch: 14 Global Step: 236690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:26,217-Speed 9600.94 samples/sec Loss 4.2961 LearningRate 0.0085 Epoch: 14 Global Step: 236700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:27,302-Speed 9441.53 samples/sec Loss 4.2909 LearningRate 0.0085 Epoch: 14 Global Step: 236710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:28,390-Speed 9413.54 samples/sec Loss 4.3477 LearningRate 0.0085 Epoch: 14 Global Step: 236720 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 21:06:29,503-Speed 9201.96 samples/sec Loss 4.3667 LearningRate 0.0085 Epoch: 14 Global Step: 236730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:30,592-Speed 9414.23 samples/sec Loss 4.3294 LearningRate 0.0085 Epoch: 14 Global Step: 236740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:31,673-Speed 9477.96 samples/sec Loss 4.2161 LearningRate 0.0085 Epoch: 14 Global Step: 236750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:32,751-Speed 9501.56 samples/sec Loss 4.2685 LearningRate 0.0085 Epoch: 14 Global Step: 236760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:33,829-Speed 9506.90 samples/sec Loss 4.2679 LearningRate 0.0085 Epoch: 14 Global Step: 236770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:34,896-Speed 9609.72 samples/sec Loss 4.3693 LearningRate 0.0085 Epoch: 14 Global Step: 236780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:35,976-Speed 9487.16 samples/sec Loss 4.2464 LearningRate 0.0084 Epoch: 14 Global Step: 236790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:37,026-Speed 9756.56 samples/sec Loss 4.2915 LearningRate 0.0084 Epoch: 14 Global Step: 236800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:38,073-Speed 9786.50 samples/sec Loss 4.3023 LearningRate 0.0084 Epoch: 14 Global Step: 236810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:39,111-Speed 9866.85 samples/sec Loss 4.2859 LearningRate 0.0084 Epoch: 14 Global Step: 236820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:40,160-Speed 9770.09 samples/sec Loss 4.3388 LearningRate 0.0084 Epoch: 14 Global Step: 236830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:41,239-Speed 9496.54 samples/sec Loss 4.2527 LearningRate 0.0084 Epoch: 14 Global Step: 236840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
21:06:42,287-Speed 9771.42 samples/sec Loss 4.3592 LearningRate 0.0084 Epoch: 14 Global Step: 236850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:43,333-Speed 9798.84 samples/sec Loss 4.3109 LearningRate 0.0084 Epoch: 14 Global Step: 236860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:44,405-Speed 9558.99 samples/sec Loss 4.3388 LearningRate 0.0084 Epoch: 14 Global Step: 236870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:45,488-Speed 9459.44 samples/sec Loss 4.2547 LearningRate 0.0084 Epoch: 14 Global Step: 236880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:46,558-Speed 9574.95 samples/sec Loss 4.3063 LearningRate 0.0084 Epoch: 14 Global Step: 236890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:47,672-Speed 9201.07 samples/sec Loss 4.2492 LearningRate 0.0084 Epoch: 14 Global Step: 236900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:48,739-Speed 9601.38 samples/sec Loss 4.3293 LearningRate 0.0084 Epoch: 14 Global Step: 236910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:06:49,849-Speed 9232.48 samples/sec Loss 4.3761 LearningRate 0.0084 Epoch: 14 Global Step: 236920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:50,903-Speed 9718.61 samples/sec Loss 4.2626 LearningRate 0.0084 Epoch: 14 Global Step: 236930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:51,962-Speed 9673.59 samples/sec Loss 4.2854 LearningRate 0.0084 Epoch: 14 Global Step: 236940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:53,039-Speed 9515.17 samples/sec Loss 4.3104 LearningRate 0.0084 Epoch: 14 Global Step: 236950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:54,088-Speed 9771.10 samples/sec Loss 4.3303 LearningRate 0.0084 Epoch: 14 Global Step: 236960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:55,155-Speed 9606.56 samples/sec 
Loss 4.2248 LearningRate 0.0084 Epoch: 14 Global Step: 236970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:56,240-Speed 9443.83 samples/sec Loss 4.2765 LearningRate 0.0084 Epoch: 14 Global Step: 236980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:57,331-Speed 9389.78 samples/sec Loss 4.3330 LearningRate 0.0084 Epoch: 14 Global Step: 236990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:58,428-Speed 9338.10 samples/sec Loss 4.3126 LearningRate 0.0084 Epoch: 14 Global Step: 237000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:06:59,479-Speed 9743.34 samples/sec Loss 4.3054 LearningRate 0.0084 Epoch: 14 Global Step: 237010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:00,544-Speed 9625.52 samples/sec Loss 4.3038 LearningRate 0.0084 Epoch: 14 Global Step: 237020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:01,613-Speed 9584.79 samples/sec Loss 4.2690 LearningRate 0.0084 Epoch: 14 Global Step: 237030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:02,664-Speed 9756.23 samples/sec Loss 4.3103 LearningRate 0.0084 Epoch: 14 Global Step: 237040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:03,783-Speed 9154.97 samples/sec Loss 4.2787 LearningRate 0.0084 Epoch: 14 Global Step: 237050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:04,841-Speed 9677.92 samples/sec Loss 4.3327 LearningRate 0.0084 Epoch: 14 Global Step: 237060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:05,914-Speed 9550.08 samples/sec Loss 4.2410 LearningRate 0.0084 Epoch: 14 Global Step: 237070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:07,011-Speed 9347.20 samples/sec Loss 4.2937 LearningRate 0.0084 Epoch: 14 Global Step: 237080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:08,109-Speed 9333.36 samples/sec Loss 4.3691 LearningRate 0.0084 
Epoch: 14 Global Step: 237090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:09,132-Speed 10014.44 samples/sec Loss 4.3283 LearningRate 0.0084 Epoch: 14 Global Step: 237100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:10,193-Speed 9651.56 samples/sec Loss 4.3650 LearningRate 0.0084 Epoch: 14 Global Step: 237110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:11,286-Speed 9376.25 samples/sec Loss 4.2853 LearningRate 0.0084 Epoch: 14 Global Step: 237120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:12,340-Speed 9725.74 samples/sec Loss 4.3237 LearningRate 0.0084 Epoch: 14 Global Step: 237130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:13,401-Speed 9654.91 samples/sec Loss 4.3973 LearningRate 0.0084 Epoch: 14 Global Step: 237140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:14,488-Speed 9426.08 samples/sec Loss 4.2956 LearningRate 0.0084 Epoch: 14 Global Step: 237150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:15,564-Speed 9527.67 samples/sec Loss 4.3937 LearningRate 0.0084 Epoch: 14 Global Step: 237160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:16,674-Speed 9230.65 samples/sec Loss 4.2840 LearningRate 0.0084 Epoch: 14 Global Step: 237170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:17,760-Speed 9431.77 samples/sec Loss 4.3491 LearningRate 0.0084 Epoch: 14 Global Step: 237180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:18,862-Speed 9299.55 samples/sec Loss 4.2927 LearningRate 0.0084 Epoch: 14 Global Step: 237190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:19,953-Speed 9390.91 samples/sec Loss 4.3911 LearningRate 0.0084 Epoch: 14 Global Step: 237200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:21,052-Speed 9317.94 samples/sec Loss 4.3064 LearningRate 0.0084 Epoch: 14 Global Step: 237210 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:22,152-Speed 9315.39 samples/sec Loss 4.2600 LearningRate 0.0084 Epoch: 14 Global Step: 237220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:23,242-Speed 9402.41 samples/sec Loss 4.3439 LearningRate 0.0084 Epoch: 14 Global Step: 237230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:24,311-Speed 9589.88 samples/sec Loss 4.3472 LearningRate 0.0084 Epoch: 14 Global Step: 237240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:25,378-Speed 9603.71 samples/sec Loss 4.3408 LearningRate 0.0084 Epoch: 14 Global Step: 237250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:26,468-Speed 9400.70 samples/sec Loss 4.3785 LearningRate 0.0084 Epoch: 14 Global Step: 237260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:27,515-Speed 9785.68 samples/sec Loss 4.2535 LearningRate 0.0084 Epoch: 14 Global Step: 237270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:28,573-Speed 9686.64 samples/sec Loss 4.3324 LearningRate 0.0084 Epoch: 14 Global Step: 237280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:29,609-Speed 9883.67 samples/sec Loss 4.1982 LearningRate 0.0084 Epoch: 14 Global Step: 237290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:30,694-Speed 9445.18 samples/sec Loss 4.3211 LearningRate 0.0084 Epoch: 14 Global Step: 237300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:31,753-Speed 9676.57 samples/sec Loss 4.3284 LearningRate 0.0084 Epoch: 14 Global Step: 237310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:32,836-Speed 9462.76 samples/sec Loss 4.2757 LearningRate 0.0084 Epoch: 14 Global Step: 237320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:33,950-Speed 9193.04 samples/sec Loss 4.3349 LearningRate 0.0084 Epoch: 14 Global Step: 237330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 21:07:35,009-Speed 9679.26 samples/sec Loss 4.3033 LearningRate 0.0084 Epoch: 14 Global Step: 237340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:36,112-Speed 9285.77 samples/sec Loss 4.4320 LearningRate 0.0084 Epoch: 14 Global Step: 237350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:37,200-Speed 9418.40 samples/sec Loss 4.3155 LearningRate 0.0083 Epoch: 14 Global Step: 237360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:38,291-Speed 9387.03 samples/sec Loss 4.3194 LearningRate 0.0083 Epoch: 14 Global Step: 237370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:39,373-Speed 9470.87 samples/sec Loss 4.2621 LearningRate 0.0083 Epoch: 14 Global Step: 237380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:40,463-Speed 9400.46 samples/sec Loss 4.4023 LearningRate 0.0083 Epoch: 14 Global Step: 237390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:41,556-Speed 9376.32 samples/sec Loss 4.3412 LearningRate 0.0083 Epoch: 14 Global Step: 237400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:42,585-Speed 9957.81 samples/sec Loss 4.4565 LearningRate 0.0083 Epoch: 14 Global Step: 237410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:43,652-Speed 9603.64 samples/sec Loss 4.3624 LearningRate 0.0083 Epoch: 14 Global Step: 237420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:44,732-Speed 9493.50 samples/sec Loss 4.3408 LearningRate 0.0083 Epoch: 14 Global Step: 237430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:45,813-Speed 9476.52 samples/sec Loss 4.3928 LearningRate 0.0083 Epoch: 14 Global Step: 237440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:46,911-Speed 9332.81 samples/sec Loss 4.4298 LearningRate 0.0083 Epoch: 14 Global Step: 237450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:48,030-Speed 
9156.47 samples/sec Loss 4.2783 LearningRate 0.0083 Epoch: 14 Global Step: 237460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:49,141-Speed 9217.90 samples/sec Loss 4.3198 LearningRate 0.0083 Epoch: 14 Global Step: 237470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:50,280-Speed 8995.82 samples/sec Loss 4.3833 LearningRate 0.0083 Epoch: 14 Global Step: 237480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:51,381-Speed 9305.14 samples/sec Loss 4.4160 LearningRate 0.0083 Epoch: 14 Global Step: 237490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:52,463-Speed 9472.50 samples/sec Loss 4.3355 LearningRate 0.0083 Epoch: 14 Global Step: 237500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:53,564-Speed 9302.71 samples/sec Loss 4.2739 LearningRate 0.0083 Epoch: 14 Global Step: 237510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:07:54,664-Speed 9321.00 samples/sec Loss 4.3468 LearningRate 0.0083 Epoch: 14 Global Step: 237520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:55,723-Speed 9678.75 samples/sec Loss 4.3973 LearningRate 0.0083 Epoch: 14 Global Step: 237530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:56,844-Speed 9135.42 samples/sec Loss 4.3227 LearningRate 0.0083 Epoch: 14 Global Step: 237540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:57,994-Speed 8913.67 samples/sec Loss 4.2970 LearningRate 0.0083 Epoch: 14 Global Step: 237550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:07:59,080-Speed 9431.53 samples/sec Loss 4.3660 LearningRate 0.0083 Epoch: 14 Global Step: 237560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:00,155-Speed 9537.26 samples/sec Loss 4.3273 LearningRate 0.0083 Epoch: 14 Global Step: 237570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:01,220-Speed 9617.10 samples/sec Loss 4.2877 
LearningRate 0.0083 Epoch: 14 Global Step: 237580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:02,299-Speed 9500.06 samples/sec Loss 4.3332 LearningRate 0.0083 Epoch: 14 Global Step: 237590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:03,405-Speed 9262.82 samples/sec Loss 4.1983 LearningRate 0.0083 Epoch: 14 Global Step: 237600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:04,470-Speed 9621.68 samples/sec Loss 4.3807 LearningRate 0.0083 Epoch: 14 Global Step: 237610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:05,551-Speed 9478.01 samples/sec Loss 4.3723 LearningRate 0.0083 Epoch: 14 Global Step: 237620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:06,598-Speed 9782.81 samples/sec Loss 4.3508 LearningRate 0.0083 Epoch: 14 Global Step: 237630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:07,663-Speed 9619.69 samples/sec Loss 4.3077 LearningRate 0.0083 Epoch: 14 Global Step: 237640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:08,766-Speed 9295.61 samples/sec Loss 4.4144 LearningRate 0.0083 Epoch: 14 Global Step: 237650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:09,861-Speed 9352.16 samples/sec Loss 4.3449 LearningRate 0.0083 Epoch: 14 Global Step: 237660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:10,957-Speed 9345.34 samples/sec Loss 4.3293 LearningRate 0.0083 Epoch: 14 Global Step: 237670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:12,026-Speed 9588.01 samples/sec Loss 4.3204 LearningRate 0.0083 Epoch: 14 Global Step: 237680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:13,114-Speed 9416.84 samples/sec Loss 4.2823 LearningRate 0.0083 Epoch: 14 Global Step: 237690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:14,209-Speed 9353.96 samples/sec Loss 4.3136 LearningRate 0.0083 Epoch: 14 
Global Step: 237700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:15,309-Speed 9322.68 samples/sec Loss 4.3633 LearningRate 0.0083 Epoch: 14 Global Step: 237710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:16,389-Speed 9485.07 samples/sec Loss 4.3310 LearningRate 0.0083 Epoch: 14 Global Step: 237720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:17,475-Speed 9432.90 samples/sec Loss 4.3783 LearningRate 0.0083 Epoch: 14 Global Step: 237730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:18,567-Speed 9378.70 samples/sec Loss 4.3450 LearningRate 0.0083 Epoch: 14 Global Step: 237740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:19,655-Speed 9423.23 samples/sec Loss 4.2856 LearningRate 0.0083 Epoch: 14 Global Step: 237750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:20,766-Speed 9223.10 samples/sec Loss 4.3549 LearningRate 0.0083 Epoch: 14 Global Step: 237760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:21,845-Speed 9500.54 samples/sec Loss 4.3031 LearningRate 0.0083 Epoch: 14 Global Step: 237770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:22,914-Speed 9578.66 samples/sec Loss 4.3454 LearningRate 0.0083 Epoch: 14 Global Step: 237780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:24,001-Speed 9423.16 samples/sec Loss 4.4091 LearningRate 0.0083 Epoch: 14 Global Step: 237790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:25,087-Speed 9437.74 samples/sec Loss 4.3483 LearningRate 0.0083 Epoch: 14 Global Step: 237800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:26,211-Speed 9118.15 samples/sec Loss 4.4691 LearningRate 0.0083 Epoch: 14 Global Step: 237810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:27,278-Speed 9602.30 samples/sec Loss 4.2848 LearningRate 0.0083 Epoch: 14 Global Step: 237820 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 21:08:28,363-Speed 9443.00 samples/sec Loss 4.3389 LearningRate 0.0083 Epoch: 14 Global Step: 237830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:29,473-Speed 9234.47 samples/sec Loss 4.3186 LearningRate 0.0083 Epoch: 14 Global Step: 237840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:30,613-Speed 8986.40 samples/sec Loss 4.3814 LearningRate 0.0083 Epoch: 14 Global Step: 237850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:31,738-Speed 9107.66 samples/sec Loss 4.3062 LearningRate 0.0083 Epoch: 14 Global Step: 237860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:32,820-Speed 9468.75 samples/sec Loss 4.2526 LearningRate 0.0083 Epoch: 14 Global Step: 237870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:33,879-Speed 9675.29 samples/sec Loss 4.4078 LearningRate 0.0083 Epoch: 14 Global Step: 237880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:34,971-Speed 9386.58 samples/sec Loss 4.3087 LearningRate 0.0083 Epoch: 14 Global Step: 237890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:36,031-Speed 9662.94 samples/sec Loss 4.3320 LearningRate 0.0083 Epoch: 14 Global Step: 237900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:37,058-Speed 9969.72 samples/sec Loss 4.4039 LearningRate 0.0083 Epoch: 14 Global Step: 237910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:38,132-Speed 9545.00 samples/sec Loss 4.3406 LearningRate 0.0083 Epoch: 14 Global Step: 237920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:39,185-Speed 9729.51 samples/sec Loss 4.3420 LearningRate 0.0083 Epoch: 14 Global Step: 237930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:40,288-Speed 9291.39 samples/sec Loss 4.2745 LearningRate 0.0082 Epoch: 14 Global Step: 237940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 21:08:41,353-Speed 9625.00 samples/sec Loss 4.3610 LearningRate 0.0082 Epoch: 14 Global Step: 237950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:42,440-Speed 9426.00 samples/sec Loss 4.3300 LearningRate 0.0082 Epoch: 14 Global Step: 237960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:43,519-Speed 9494.18 samples/sec Loss 4.3077 LearningRate 0.0082 Epoch: 14 Global Step: 237970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:08:44,578-Speed 9676.45 samples/sec Loss 4.4674 LearningRate 0.0082 Epoch: 14 Global Step: 237980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:45,642-Speed 9632.39 samples/sec Loss 4.2571 LearningRate 0.0082 Epoch: 14 Global Step: 237990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:08:46,716-Speed 9531.79 samples/sec Loss 4.3980 LearningRate 0.0082 Epoch: 14 Global Step: 238000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:09:08,874-[lfw][238000]XNorm: 7.709809 Training: 2022-04-11 21:09:08,875-[lfw][238000]Accuracy-Flip: 0.99550+-0.00289 Training: 2022-04-11 21:09:08,875-[lfw][238000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:09:34,458-[cfp_fp][238000]XNorm: 6.654449 Training: 2022-04-11 21:09:34,458-[cfp_fp][238000]Accuracy-Flip: 0.96886+-0.00717 Training: 2022-04-11 21:09:34,459-[cfp_fp][238000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:09:56,498-[agedb_30][238000]XNorm: 7.481106 Training: 2022-04-11 21:09:56,499-[agedb_30][238000]Accuracy-Flip: 0.96683+-0.00917 Training: 2022-04-11 21:09:56,499-[agedb_30][238000]Accuracy-Highest: 0.97250 Training: 2022-04-11 21:09:57,602-Speed 144.46 samples/sec Loss 4.3896 LearningRate 0.0082 Epoch: 14 Global Step: 238010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:09:58,679-Speed 9516.05 samples/sec Loss 4.3070 LearningRate 0.0082 Epoch: 14 Global Step: 238020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
21:09:59,737-Speed 9680.83 samples/sec Loss 4.4323 LearningRate 0.0082 Epoch: 14 Global Step: 238030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:00,782-Speed 9810.25 samples/sec Loss 4.3093 LearningRate 0.0082 Epoch: 14 Global Step: 238040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:01,872-Speed 9396.11 samples/sec Loss 4.3391 LearningRate 0.0082 Epoch: 14 Global Step: 238050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:02,923-Speed 9747.76 samples/sec Loss 4.4112 LearningRate 0.0082 Epoch: 14 Global Step: 238060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:04,020-Speed 9343.70 samples/sec Loss 4.2898 LearningRate 0.0082 Epoch: 14 Global Step: 238070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:05,054-Speed 9902.42 samples/sec Loss 4.3958 LearningRate 0.0082 Epoch: 14 Global Step: 238080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:06,163-Speed 9244.66 samples/sec Loss 4.3192 LearningRate 0.0082 Epoch: 14 Global Step: 238090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:07,231-Speed 9591.86 samples/sec Loss 4.3054 LearningRate 0.0082 Epoch: 14 Global Step: 238100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:08,286-Speed 9712.84 samples/sec Loss 4.3437 LearningRate 0.0082 Epoch: 14 Global Step: 238110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:09,375-Speed 9410.76 samples/sec Loss 4.3650 LearningRate 0.0082 Epoch: 14 Global Step: 238120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:10,462-Speed 9426.98 samples/sec Loss 4.3426 LearningRate 0.0082 Epoch: 14 Global Step: 238130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:11,579-Speed 9170.87 samples/sec Loss 4.4116 LearningRate 0.0082 Epoch: 14 Global Step: 238140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:12,717-Speed 8998.01 samples/sec 
Loss 4.3496 LearningRate 0.0082 Epoch: 14 Global Step: 238150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:13,743-Speed 9985.08 samples/sec Loss 4.3782 LearningRate 0.0082 Epoch: 14 Global Step: 238160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:14,789-Speed 9796.13 samples/sec Loss 4.3748 LearningRate 0.0082 Epoch: 14 Global Step: 238170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:15,833-Speed 9818.46 samples/sec Loss 4.3880 LearningRate 0.0082 Epoch: 14 Global Step: 238180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:16,906-Speed 9546.55 samples/sec Loss 4.3571 LearningRate 0.0082 Epoch: 14 Global Step: 238190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:17,963-Speed 9692.94 samples/sec Loss 4.3414 LearningRate 0.0082 Epoch: 14 Global Step: 238200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:19,073-Speed 9236.81 samples/sec Loss 4.2968 LearningRate 0.0082 Epoch: 14 Global Step: 238210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:20,170-Speed 9333.67 samples/sec Loss 4.4440 LearningRate 0.0082 Epoch: 14 Global Step: 238220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:21,273-Speed 9288.95 samples/sec Loss 4.3909 LearningRate 0.0082 Epoch: 14 Global Step: 238230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:22,394-Speed 9137.86 samples/sec Loss 4.4597 LearningRate 0.0082 Epoch: 14 Global Step: 238240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:23,456-Speed 9651.41 samples/sec Loss 4.2913 LearningRate 0.0082 Epoch: 14 Global Step: 238250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:24,499-Speed 9821.15 samples/sec Loss 4.3532 LearningRate 0.0082 Epoch: 14 Global Step: 238260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:25,597-Speed 9334.21 samples/sec Loss 4.4050 LearningRate 0.0082 
Epoch: 14 Global Step: 238270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:26,727-Speed 9069.81 samples/sec Loss 4.3047 LearningRate 0.0082 Epoch: 14 Global Step: 238280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:27,839-Speed 9215.96 samples/sec Loss 4.3141 LearningRate 0.0082 Epoch: 14 Global Step: 238290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:28,959-Speed 9147.06 samples/sec Loss 4.3652 LearningRate 0.0082 Epoch: 14 Global Step: 238300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:30,043-Speed 9454.92 samples/sec Loss 4.3134 LearningRate 0.0082 Epoch: 14 Global Step: 238310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:31,100-Speed 9692.39 samples/sec Loss 4.3428 LearningRate 0.0082 Epoch: 14 Global Step: 238320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:32,162-Speed 9646.45 samples/sec Loss 4.4426 LearningRate 0.0082 Epoch: 14 Global Step: 238330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:33,259-Speed 9339.49 samples/sec Loss 4.4246 LearningRate 0.0082 Epoch: 14 Global Step: 238340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:34,284-Speed 9995.37 samples/sec Loss 4.3714 LearningRate 0.0082 Epoch: 14 Global Step: 238350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:35,369-Speed 9445.63 samples/sec Loss 4.3376 LearningRate 0.0082 Epoch: 14 Global Step: 238360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:36,430-Speed 9656.08 samples/sec Loss 4.3487 LearningRate 0.0082 Epoch: 14 Global Step: 238370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:37,524-Speed 9366.04 samples/sec Loss 4.3257 LearningRate 0.0082 Epoch: 14 Global Step: 238380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:38,606-Speed 9467.54 samples/sec Loss 4.3428 LearningRate 0.0082 Epoch: 14 Global Step: 238390 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:39,716-Speed 9233.91 samples/sec Loss 4.2963 LearningRate 0.0082 Epoch: 14 Global Step: 238400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:40,800-Speed 9448.61 samples/sec Loss 4.3185 LearningRate 0.0082 Epoch: 14 Global Step: 238410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:41,875-Speed 9530.50 samples/sec Loss 4.3701 LearningRate 0.0082 Epoch: 14 Global Step: 238420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:42,988-Speed 9206.70 samples/sec Loss 4.3282 LearningRate 0.0082 Epoch: 14 Global Step: 238430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:44,082-Speed 9365.12 samples/sec Loss 4.3436 LearningRate 0.0082 Epoch: 14 Global Step: 238440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:45,191-Speed 9244.90 samples/sec Loss 4.4660 LearningRate 0.0082 Epoch: 14 Global Step: 238450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:46,271-Speed 9488.66 samples/sec Loss 4.3441 LearningRate 0.0082 Epoch: 14 Global Step: 238460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:47,354-Speed 9464.93 samples/sec Loss 4.3221 LearningRate 0.0082 Epoch: 14 Global Step: 238470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:10:48,418-Speed 9623.52 samples/sec Loss 4.4655 LearningRate 0.0082 Epoch: 14 Global Step: 238480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:49,448-Speed 9949.15 samples/sec Loss 4.3802 LearningRate 0.0082 Epoch: 14 Global Step: 238490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:50,513-Speed 9621.29 samples/sec Loss 4.3806 LearningRate 0.0082 Epoch: 14 Global Step: 238500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:51,586-Speed 9544.12 samples/sec Loss 4.2902 LearningRate 0.0082 Epoch: 14 Global Step: 238510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 21:10:52,701-Speed 9189.98 samples/sec Loss 4.4443 LearningRate 0.0082 Epoch: 14 Global Step: 238520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:53,808-Speed 9258.52 samples/sec Loss 4.3639 LearningRate 0.0081 Epoch: 14 Global Step: 238530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:54,867-Speed 9669.02 samples/sec Loss 4.4810 LearningRate 0.0081 Epoch: 14 Global Step: 238540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:55,943-Speed 9521.88 samples/sec Loss 4.4228 LearningRate 0.0081 Epoch: 14 Global Step: 238550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:57,031-Speed 9418.86 samples/sec Loss 4.2909 LearningRate 0.0081 Epoch: 14 Global Step: 238560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:58,101-Speed 9582.48 samples/sec Loss 4.3545 LearningRate 0.0081 Epoch: 14 Global Step: 238570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:10:59,192-Speed 9394.45 samples/sec Loss 4.4234 LearningRate 0.0081 Epoch: 14 Global Step: 238580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:00,235-Speed 9822.65 samples/sec Loss 4.3587 LearningRate 0.0081 Epoch: 14 Global Step: 238590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:01,301-Speed 9605.14 samples/sec Loss 4.3228 LearningRate 0.0081 Epoch: 14 Global Step: 238600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:02,368-Speed 9602.18 samples/sec Loss 4.2986 LearningRate 0.0081 Epoch: 14 Global Step: 238610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:03,444-Speed 9527.06 samples/sec Loss 4.3631 LearningRate 0.0081 Epoch: 14 Global Step: 238620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:04,504-Speed 9662.01 samples/sec Loss 4.2946 LearningRate 0.0081 Epoch: 14 Global Step: 238630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:05,580-Speed 9527.00 
samples/sec Loss 4.3719 LearningRate 0.0081 Epoch: 14 Global Step: 238640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:06,689-Speed 9239.51 samples/sec Loss 4.3756 LearningRate 0.0081 Epoch: 14 Global Step: 238650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:07,781-Speed 9382.75 samples/sec Loss 4.3506 LearningRate 0.0081 Epoch: 14 Global Step: 238660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:08,887-Speed 9268.75 samples/sec Loss 4.3098 LearningRate 0.0081 Epoch: 14 Global Step: 238670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:09,962-Speed 9528.98 samples/sec Loss 4.4306 LearningRate 0.0081 Epoch: 14 Global Step: 238680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:11,061-Speed 9325.90 samples/sec Loss 4.4245 LearningRate 0.0081 Epoch: 14 Global Step: 238690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:12,107-Speed 9789.98 samples/sec Loss 4.3196 LearningRate 0.0081 Epoch: 14 Global Step: 238700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:13,144-Speed 9880.52 samples/sec Loss 4.4561 LearningRate 0.0081 Epoch: 14 Global Step: 238710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:14,239-Speed 9361.87 samples/sec Loss 4.4685 LearningRate 0.0081 Epoch: 14 Global Step: 238720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:15,330-Speed 9398.89 samples/sec Loss 4.3719 LearningRate 0.0081 Epoch: 14 Global Step: 238730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:16,481-Speed 8899.73 samples/sec Loss 4.3858 LearningRate 0.0081 Epoch: 14 Global Step: 238740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:17,614-Speed 9045.67 samples/sec Loss 4.3853 LearningRate 0.0081 Epoch: 14 Global Step: 238750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:18,741-Speed 9085.18 samples/sec Loss 4.3647 LearningRate 
0.0081 Epoch: 14 Global Step: 238760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:19,862-Speed 9140.52 samples/sec Loss 4.3972 LearningRate 0.0081 Epoch: 14 Global Step: 238770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:20,930-Speed 9594.53 samples/sec Loss 4.3847 LearningRate 0.0081 Epoch: 14 Global Step: 238780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:21,976-Speed 9789.95 samples/sec Loss 4.3362 LearningRate 0.0081 Epoch: 14 Global Step: 238790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:23,046-Speed 9575.58 samples/sec Loss 4.3859 LearningRate 0.0081 Epoch: 14 Global Step: 238800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:24,087-Speed 9844.84 samples/sec Loss 4.4425 LearningRate 0.0081 Epoch: 14 Global Step: 238810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:25,135-Speed 9785.62 samples/sec Loss 4.3657 LearningRate 0.0081 Epoch: 14 Global Step: 238820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:26,176-Speed 9838.97 samples/sec Loss 4.3707 LearningRate 0.0081 Epoch: 14 Global Step: 238830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:27,258-Speed 9468.06 samples/sec Loss 4.2933 LearningRate 0.0081 Epoch: 14 Global Step: 238840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:28,348-Speed 9403.26 samples/sec Loss 4.4253 LearningRate 0.0081 Epoch: 14 Global Step: 238850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:29,407-Speed 9672.86 samples/sec Loss 4.3683 LearningRate 0.0081 Epoch: 14 Global Step: 238860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:30,502-Speed 9360.30 samples/sec Loss 4.3323 LearningRate 0.0081 Epoch: 14 Global Step: 238870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:31,593-Speed 9392.52 samples/sec Loss 4.3862 LearningRate 0.0081 Epoch: 14 Global Step: 238880 
Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:32,681-Speed 9410.97 samples/sec Loss 4.3707 LearningRate 0.0081 Epoch: 14 Global Step: 238890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:33,767-Speed 9434.22 samples/sec Loss 4.3819 LearningRate 0.0081 Epoch: 14 Global Step: 238900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:34,816-Speed 9772.67 samples/sec Loss 4.3584 LearningRate 0.0081 Epoch: 14 Global Step: 238910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:35,892-Speed 9519.84 samples/sec Loss 4.3989 LearningRate 0.0081 Epoch: 14 Global Step: 238920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:36,959-Speed 9603.14 samples/sec Loss 4.3682 LearningRate 0.0081 Epoch: 14 Global Step: 238930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:38,088-Speed 9076.71 samples/sec Loss 4.4413 LearningRate 0.0081 Epoch: 14 Global Step: 238940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:39,126-Speed 9875.81 samples/sec Loss 4.4161 LearningRate 0.0081 Epoch: 14 Global Step: 238950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:40,193-Speed 9599.76 samples/sec Loss 4.3902 LearningRate 0.0081 Epoch: 14 Global Step: 238960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:41,271-Speed 9501.18 samples/sec Loss 4.4544 LearningRate 0.0081 Epoch: 14 Global Step: 238970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:42,327-Speed 9703.57 samples/sec Loss 4.3959 LearningRate 0.0081 Epoch: 14 Global Step: 238980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:43,416-Speed 9409.59 samples/sec Loss 4.3381 LearningRate 0.0081 Epoch: 14 Global Step: 238990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:44,438-Speed 10031.10 samples/sec Loss 4.3187 LearningRate 0.0081 Epoch: 14 Global Step: 239000 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 21:11:45,475-Speed 9887.91 samples/sec Loss 4.5339 LearningRate 0.0081 Epoch: 14 Global Step: 239010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:46,574-Speed 9318.01 samples/sec Loss 4.4027 LearningRate 0.0081 Epoch: 14 Global Step: 239020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:47,673-Speed 9320.53 samples/sec Loss 4.4370 LearningRate 0.0081 Epoch: 14 Global Step: 239030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:48,729-Speed 9701.29 samples/sec Loss 4.4218 LearningRate 0.0081 Epoch: 14 Global Step: 239040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:49,802-Speed 9556.32 samples/sec Loss 4.3769 LearningRate 0.0081 Epoch: 14 Global Step: 239050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:50,857-Speed 9702.70 samples/sec Loss 4.4152 LearningRate 0.0081 Epoch: 14 Global Step: 239060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:51,935-Speed 9507.82 samples/sec Loss 4.4076 LearningRate 0.0081 Epoch: 14 Global Step: 239070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:53,046-Speed 9223.37 samples/sec Loss 4.4709 LearningRate 0.0081 Epoch: 14 Global Step: 239080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:54,152-Speed 9261.49 samples/sec Loss 4.3595 LearningRate 0.0081 Epoch: 14 Global Step: 239090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:55,241-Speed 9408.75 samples/sec Loss 4.4028 LearningRate 0.0081 Epoch: 14 Global Step: 239100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:11:56,372-Speed 9061.68 samples/sec Loss 4.4015 LearningRate 0.0080 Epoch: 14 Global Step: 239110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:57,439-Speed 9598.65 samples/sec Loss 4.3798 LearningRate 0.0080 Epoch: 14 Global Step: 239120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
21:11:58,537-Speed 9331.88 samples/sec Loss 4.4133 LearningRate 0.0080 Epoch: 14 Global Step: 239130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:11:59,649-Speed 9217.97 samples/sec Loss 4.3507 LearningRate 0.0080 Epoch: 14 Global Step: 239140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:00,745-Speed 9352.06 samples/sec Loss 4.3725 LearningRate 0.0080 Epoch: 14 Global Step: 239150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:01,849-Speed 9286.39 samples/sec Loss 4.4053 LearningRate 0.0080 Epoch: 14 Global Step: 239160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:02,965-Speed 9175.62 samples/sec Loss 4.3392 LearningRate 0.0080 Epoch: 14 Global Step: 239170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:04,012-Speed 9785.49 samples/sec Loss 4.2991 LearningRate 0.0080 Epoch: 14 Global Step: 239180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:05,132-Speed 9150.02 samples/sec Loss 4.4174 LearningRate 0.0080 Epoch: 14 Global Step: 239190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:06,226-Speed 9361.22 samples/sec Loss 4.4170 LearningRate 0.0080 Epoch: 14 Global Step: 239200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:07,286-Speed 9669.11 samples/sec Loss 4.4449 LearningRate 0.0080 Epoch: 14 Global Step: 239210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:08,340-Speed 9720.84 samples/sec Loss 4.2810 LearningRate 0.0080 Epoch: 14 Global Step: 239220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:09,440-Speed 9321.40 samples/sec Loss 4.3206 LearningRate 0.0080 Epoch: 14 Global Step: 239230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:10,512-Speed 9552.42 samples/sec Loss 4.3917 LearningRate 0.0080 Epoch: 14 Global Step: 239240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:11,608-Speed 9350.60 samples/sec 
Loss 4.3882 LearningRate 0.0080 Epoch: 14 Global Step: 239250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:12,724-Speed 9175.51 samples/sec Loss 4.4650 LearningRate 0.0080 Epoch: 14 Global Step: 239260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:13,885-Speed 8824.02 samples/sec Loss 4.4090 LearningRate 0.0080 Epoch: 14 Global Step: 239270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:14,970-Speed 9446.00 samples/sec Loss 4.3624 LearningRate 0.0080 Epoch: 14 Global Step: 239280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:16,059-Speed 9416.89 samples/sec Loss 4.4838 LearningRate 0.0080 Epoch: 14 Global Step: 239290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:17,182-Speed 9118.71 samples/sec Loss 4.3890 LearningRate 0.0080 Epoch: 14 Global Step: 239300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:18,242-Speed 9668.81 samples/sec Loss 4.4584 LearningRate 0.0080 Epoch: 14 Global Step: 239310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:19,315-Speed 9549.03 samples/sec Loss 4.4485 LearningRate 0.0080 Epoch: 14 Global Step: 239320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:12:20,396-Speed 9475.59 samples/sec Loss 4.3997 LearningRate 0.0080 Epoch: 14 Global Step: 239330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:21,504-Speed 9252.67 samples/sec Loss 4.3949 LearningRate 0.0080 Epoch: 14 Global Step: 239340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:22,588-Speed 9449.88 samples/sec Loss 4.3288 LearningRate 0.0080 Epoch: 14 Global Step: 239350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:23,667-Speed 9498.47 samples/sec Loss 4.4191 LearningRate 0.0080 Epoch: 14 Global Step: 239360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:24,741-Speed 9541.90 samples/sec Loss 4.4084 LearningRate 0.0080 Epoch: 
14 Global Step: 239370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:25,805-Speed 9624.98 samples/sec Loss 4.3690 LearningRate 0.0080 Epoch: 14 Global Step: 239380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:26,890-Speed 9440.24 samples/sec Loss 4.4262 LearningRate 0.0080 Epoch: 14 Global Step: 239390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:27,959-Speed 9590.81 samples/sec Loss 4.3868 LearningRate 0.0080 Epoch: 14 Global Step: 239400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:29,067-Speed 9253.07 samples/sec Loss 4.3703 LearningRate 0.0080 Epoch: 14 Global Step: 239410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:30,143-Speed 9522.85 samples/sec Loss 4.3846 LearningRate 0.0080 Epoch: 14 Global Step: 239420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:31,244-Speed 9301.41 samples/sec Loss 4.3794 LearningRate 0.0080 Epoch: 14 Global Step: 239430 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 21:12:32,321-Speed 9517.05 samples/sec Loss 4.3358 LearningRate 0.0080 Epoch: 14 Global Step: 239440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:33,409-Speed 9415.55 samples/sec Loss 4.4177 LearningRate 0.0080 Epoch: 14 Global Step: 239450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:34,460-Speed 9747.88 samples/sec Loss 4.2885 LearningRate 0.0080 Epoch: 14 Global Step: 239460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:35,491-Speed 9942.36 samples/sec Loss 4.3566 LearningRate 0.0080 Epoch: 14 Global Step: 239470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:36,576-Speed 9442.03 samples/sec Loss 4.4360 LearningRate 0.0080 Epoch: 14 Global Step: 239480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:37,666-Speed 9400.01 samples/sec Loss 4.3280 LearningRate 0.0080 Epoch: 14 Global Step: 239490 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:38,752-Speed 9428.75 samples/sec Loss 4.3833 LearningRate 0.0080 Epoch: 14 Global Step: 239500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:39,810-Speed 9692.76 samples/sec Loss 4.4123 LearningRate 0.0080 Epoch: 14 Global Step: 239510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:40,861-Speed 9750.29 samples/sec Loss 4.4040 LearningRate 0.0080 Epoch: 14 Global Step: 239520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:41,892-Speed 9940.36 samples/sec Loss 4.4343 LearningRate 0.0080 Epoch: 14 Global Step: 239530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:42,949-Speed 9693.78 samples/sec Loss 4.4104 LearningRate 0.0080 Epoch: 14 Global Step: 239540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:44,073-Speed 9113.66 samples/sec Loss 4.4328 LearningRate 0.0080 Epoch: 14 Global Step: 239550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:45,130-Speed 9693.70 samples/sec Loss 4.3957 LearningRate 0.0080 Epoch: 14 Global Step: 239560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:46,184-Speed 9724.02 samples/sec Loss 4.4370 LearningRate 0.0080 Epoch: 14 Global Step: 239570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:47,284-Speed 9307.73 samples/sec Loss 4.4288 LearningRate 0.0080 Epoch: 14 Global Step: 239580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:48,370-Speed 9435.35 samples/sec Loss 4.4642 LearningRate 0.0080 Epoch: 14 Global Step: 239590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:49,471-Speed 9307.29 samples/sec Loss 4.5233 LearningRate 0.0080 Epoch: 14 Global Step: 239600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:50,552-Speed 9482.14 samples/sec Loss 4.3964 LearningRate 0.0080 Epoch: 14 Global Step: 239610 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-04-11 21:12:51,635-Speed 9461.41 samples/sec Loss 4.4084 LearningRate 0.0080 Epoch: 14 Global Step: 239620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:52,714-Speed 9488.58 samples/sec Loss 4.3873 LearningRate 0.0080 Epoch: 14 Global Step: 239630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:53,833-Speed 9161.14 samples/sec Loss 4.4924 LearningRate 0.0080 Epoch: 14 Global Step: 239640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:54,891-Speed 9683.43 samples/sec Loss 4.3350 LearningRate 0.0080 Epoch: 14 Global Step: 239650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:55,969-Speed 9504.35 samples/sec Loss 4.3272 LearningRate 0.0080 Epoch: 14 Global Step: 239660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:57,092-Speed 9126.99 samples/sec Loss 4.3814 LearningRate 0.0080 Epoch: 14 Global Step: 239670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:58,213-Speed 9140.96 samples/sec Loss 4.3724 LearningRate 0.0080 Epoch: 14 Global Step: 239680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:12:59,286-Speed 9550.33 samples/sec Loss 4.3751 LearningRate 0.0080 Epoch: 14 Global Step: 239690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:00,351-Speed 9618.21 samples/sec Loss 4.4377 LearningRate 0.0079 Epoch: 14 Global Step: 239700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:01,403-Speed 9741.29 samples/sec Loss 4.4261 LearningRate 0.0079 Epoch: 14 Global Step: 239710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:02,522-Speed 9157.75 samples/sec Loss 4.3430 LearningRate 0.0079 Epoch: 14 Global Step: 239720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:03,587-Speed 9623.36 samples/sec Loss 4.4476 LearningRate 0.0079 Epoch: 14 Global Step: 239730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
21:13:04,648-Speed 9659.19 samples/sec Loss 4.3622 LearningRate 0.0079 Epoch: 14 Global Step: 239740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:05,734-Speed 9428.64 samples/sec Loss 4.3819 LearningRate 0.0079 Epoch: 14 Global Step: 239750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:06,813-Speed 9495.51 samples/sec Loss 4.3986 LearningRate 0.0079 Epoch: 14 Global Step: 239760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:07,872-Speed 9679.70 samples/sec Loss 4.3710 LearningRate 0.0079 Epoch: 14 Global Step: 239770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:08,920-Speed 9781.70 samples/sec Loss 4.2489 LearningRate 0.0079 Epoch: 14 Global Step: 239780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:09,990-Speed 9573.63 samples/sec Loss 4.4726 LearningRate 0.0079 Epoch: 14 Global Step: 239790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:11,070-Speed 9486.55 samples/sec Loss 4.3418 LearningRate 0.0079 Epoch: 14 Global Step: 239800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:12,159-Speed 9404.53 samples/sec Loss 4.3850 LearningRate 0.0079 Epoch: 14 Global Step: 239810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:13,268-Speed 9239.64 samples/sec Loss 4.3414 LearningRate 0.0079 Epoch: 14 Global Step: 239820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:14,352-Speed 9458.03 samples/sec Loss 4.4597 LearningRate 0.0079 Epoch: 14 Global Step: 239830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:15,468-Speed 9179.34 samples/sec Loss 4.3275 LearningRate 0.0079 Epoch: 14 Global Step: 239840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:16,581-Speed 9209.55 samples/sec Loss 4.4138 LearningRate 0.0079 Epoch: 14 Global Step: 239850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:17,653-Speed 9562.86 
samples/sec Loss 4.4813 LearningRate 0.0079 Epoch: 14 Global Step: 239860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:18,727-Speed 9537.79 samples/sec Loss 4.4854 LearningRate 0.0079 Epoch: 14 Global Step: 239870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:13:19,802-Speed 9528.93 samples/sec Loss 4.3414 LearningRate 0.0079 Epoch: 14 Global Step: 239880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:20,904-Speed 9302.05 samples/sec Loss 4.4257 LearningRate 0.0079 Epoch: 14 Global Step: 239890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:22,031-Speed 9087.93 samples/sec Loss 4.3829 LearningRate 0.0079 Epoch: 14 Global Step: 239900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:23,115-Speed 9450.48 samples/sec Loss 4.4059 LearningRate 0.0079 Epoch: 14 Global Step: 239910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:24,230-Speed 9190.76 samples/sec Loss 4.4383 LearningRate 0.0079 Epoch: 14 Global Step: 239920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:25,308-Speed 9502.10 samples/sec Loss 4.4436 LearningRate 0.0079 Epoch: 14 Global Step: 239930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:26,397-Speed 9411.41 samples/sec Loss 4.3556 LearningRate 0.0079 Epoch: 14 Global Step: 239940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:27,460-Speed 9644.84 samples/sec Loss 4.2976 LearningRate 0.0079 Epoch: 14 Global Step: 239950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:28,560-Speed 9315.03 samples/sec Loss 4.4591 LearningRate 0.0079 Epoch: 14 Global Step: 239960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:29,645-Speed 9435.25 samples/sec Loss 4.4249 LearningRate 0.0079 Epoch: 14 Global Step: 239970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:13:30,733-Speed 9424.23 samples/sec Loss 4.4070 LearningRate 
0.0079 Epoch: 14 Global Step: 239980 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:13:31,792-Speed 9674.62 samples/sec Loss 4.4104 LearningRate 0.0079 Epoch: 14 Global Step: 239990 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:13:32,861-Speed 9578.68 samples/sec Loss 4.3986 LearningRate 0.0079 Epoch: 14 Global Step: 240000 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:13:54,594-[lfw][240000]XNorm: 7.469457
Training: 2022-04-11 21:13:54,594-[lfw][240000]Accuracy-Flip: 0.99683+-0.00302
Training: 2022-04-11 21:13:54,595-[lfw][240000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:14:19,743-[cfp_fp][240000]XNorm: 6.468526
Training: 2022-04-11 21:14:19,744-[cfp_fp][240000]Accuracy-Flip: 0.96857+-0.01026
Training: 2022-04-11 21:14:19,744-[cfp_fp][240000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:14:41,446-[agedb_30][240000]XNorm: 7.240392
Training: 2022-04-11 21:14:41,447-[agedb_30][240000]Accuracy-Flip: 0.97150+-0.00908
Training: 2022-04-11 21:14:41,447-[agedb_30][240000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:14:42,551-Speed 146.94 samples/sec Loss 4.4464 LearningRate 0.0079 Epoch: 14 Global Step: 240010 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:14:43,592-Speed 9845.75 samples/sec Loss 4.3968 LearningRate 0.0079 Epoch: 14 Global Step: 240020 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:14:44,648-Speed 9705.52 samples/sec Loss 4.3953 LearningRate 0.0079 Epoch: 14 Global Step: 240030 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:14:45,765-Speed 9170.46 samples/sec Loss 4.4535 LearningRate 0.0079 Epoch: 14 Global Step: 240040 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-04-11 21:14:46,819-Speed 9716.07 samples/sec Loss 4.2955 LearningRate 0.0079 Epoch: 14 Global Step: 240050 Fp16 Grad Scale: 65536 Required: 4 hours
Training: 2022-04-11 21:14:47,970-Speed 8905.55 samples/sec Loss 4.3955 LearningRate 0.0079 Epoch: 14 
Global Step: 240060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:49,079-Speed 9241.17 samples/sec Loss 4.4369 LearningRate 0.0079 Epoch: 14 Global Step: 240070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:50,161-Speed 9466.63 samples/sec Loss 4.3814 LearningRate 0.0079 Epoch: 14 Global Step: 240080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:51,244-Speed 9463.64 samples/sec Loss 4.3827 LearningRate 0.0079 Epoch: 14 Global Step: 240090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:52,333-Speed 9403.82 samples/sec Loss 4.3772 LearningRate 0.0079 Epoch: 14 Global Step: 240100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:53,414-Speed 9479.24 samples/sec Loss 4.4917 LearningRate 0.0079 Epoch: 14 Global Step: 240110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:54,498-Speed 9456.21 samples/sec Loss 4.3811 LearningRate 0.0079 Epoch: 14 Global Step: 240120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:55,566-Speed 9595.90 samples/sec Loss 4.3743 LearningRate 0.0079 Epoch: 14 Global Step: 240130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:56,614-Speed 9775.92 samples/sec Loss 4.4033 LearningRate 0.0079 Epoch: 14 Global Step: 240140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:14:57,685-Speed 9566.26 samples/sec Loss 4.3900 LearningRate 0.0079 Epoch: 14 Global Step: 240150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:14:58,747-Speed 9647.19 samples/sec Loss 4.3973 LearningRate 0.0079 Epoch: 14 Global Step: 240160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:14:59,841-Speed 9360.56 samples/sec Loss 4.3358 LearningRate 0.0079 Epoch: 14 Global Step: 240170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:00,913-Speed 9565.90 samples/sec Loss 4.3874 LearningRate 0.0079 Epoch: 14 Global Step: 240180 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 21:15:01,955-Speed 9828.72 samples/sec Loss 4.4206 LearningRate 0.0079 Epoch: 14 Global Step: 240190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:03,020-Speed 9616.76 samples/sec Loss 4.4025 LearningRate 0.0079 Epoch: 14 Global Step: 240200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:04,082-Speed 9653.11 samples/sec Loss 4.4447 LearningRate 0.0079 Epoch: 14 Global Step: 240210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:05,169-Speed 9425.89 samples/sec Loss 4.4345 LearningRate 0.0079 Epoch: 14 Global Step: 240220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:06,238-Speed 9585.74 samples/sec Loss 4.5363 LearningRate 0.0079 Epoch: 14 Global Step: 240230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:07,289-Speed 9752.03 samples/sec Loss 4.4560 LearningRate 0.0079 Epoch: 14 Global Step: 240240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:08,381-Speed 9377.23 samples/sec Loss 4.3559 LearningRate 0.0079 Epoch: 14 Global Step: 240250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:09,488-Speed 9259.80 samples/sec Loss 4.4575 LearningRate 0.0079 Epoch: 14 Global Step: 240260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:10,565-Speed 9511.46 samples/sec Loss 4.4346 LearningRate 0.0079 Epoch: 14 Global Step: 240270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:11,675-Speed 9226.25 samples/sec Loss 4.3247 LearningRate 0.0079 Epoch: 14 Global Step: 240280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:12,785-Speed 9233.67 samples/sec Loss 4.3593 LearningRate 0.0079 Epoch: 14 Global Step: 240290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:13,865-Speed 9492.83 samples/sec Loss 4.3966 LearningRate 0.0078 Epoch: 14 Global Step: 240300 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 21:15:14,948-Speed 9460.10 samples/sec Loss 4.4378 LearningRate 0.0078 Epoch: 14 Global Step: 240310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:16,029-Speed 9479.67 samples/sec Loss 4.4614 LearningRate 0.0078 Epoch: 14 Global Step: 240320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:17,117-Speed 9413.02 samples/sec Loss 4.4286 LearningRate 0.0078 Epoch: 14 Global Step: 240330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:18,230-Speed 9208.01 samples/sec Loss 4.4156 LearningRate 0.0078 Epoch: 14 Global Step: 240340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:19,277-Speed 9781.71 samples/sec Loss 4.3468 LearningRate 0.0078 Epoch: 14 Global Step: 240350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:20,370-Speed 9370.85 samples/sec Loss 4.3971 LearningRate 0.0078 Epoch: 14 Global Step: 240360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:21,484-Speed 9204.64 samples/sec Loss 4.3696 LearningRate 0.0078 Epoch: 14 Global Step: 240370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:22,591-Speed 9253.44 samples/sec Loss 4.3955 LearningRate 0.0078 Epoch: 14 Global Step: 240380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:23,681-Speed 9397.45 samples/sec Loss 4.4430 LearningRate 0.0078 Epoch: 14 Global Step: 240390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:24,790-Speed 9242.22 samples/sec Loss 4.3475 LearningRate 0.0078 Epoch: 14 Global Step: 240400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:25,876-Speed 9434.42 samples/sec Loss 4.3776 LearningRate 0.0078 Epoch: 14 Global Step: 240410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:26,947-Speed 9569.05 samples/sec Loss 4.3954 LearningRate 0.0078 Epoch: 14 Global Step: 240420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
21:15:28,013-Speed 9610.72 samples/sec Loss 4.3566 LearningRate 0.0078 Epoch: 14 Global Step: 240430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:29,085-Speed 9562.68 samples/sec Loss 4.4958 LearningRate 0.0078 Epoch: 14 Global Step: 240440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:30,163-Speed 9506.73 samples/sec Loss 4.4525 LearningRate 0.0078 Epoch: 14 Global Step: 240450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:31,225-Speed 9644.02 samples/sec Loss 4.4108 LearningRate 0.0078 Epoch: 14 Global Step: 240460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:32,279-Speed 9728.18 samples/sec Loss 4.4623 LearningRate 0.0078 Epoch: 14 Global Step: 240470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:33,371-Speed 9375.68 samples/sec Loss 4.4200 LearningRate 0.0078 Epoch: 14 Global Step: 240480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:34,469-Speed 9337.34 samples/sec Loss 4.4459 LearningRate 0.0078 Epoch: 14 Global Step: 240490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:35,560-Speed 9386.46 samples/sec Loss 4.3748 LearningRate 0.0078 Epoch: 14 Global Step: 240500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:36,645-Speed 9447.90 samples/sec Loss 4.4820 LearningRate 0.0078 Epoch: 14 Global Step: 240510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:37,687-Speed 9825.71 samples/sec Loss 4.4329 LearningRate 0.0078 Epoch: 14 Global Step: 240520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:38,752-Speed 9623.20 samples/sec Loss 4.3559 LearningRate 0.0078 Epoch: 14 Global Step: 240530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:39,781-Speed 9955.74 samples/sec Loss 4.4862 LearningRate 0.0078 Epoch: 14 Global Step: 240540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:40,895-Speed 9194.37 samples/sec 
Loss 4.3281 LearningRate 0.0078 Epoch: 14 Global Step: 240550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:41,970-Speed 9536.09 samples/sec Loss 4.4658 LearningRate 0.0078 Epoch: 14 Global Step: 240560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:43,079-Speed 9235.22 samples/sec Loss 4.3843 LearningRate 0.0078 Epoch: 14 Global Step: 240570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:44,189-Speed 9237.28 samples/sec Loss 4.3764 LearningRate 0.0078 Epoch: 14 Global Step: 240580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:45,237-Speed 9774.40 samples/sec Loss 4.4216 LearningRate 0.0078 Epoch: 14 Global Step: 240590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:46,312-Speed 9530.44 samples/sec Loss 4.4194 LearningRate 0.0078 Epoch: 14 Global Step: 240600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:47,398-Speed 9435.39 samples/sec Loss 4.4229 LearningRate 0.0078 Epoch: 14 Global Step: 240610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:48,489-Speed 9397.43 samples/sec Loss 4.4400 LearningRate 0.0078 Epoch: 14 Global Step: 240620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:49,579-Speed 9396.56 samples/sec Loss 4.5387 LearningRate 0.0078 Epoch: 14 Global Step: 240630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:50,633-Speed 9725.02 samples/sec Loss 4.4796 LearningRate 0.0078 Epoch: 14 Global Step: 240640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:51,713-Speed 9479.69 samples/sec Loss 4.4156 LearningRate 0.0078 Epoch: 14 Global Step: 240650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:52,823-Speed 9235.72 samples/sec Loss 4.3549 LearningRate 0.0078 Epoch: 14 Global Step: 240660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:53,949-Speed 9097.80 samples/sec Loss 4.3972 LearningRate 0.0078 Epoch: 
14 Global Step: 240670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:55,039-Speed 9404.26 samples/sec Loss 4.4203 LearningRate 0.0078 Epoch: 14 Global Step: 240680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:56,138-Speed 9317.56 samples/sec Loss 4.3970 LearningRate 0.0078 Epoch: 14 Global Step: 240690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:57,219-Speed 9480.43 samples/sec Loss 4.3118 LearningRate 0.0078 Epoch: 14 Global Step: 240700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:15:58,301-Speed 9471.41 samples/sec Loss 4.3718 LearningRate 0.0078 Epoch: 14 Global Step: 240710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:15:59,366-Speed 9622.93 samples/sec Loss 4.3719 LearningRate 0.0078 Epoch: 14 Global Step: 240720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:00,472-Speed 9259.67 samples/sec Loss 4.3434 LearningRate 0.0078 Epoch: 14 Global Step: 240730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:01,566-Speed 9370.82 samples/sec Loss 4.3632 LearningRate 0.0078 Epoch: 14 Global Step: 240740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:02,648-Speed 9467.21 samples/sec Loss 4.4008 LearningRate 0.0078 Epoch: 14 Global Step: 240750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:03,743-Speed 9358.50 samples/sec Loss 4.3918 LearningRate 0.0078 Epoch: 14 Global Step: 240760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:04,812-Speed 9588.59 samples/sec Loss 4.4019 LearningRate 0.0078 Epoch: 14 Global Step: 240770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:05,886-Speed 9537.02 samples/sec Loss 4.4272 LearningRate 0.0078 Epoch: 14 Global Step: 240780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:06,964-Speed 9500.42 samples/sec Loss 4.4069 LearningRate 0.0078 Epoch: 14 Global Step: 240790 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:07,999-Speed 9908.11 samples/sec Loss 4.3993 LearningRate 0.0078 Epoch: 14 Global Step: 240800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:09,048-Speed 9762.97 samples/sec Loss 4.4828 LearningRate 0.0078 Epoch: 14 Global Step: 240810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:10,115-Speed 9599.99 samples/sec Loss 4.3806 LearningRate 0.0078 Epoch: 14 Global Step: 240820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:11,144-Speed 9959.77 samples/sec Loss 4.4035 LearningRate 0.0078 Epoch: 14 Global Step: 240830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:12,281-Speed 9013.37 samples/sec Loss 4.4129 LearningRate 0.0078 Epoch: 14 Global Step: 240840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:13,364-Speed 9452.35 samples/sec Loss 4.5425 LearningRate 0.0078 Epoch: 14 Global Step: 240850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:14,459-Speed 9368.47 samples/sec Loss 4.4268 LearningRate 0.0078 Epoch: 14 Global Step: 240860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:15,490-Speed 9930.32 samples/sec Loss 4.3691 LearningRate 0.0078 Epoch: 14 Global Step: 240870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:16,579-Speed 9407.18 samples/sec Loss 4.4160 LearningRate 0.0078 Epoch: 14 Global Step: 240880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:17,688-Speed 9246.33 samples/sec Loss 4.4275 LearningRate 0.0077 Epoch: 14 Global Step: 240890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:18,818-Speed 9066.05 samples/sec Loss 4.4320 LearningRate 0.0077 Epoch: 14 Global Step: 240900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:19,884-Speed 9604.70 samples/sec Loss 4.3921 LearningRate 0.0077 Epoch: 14 Global Step: 240910 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 21:16:20,976-Speed 9389.40 samples/sec Loss 4.4423 LearningRate 0.0077 Epoch: 14 Global Step: 240920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:22,102-Speed 9100.03 samples/sec Loss 4.3587 LearningRate 0.0077 Epoch: 14 Global Step: 240930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:23,175-Speed 9548.34 samples/sec Loss 4.4404 LearningRate 0.0077 Epoch: 14 Global Step: 240940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:24,287-Speed 9221.83 samples/sec Loss 4.4251 LearningRate 0.0077 Epoch: 14 Global Step: 240950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:25,367-Speed 9481.08 samples/sec Loss 4.3817 LearningRate 0.0077 Epoch: 14 Global Step: 240960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:26,424-Speed 9699.78 samples/sec Loss 4.3925 LearningRate 0.0077 Epoch: 14 Global Step: 240970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:27,509-Speed 9438.89 samples/sec Loss 4.4314 LearningRate 0.0077 Epoch: 14 Global Step: 240980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:28,634-Speed 9104.33 samples/sec Loss 4.4145 LearningRate 0.0077 Epoch: 14 Global Step: 240990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:29,765-Speed 9066.65 samples/sec Loss 4.3579 LearningRate 0.0077 Epoch: 14 Global Step: 241000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:30,879-Speed 9200.02 samples/sec Loss 4.3772 LearningRate 0.0077 Epoch: 14 Global Step: 241010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:31,954-Speed 9530.54 samples/sec Loss 4.4399 LearningRate 0.0077 Epoch: 14 Global Step: 241020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:33,040-Speed 9429.66 samples/sec Loss 4.4460 LearningRate 0.0077 Epoch: 14 Global Step: 241030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:34,124-Speed 
9452.63 samples/sec Loss 4.4505 LearningRate 0.0077 Epoch: 14 Global Step: 241040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:35,209-Speed 9443.41 samples/sec Loss 4.4666 LearningRate 0.0077 Epoch: 14 Global Step: 241050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:36,291-Speed 9466.09 samples/sec Loss 4.3019 LearningRate 0.0077 Epoch: 14 Global Step: 241060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:37,348-Speed 9697.16 samples/sec Loss 4.3730 LearningRate 0.0077 Epoch: 14 Global Step: 241070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:38,427-Speed 9499.19 samples/sec Loss 4.5167 LearningRate 0.0077 Epoch: 14 Global Step: 241080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:39,512-Speed 9436.35 samples/sec Loss 4.3236 LearningRate 0.0077 Epoch: 14 Global Step: 241090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:40,632-Speed 9148.58 samples/sec Loss 4.3988 LearningRate 0.0077 Epoch: 14 Global Step: 241100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:41,745-Speed 9208.18 samples/sec Loss 4.3693 LearningRate 0.0077 Epoch: 14 Global Step: 241110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:42,835-Speed 9401.75 samples/sec Loss 4.4150 LearningRate 0.0077 Epoch: 14 Global Step: 241120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:43,898-Speed 9643.74 samples/sec Loss 4.3358 LearningRate 0.0077 Epoch: 14 Global Step: 241130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:44,984-Speed 9438.43 samples/sec Loss 4.3279 LearningRate 0.0077 Epoch: 14 Global Step: 241140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:46,068-Speed 9454.77 samples/sec Loss 4.4989 LearningRate 0.0077 Epoch: 14 Global Step: 241150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:47,109-Speed 9839.14 samples/sec Loss 4.3106 
LearningRate 0.0077 Epoch: 14 Global Step: 241160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:48,166-Speed 9693.58 samples/sec Loss 4.4780 LearningRate 0.0077 Epoch: 14 Global Step: 241170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:49,257-Speed 9393.53 samples/sec Loss 4.4650 LearningRate 0.0077 Epoch: 14 Global Step: 241180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:50,302-Speed 9799.06 samples/sec Loss 4.3464 LearningRate 0.0077 Epoch: 14 Global Step: 241190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:51,355-Speed 9730.29 samples/sec Loss 4.4208 LearningRate 0.0077 Epoch: 14 Global Step: 241200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:52,422-Speed 9604.53 samples/sec Loss 4.4372 LearningRate 0.0077 Epoch: 14 Global Step: 241210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:53,495-Speed 9545.96 samples/sec Loss 4.5553 LearningRate 0.0077 Epoch: 14 Global Step: 241220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:54,594-Speed 9331.23 samples/sec Loss 4.4147 LearningRate 0.0077 Epoch: 14 Global Step: 241230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:55,625-Speed 9934.82 samples/sec Loss 4.4389 LearningRate 0.0077 Epoch: 14 Global Step: 241240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:16:56,681-Speed 9699.38 samples/sec Loss 4.3351 LearningRate 0.0077 Epoch: 14 Global Step: 241250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:57,790-Speed 9236.49 samples/sec Loss 4.3890 LearningRate 0.0077 Epoch: 14 Global Step: 241260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:16:58,901-Speed 9224.42 samples/sec Loss 4.3145 LearningRate 0.0077 Epoch: 14 Global Step: 241270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:00,025-Speed 9124.47 samples/sec Loss 4.4555 LearningRate 0.0077 Epoch: 14 
Global Step: 241280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:01,146-Speed 9149.25 samples/sec Loss 4.4213 LearningRate 0.0077 Epoch: 14 Global Step: 241290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:02,247-Speed 9300.42 samples/sec Loss 4.4916 LearningRate 0.0077 Epoch: 14 Global Step: 241300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:03,325-Speed 9506.98 samples/sec Loss 4.3392 LearningRate 0.0077 Epoch: 14 Global Step: 241310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:04,393-Speed 9590.23 samples/sec Loss 4.5262 LearningRate 0.0077 Epoch: 14 Global Step: 241320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:05,496-Speed 9291.45 samples/sec Loss 4.4285 LearningRate 0.0077 Epoch: 14 Global Step: 241330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:06,587-Speed 9390.57 samples/sec Loss 4.3355 LearningRate 0.0077 Epoch: 14 Global Step: 241340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:07,691-Speed 9277.92 samples/sec Loss 4.4699 LearningRate 0.0077 Epoch: 14 Global Step: 241350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:08,751-Speed 9669.77 samples/sec Loss 4.4961 LearningRate 0.0077 Epoch: 14 Global Step: 241360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:09,786-Speed 9899.60 samples/sec Loss 4.4544 LearningRate 0.0077 Epoch: 14 Global Step: 241370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:10,884-Speed 9329.69 samples/sec Loss 4.3407 LearningRate 0.0077 Epoch: 14 Global Step: 241380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:11,968-Speed 9454.80 samples/sec Loss 4.4763 LearningRate 0.0077 Epoch: 14 Global Step: 241390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:13,060-Speed 9382.45 samples/sec Loss 4.5362 LearningRate 0.0077 Epoch: 14 Global Step: 241400 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 21:17:14,160-Speed 9315.00 samples/sec Loss 4.4977 LearningRate 0.0077 Epoch: 14 Global Step: 241410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:15,244-Speed 9452.09 samples/sec Loss 4.5040 LearningRate 0.0077 Epoch: 14 Global Step: 241420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:16,332-Speed 9414.49 samples/sec Loss 4.4439 LearningRate 0.0077 Epoch: 14 Global Step: 241430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:17,414-Speed 9469.13 samples/sec Loss 4.4856 LearningRate 0.0077 Epoch: 14 Global Step: 241440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:18,494-Speed 9493.79 samples/sec Loss 4.4676 LearningRate 0.0077 Epoch: 14 Global Step: 241450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:19,584-Speed 9397.02 samples/sec Loss 4.3479 LearningRate 0.0077 Epoch: 14 Global Step: 241460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:20,655-Speed 9567.54 samples/sec Loss 4.4552 LearningRate 0.0077 Epoch: 14 Global Step: 241470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:21,733-Speed 9501.25 samples/sec Loss 4.3708 LearningRate 0.0077 Epoch: 14 Global Step: 241480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:22,795-Speed 9651.98 samples/sec Loss 4.4138 LearningRate 0.0076 Epoch: 14 Global Step: 241490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:23,872-Speed 9513.79 samples/sec Loss 4.4502 LearningRate 0.0076 Epoch: 14 Global Step: 241500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:24,963-Speed 9389.71 samples/sec Loss 4.4024 LearningRate 0.0076 Epoch: 14 Global Step: 241510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:26,056-Speed 9380.65 samples/sec Loss 4.3665 LearningRate 0.0076 Epoch: 14 Global Step: 241520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 21:17:27,131-Speed 9523.82 samples/sec Loss 4.4440 LearningRate 0.0076 Epoch: 14 Global Step: 241530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:28,174-Speed 9826.50 samples/sec Loss 4.3728 LearningRate 0.0076 Epoch: 14 Global Step: 241540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:29,261-Speed 9428.43 samples/sec Loss 4.4938 LearningRate 0.0076 Epoch: 14 Global Step: 241550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:30,318-Speed 9688.84 samples/sec Loss 4.3796 LearningRate 0.0076 Epoch: 14 Global Step: 241560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:31,400-Speed 9476.27 samples/sec Loss 4.2956 LearningRate 0.0076 Epoch: 14 Global Step: 241570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:32,491-Speed 9392.40 samples/sec Loss 4.3659 LearningRate 0.0076 Epoch: 14 Global Step: 241580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:33,638-Speed 8927.39 samples/sec Loss 4.4217 LearningRate 0.0076 Epoch: 14 Global Step: 241590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:34,692-Speed 9722.29 samples/sec Loss 4.4439 LearningRate 0.0076 Epoch: 14 Global Step: 241600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:35,736-Speed 9817.01 samples/sec Loss 4.4642 LearningRate 0.0076 Epoch: 14 Global Step: 241610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:36,820-Speed 9448.84 samples/sec Loss 4.4491 LearningRate 0.0076 Epoch: 14 Global Step: 241620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:37,872-Speed 9740.22 samples/sec Loss 4.4912 LearningRate 0.0076 Epoch: 14 Global Step: 241630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:38,944-Speed 9561.40 samples/sec Loss 4.4742 LearningRate 0.0076 Epoch: 14 Global Step: 241640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:40,018-Speed 9540.59 
samples/sec Loss 4.4288 LearningRate 0.0076 Epoch: 14 Global Step: 241650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:41,119-Speed 9305.95 samples/sec Loss 4.4425 LearningRate 0.0076 Epoch: 14 Global Step: 241660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:42,206-Speed 9427.64 samples/sec Loss 4.3515 LearningRate 0.0076 Epoch: 14 Global Step: 241670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:43,278-Speed 9555.91 samples/sec Loss 4.4007 LearningRate 0.0076 Epoch: 14 Global Step: 241680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:44,351-Speed 9554.32 samples/sec Loss 4.5042 LearningRate 0.0076 Epoch: 14 Global Step: 241690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:45,403-Speed 9738.82 samples/sec Loss 4.3873 LearningRate 0.0076 Epoch: 14 Global Step: 241700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:46,494-Speed 9387.80 samples/sec Loss 4.3931 LearningRate 0.0076 Epoch: 14 Global Step: 241710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:47,539-Speed 9803.67 samples/sec Loss 4.5124 LearningRate 0.0076 Epoch: 14 Global Step: 241720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:48,625-Speed 9441.38 samples/sec Loss 4.3612 LearningRate 0.0076 Epoch: 14 Global Step: 241730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:49,721-Speed 9342.48 samples/sec Loss 4.3850 LearningRate 0.0076 Epoch: 14 Global Step: 241740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:50,822-Speed 9305.06 samples/sec Loss 4.4673 LearningRate 0.0076 Epoch: 14 Global Step: 241750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:51,949-Speed 9096.15 samples/sec Loss 4.4323 LearningRate 0.0076 Epoch: 14 Global Step: 241760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:53,019-Speed 9574.58 samples/sec Loss 4.4136 LearningRate 
0.0076 Epoch: 14 Global Step: 241770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:54,141-Speed 9128.54 samples/sec Loss 4.4444 LearningRate 0.0076 Epoch: 14 Global Step: 241780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:55,243-Speed 9306.51 samples/sec Loss 4.4158 LearningRate 0.0076 Epoch: 14 Global Step: 241790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:56,324-Speed 9472.88 samples/sec Loss 4.4193 LearningRate 0.0076 Epoch: 14 Global Step: 241800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:57,417-Speed 9382.85 samples/sec Loss 4.5000 LearningRate 0.0076 Epoch: 14 Global Step: 241810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:17:58,482-Speed 9615.00 samples/sec Loss 4.4084 LearningRate 0.0076 Epoch: 14 Global Step: 241820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:17:59,544-Speed 9651.74 samples/sec Loss 4.4855 LearningRate 0.0076 Epoch: 14 Global Step: 241830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:00,634-Speed 9399.66 samples/sec Loss 4.5434 LearningRate 0.0076 Epoch: 14 Global Step: 241840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:01,726-Speed 9382.76 samples/sec Loss 4.4097 LearningRate 0.0076 Epoch: 14 Global Step: 241850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:02,823-Speed 9343.62 samples/sec Loss 4.4250 LearningRate 0.0076 Epoch: 14 Global Step: 241860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:03,880-Speed 9694.49 samples/sec Loss 4.4804 LearningRate 0.0076 Epoch: 14 Global Step: 241870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:04,970-Speed 9392.61 samples/sec Loss 4.5278 LearningRate 0.0076 Epoch: 14 Global Step: 241880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:06,064-Speed 9371.77 samples/sec Loss 4.4433 LearningRate 0.0076 Epoch: 14 Global Step: 241890 
Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:07,136-Speed 9550.14 samples/sec Loss 4.4226 LearningRate 0.0076 Epoch: 14 Global Step: 241900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:08,225-Speed 9409.38 samples/sec Loss 4.4558 LearningRate 0.0076 Epoch: 14 Global Step: 241910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:18:09,325-Speed 9314.44 samples/sec Loss 4.4661 LearningRate 0.0076 Epoch: 14 Global Step: 241920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:10,386-Speed 9661.91 samples/sec Loss 4.4640 LearningRate 0.0076 Epoch: 14 Global Step: 241930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:11,479-Speed 9369.48 samples/sec Loss 4.4553 LearningRate 0.0076 Epoch: 14 Global Step: 241940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:12,564-Speed 9445.48 samples/sec Loss 4.3648 LearningRate 0.0076 Epoch: 14 Global Step: 241950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:13,670-Speed 9260.74 samples/sec Loss 4.4183 LearningRate 0.0076 Epoch: 14 Global Step: 241960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:14,735-Speed 9620.08 samples/sec Loss 4.3640 LearningRate 0.0076 Epoch: 14 Global Step: 241970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:15,820-Speed 9539.04 samples/sec Loss 4.5184 LearningRate 0.0076 Epoch: 14 Global Step: 241980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:16,902-Speed 9472.41 samples/sec Loss 4.4235 LearningRate 0.0076 Epoch: 14 Global Step: 241990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:18,037-Speed 9032.18 samples/sec Loss 4.4616 LearningRate 0.0076 Epoch: 14 Global Step: 242000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:18:40,165-[lfw][242000]XNorm: 7.533228 Training: 2022-04-11 21:18:40,165-[lfw][242000]Accuracy-Flip: 0.99583+-0.00261 Training: 2022-04-11 
21:18:40,166-[lfw][242000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:19:05,632-[cfp_fp][242000]XNorm: 6.510685 Training: 2022-04-11 21:19:05,633-[cfp_fp][242000]Accuracy-Flip: 0.96786+-0.00874 Training: 2022-04-11 21:19:05,633-[cfp_fp][242000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:19:27,513-[agedb_30][242000]XNorm: 7.333325 Training: 2022-04-11 21:19:27,514-[agedb_30][242000]Accuracy-Flip: 0.97233+-0.00879 Training: 2022-04-11 21:19:27,514-[agedb_30][242000]Accuracy-Highest: 0.97250 Training: 2022-04-11 21:19:28,562-Speed 145.20 samples/sec Loss 4.3943 LearningRate 0.0076 Epoch: 14 Global Step: 242010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:29,628-Speed 9611.64 samples/sec Loss 4.4348 LearningRate 0.0076 Epoch: 14 Global Step: 242020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:30,724-Speed 9352.32 samples/sec Loss 4.4556 LearningRate 0.0076 Epoch: 14 Global Step: 242030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:31,806-Speed 9461.96 samples/sec Loss 4.4900 LearningRate 0.0076 Epoch: 14 Global Step: 242040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:32,909-Speed 9291.55 samples/sec Loss 4.4181 LearningRate 0.0076 Epoch: 14 Global Step: 242050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:33,974-Speed 9620.97 samples/sec Loss 4.3913 LearningRate 0.0076 Epoch: 14 Global Step: 242060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:35,047-Speed 9554.13 samples/sec Loss 4.5014 LearningRate 0.0076 Epoch: 14 Global Step: 242070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:36,154-Speed 9252.78 samples/sec Loss 4.3996 LearningRate 0.0076 Epoch: 14 Global Step: 242080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:37,235-Speed 9478.57 samples/sec Loss 4.4059 LearningRate 0.0076 Epoch: 14 Global Step: 242090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
21:19:38,312-Speed 9507.70 samples/sec Loss 4.4306 LearningRate 0.0075 Epoch: 14 Global Step: 242100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:39,413-Speed 9311.33 samples/sec Loss 4.4025 LearningRate 0.0075 Epoch: 14 Global Step: 242110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:40,489-Speed 9518.73 samples/sec Loss 4.3773 LearningRate 0.0075 Epoch: 14 Global Step: 242120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:41,559-Speed 9573.09 samples/sec Loss 4.4327 LearningRate 0.0075 Epoch: 14 Global Step: 242130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:42,613-Speed 9718.22 samples/sec Loss 4.4687 LearningRate 0.0075 Epoch: 14 Global Step: 242140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:43,682-Speed 9586.13 samples/sec Loss 4.4441 LearningRate 0.0075 Epoch: 14 Global Step: 242150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:44,748-Speed 9617.87 samples/sec Loss 4.5702 LearningRate 0.0075 Epoch: 14 Global Step: 242160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:45,849-Speed 9310.91 samples/sec Loss 4.4166 LearningRate 0.0075 Epoch: 14 Global Step: 242170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:46,924-Speed 9528.57 samples/sec Loss 4.4307 LearningRate 0.0075 Epoch: 14 Global Step: 242180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:48,079-Speed 8868.95 samples/sec Loss 4.3975 LearningRate 0.0075 Epoch: 14 Global Step: 242190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:49,213-Speed 9032.33 samples/sec Loss 4.5695 LearningRate 0.0075 Epoch: 14 Global Step: 242200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:50,245-Speed 9931.32 samples/sec Loss 4.3573 LearningRate 0.0075 Epoch: 14 Global Step: 242210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:51,285-Speed 9855.88 samples/sec 
Loss 4.3906 LearningRate 0.0075 Epoch: 14 Global Step: 242220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:52,357-Speed 9559.24 samples/sec Loss 4.4117 LearningRate 0.0075 Epoch: 14 Global Step: 242230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:53,422-Speed 9614.34 samples/sec Loss 4.4073 LearningRate 0.0075 Epoch: 14 Global Step: 242240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:54,527-Speed 9275.55 samples/sec Loss 4.4252 LearningRate 0.0075 Epoch: 14 Global Step: 242250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:19:55,584-Speed 9693.08 samples/sec Loss 4.4416 LearningRate 0.0075 Epoch: 14 Global Step: 242260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:56,640-Speed 9699.58 samples/sec Loss 4.4088 LearningRate 0.0075 Epoch: 14 Global Step: 242270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:57,735-Speed 9356.22 samples/sec Loss 4.4164 LearningRate 0.0075 Epoch: 14 Global Step: 242280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:58,769-Speed 9914.49 samples/sec Loss 4.3067 LearningRate 0.0075 Epoch: 14 Global Step: 242290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:19:59,870-Speed 9305.69 samples/sec Loss 4.4884 LearningRate 0.0075 Epoch: 14 Global Step: 242300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:00,937-Speed 9600.63 samples/sec Loss 4.4315 LearningRate 0.0075 Epoch: 14 Global Step: 242310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:02,036-Speed 9324.67 samples/sec Loss 4.4518 LearningRate 0.0075 Epoch: 14 Global Step: 242320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:03,128-Speed 9380.21 samples/sec Loss 4.4821 LearningRate 0.0075 Epoch: 14 Global Step: 242330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:04,173-Speed 9805.83 samples/sec Loss 4.3732 LearningRate 0.0075 
Epoch: 14 Global Step: 242340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:05,249-Speed 9525.29 samples/sec Loss 4.5130 LearningRate 0.0075 Epoch: 14 Global Step: 242350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:06,283-Speed 9903.58 samples/sec Loss 4.4575 LearningRate 0.0075 Epoch: 14 Global Step: 242360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:07,359-Speed 9524.25 samples/sec Loss 4.4585 LearningRate 0.0075 Epoch: 14 Global Step: 242370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:08,447-Speed 9420.31 samples/sec Loss 4.3572 LearningRate 0.0075 Epoch: 14 Global Step: 242380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:09,538-Speed 9390.26 samples/sec Loss 4.4823 LearningRate 0.0075 Epoch: 14 Global Step: 242390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:10,604-Speed 9610.25 samples/sec Loss 4.4384 LearningRate 0.0075 Epoch: 14 Global Step: 242400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:11,665-Speed 9656.83 samples/sec Loss 4.4489 LearningRate 0.0075 Epoch: 14 Global Step: 242410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:12,729-Speed 9631.29 samples/sec Loss 4.4634 LearningRate 0.0075 Epoch: 14 Global Step: 242420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:13,839-Speed 9227.26 samples/sec Loss 4.4266 LearningRate 0.0075 Epoch: 14 Global Step: 242430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:14,880-Speed 9850.92 samples/sec Loss 4.4256 LearningRate 0.0075 Epoch: 14 Global Step: 242440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:15,945-Speed 9619.58 samples/sec Loss 4.4448 LearningRate 0.0075 Epoch: 14 Global Step: 242450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:17,020-Speed 9527.93 samples/sec Loss 4.4766 LearningRate 0.0075 Epoch: 14 Global Step: 242460 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:18,141-Speed 9142.19 samples/sec Loss 4.4883 LearningRate 0.0075 Epoch: 14 Global Step: 242470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:19,274-Speed 9042.87 samples/sec Loss 4.4243 LearningRate 0.0075 Epoch: 14 Global Step: 242480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:20,349-Speed 9529.38 samples/sec Loss 4.4740 LearningRate 0.0075 Epoch: 14 Global Step: 242490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:21,430-Speed 9482.66 samples/sec Loss 4.4981 LearningRate 0.0075 Epoch: 14 Global Step: 242500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:22,513-Speed 9459.96 samples/sec Loss 4.4549 LearningRate 0.0075 Epoch: 14 Global Step: 242510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:23,587-Speed 9542.69 samples/sec Loss 4.4616 LearningRate 0.0075 Epoch: 14 Global Step: 242520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:24,646-Speed 9674.23 samples/sec Loss 4.3642 LearningRate 0.0075 Epoch: 14 Global Step: 242530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:25,694-Speed 9772.86 samples/sec Loss 4.4477 LearningRate 0.0075 Epoch: 14 Global Step: 242540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:26,757-Speed 9642.93 samples/sec Loss 4.4542 LearningRate 0.0075 Epoch: 14 Global Step: 242550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:27,872-Speed 9191.55 samples/sec Loss 4.3824 LearningRate 0.0075 Epoch: 14 Global Step: 242560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:28,962-Speed 9401.36 samples/sec Loss 4.4910 LearningRate 0.0075 Epoch: 14 Global Step: 242570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:30,049-Speed 9423.38 samples/sec Loss 4.4729 LearningRate 0.0075 Epoch: 14 Global Step: 242580 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 21:20:31,155-Speed 9259.98 samples/sec Loss 4.4733 LearningRate 0.0075 Epoch: 14 Global Step: 242590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:32,250-Speed 9360.79 samples/sec Loss 4.4408 LearningRate 0.0075 Epoch: 14 Global Step: 242600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:33,297-Speed 9782.71 samples/sec Loss 4.5060 LearningRate 0.0075 Epoch: 14 Global Step: 242610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:34,345-Speed 9784.32 samples/sec Loss 4.5115 LearningRate 0.0075 Epoch: 14 Global Step: 242620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:35,430-Speed 9437.34 samples/sec Loss 4.4723 LearningRate 0.0075 Epoch: 14 Global Step: 242630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:36,506-Speed 9521.61 samples/sec Loss 4.6213 LearningRate 0.0075 Epoch: 14 Global Step: 242640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:37,566-Speed 9665.99 samples/sec Loss 4.4246 LearningRate 0.0075 Epoch: 14 Global Step: 242650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:38,628-Speed 9648.65 samples/sec Loss 4.3707 LearningRate 0.0075 Epoch: 14 Global Step: 242660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:39,696-Speed 9595.38 samples/sec Loss 4.4479 LearningRate 0.0075 Epoch: 14 Global Step: 242670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:40,809-Speed 9205.40 samples/sec Loss 4.5454 LearningRate 0.0075 Epoch: 14 Global Step: 242680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:41,891-Speed 9469.53 samples/sec Loss 4.4417 LearningRate 0.0075 Epoch: 14 Global Step: 242690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:43,017-Speed 9098.39 samples/sec Loss 4.4013 LearningRate 0.0075 Epoch: 14 Global Step: 242700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
21:20:44,125-Speed 9247.60 samples/sec Loss 4.3829 LearningRate 0.0074 Epoch: 14 Global Step: 242710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:45,198-Speed 9549.34 samples/sec Loss 4.4710 LearningRate 0.0074 Epoch: 14 Global Step: 242720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:46,258-Speed 9667.26 samples/sec Loss 4.4228 LearningRate 0.0074 Epoch: 14 Global Step: 242730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:47,338-Speed 9491.17 samples/sec Loss 4.4198 LearningRate 0.0074 Epoch: 14 Global Step: 242740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:48,418-Speed 9483.31 samples/sec Loss 4.4002 LearningRate 0.0074 Epoch: 14 Global Step: 242750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:49,498-Speed 9493.29 samples/sec Loss 4.3035 LearningRate 0.0074 Epoch: 14 Global Step: 242760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:50,595-Speed 9332.82 samples/sec Loss 4.2938 LearningRate 0.0074 Epoch: 14 Global Step: 242770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:51,656-Speed 9663.38 samples/sec Loss 4.4513 LearningRate 0.0074 Epoch: 14 Global Step: 242780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:20:52,713-Speed 9698.86 samples/sec Loss 4.4338 LearningRate 0.0074 Epoch: 14 Global Step: 242790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:53,804-Speed 9387.86 samples/sec Loss 4.4892 LearningRate 0.0074 Epoch: 14 Global Step: 242800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:54,881-Speed 9511.89 samples/sec Loss 4.4005 LearningRate 0.0074 Epoch: 14 Global Step: 242810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:55,958-Speed 9516.21 samples/sec Loss 4.4975 LearningRate 0.0074 Epoch: 14 Global Step: 242820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:57,040-Speed 9470.12 
samples/sec Loss 4.5238 LearningRate 0.0074 Epoch: 14 Global Step: 242830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:58,092-Speed 9732.42 samples/sec Loss 4.4186 LearningRate 0.0074 Epoch: 14 Global Step: 242840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:20:59,185-Speed 9376.16 samples/sec Loss 4.4352 LearningRate 0.0074 Epoch: 14 Global Step: 242850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:00,260-Speed 9532.47 samples/sec Loss 4.3631 LearningRate 0.0074 Epoch: 14 Global Step: 242860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:01,317-Speed 9691.63 samples/sec Loss 4.5003 LearningRate 0.0074 Epoch: 14 Global Step: 242870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:02,373-Speed 9703.51 samples/sec Loss 4.3813 LearningRate 0.0074 Epoch: 14 Global Step: 242880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:03,475-Speed 9295.11 samples/sec Loss 4.4484 LearningRate 0.0074 Epoch: 14 Global Step: 242890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:04,584-Speed 9250.82 samples/sec Loss 4.4260 LearningRate 0.0074 Epoch: 14 Global Step: 242900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:05,655-Speed 9566.49 samples/sec Loss 4.5105 LearningRate 0.0074 Epoch: 14 Global Step: 242910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:06,701-Speed 9805.58 samples/sec Loss 4.5101 LearningRate 0.0074 Epoch: 14 Global Step: 242920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:07,789-Speed 9411.55 samples/sec Loss 4.2898 LearningRate 0.0074 Epoch: 14 Global Step: 242930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:08,883-Speed 9363.52 samples/sec Loss 4.5026 LearningRate 0.0074 Epoch: 14 Global Step: 242940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:09,976-Speed 9374.37 samples/sec Loss 4.4852 LearningRate 
0.0074 Epoch: 14 Global Step: 242950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:11,049-Speed 9548.87 samples/sec Loss 4.4503 LearningRate 0.0074 Epoch: 14 Global Step: 242960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:12,115-Speed 9608.55 samples/sec Loss 4.4266 LearningRate 0.0074 Epoch: 14 Global Step: 242970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:13,191-Speed 9524.70 samples/sec Loss 4.3656 LearningRate 0.0074 Epoch: 14 Global Step: 242980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:14,275-Speed 9456.55 samples/sec Loss 4.4185 LearningRate 0.0074 Epoch: 14 Global Step: 242990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:15,377-Speed 9293.96 samples/sec Loss 4.3651 LearningRate 0.0074 Epoch: 14 Global Step: 243000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:16,476-Speed 9327.27 samples/sec Loss 4.3752 LearningRate 0.0074 Epoch: 14 Global Step: 243010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:17,543-Speed 9601.24 samples/sec Loss 4.3763 LearningRate 0.0074 Epoch: 14 Global Step: 243020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:18,635-Speed 9383.72 samples/sec Loss 4.3748 LearningRate 0.0074 Epoch: 14 Global Step: 243030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:19,737-Speed 9290.91 samples/sec Loss 4.4254 LearningRate 0.0074 Epoch: 14 Global Step: 243040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:20,787-Speed 9761.59 samples/sec Loss 4.4184 LearningRate 0.0074 Epoch: 14 Global Step: 243050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:21,897-Speed 9236.31 samples/sec Loss 4.4663 LearningRate 0.0074 Epoch: 14 Global Step: 243060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:22,968-Speed 9565.04 samples/sec Loss 4.4603 LearningRate 0.0074 Epoch: 14 Global Step: 
243070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:24,086-Speed 9166.62 samples/sec Loss 4.4685 LearningRate 0.0074 Epoch: 14 Global Step: 243080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:25,157-Speed 9571.47 samples/sec Loss 4.4015 LearningRate 0.0074 Epoch: 14 Global Step: 243090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:26,239-Speed 9461.62 samples/sec Loss 4.4842 LearningRate 0.0074 Epoch: 14 Global Step: 243100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:27,336-Speed 9339.75 samples/sec Loss 4.5097 LearningRate 0.0074 Epoch: 14 Global Step: 243110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:28,446-Speed 9232.27 samples/sec Loss 4.5176 LearningRate 0.0074 Epoch: 14 Global Step: 243120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:29,519-Speed 9548.00 samples/sec Loss 4.4913 LearningRate 0.0074 Epoch: 14 Global Step: 243130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:30,597-Speed 9501.14 samples/sec Loss 4.3968 LearningRate 0.0074 Epoch: 14 Global Step: 243140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:31,673-Speed 9522.40 samples/sec Loss 4.4844 LearningRate 0.0074 Epoch: 14 Global Step: 243150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:32,739-Speed 9613.66 samples/sec Loss 4.3958 LearningRate 0.0074 Epoch: 14 Global Step: 243160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:33,827-Speed 9417.25 samples/sec Loss 4.4605 LearningRate 0.0074 Epoch: 14 Global Step: 243170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:34,894-Speed 9606.77 samples/sec Loss 4.4616 LearningRate 0.0074 Epoch: 14 Global Step: 243180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:36,022-Speed 9084.86 samples/sec Loss 4.4263 LearningRate 0.0074 Epoch: 14 Global Step: 243190 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 21:21:37,104-Speed 9470.73 samples/sec Loss 4.4514 LearningRate 0.0074 Epoch: 14 Global Step: 243200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:38,193-Speed 9406.34 samples/sec Loss 4.4744 LearningRate 0.0074 Epoch: 14 Global Step: 243210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:39,287-Speed 9367.51 samples/sec Loss 4.5465 LearningRate 0.0074 Epoch: 14 Global Step: 243220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:40,370-Speed 9459.78 samples/sec Loss 4.5047 LearningRate 0.0074 Epoch: 14 Global Step: 243230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:41,444-Speed 9545.53 samples/sec Loss 4.4398 LearningRate 0.0074 Epoch: 14 Global Step: 243240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:42,533-Speed 9405.84 samples/sec Loss 4.4444 LearningRate 0.0074 Epoch: 14 Global Step: 243250 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 21:21:43,612-Speed 9499.19 samples/sec Loss 4.5367 LearningRate 0.0074 Epoch: 14 Global Step: 243260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:44,681-Speed 9587.59 samples/sec Loss 4.4531 LearningRate 0.0074 Epoch: 14 Global Step: 243270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:45,734-Speed 9724.05 samples/sec Loss 4.5452 LearningRate 0.0074 Epoch: 14 Global Step: 243280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:46,786-Speed 9739.92 samples/sec Loss 4.3546 LearningRate 0.0074 Epoch: 14 Global Step: 243290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:47,894-Speed 9247.66 samples/sec Loss 4.3800 LearningRate 0.0074 Epoch: 14 Global Step: 243300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:48,988-Speed 9368.45 samples/sec Loss 4.4560 LearningRate 0.0074 Epoch: 14 Global Step: 243310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 21:21:50,062-Speed 9537.16 samples/sec Loss 4.4641 LearningRate 0.0073 Epoch: 14 Global Step: 243320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:51,132-Speed 9583.83 samples/sec Loss 4.4383 LearningRate 0.0073 Epoch: 14 Global Step: 243330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:52,237-Speed 9269.09 samples/sec Loss 4.4887 LearningRate 0.0073 Epoch: 14 Global Step: 243340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:53,305-Speed 9595.93 samples/sec Loss 4.4298 LearningRate 0.0073 Epoch: 14 Global Step: 243350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:54,387-Speed 9462.41 samples/sec Loss 4.3160 LearningRate 0.0073 Epoch: 14 Global Step: 243360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:55,429-Speed 9838.01 samples/sec Loss 4.3973 LearningRate 0.0073 Epoch: 14 Global Step: 243370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:56,518-Speed 9404.85 samples/sec Loss 4.4060 LearningRate 0.0073 Epoch: 14 Global Step: 243380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:57,599-Speed 9480.83 samples/sec Loss 4.3765 LearningRate 0.0073 Epoch: 14 Global Step: 243390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:21:58,647-Speed 9772.73 samples/sec Loss 4.4108 LearningRate 0.0073 Epoch: 14 Global Step: 243400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:21:59,718-Speed 9569.12 samples/sec Loss 4.4370 LearningRate 0.0073 Epoch: 14 Global Step: 243410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:00,818-Speed 9317.75 samples/sec Loss 4.3962 LearningRate 0.0073 Epoch: 14 Global Step: 243420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:01,906-Speed 9413.16 samples/sec Loss 4.5225 LearningRate 0.0073 Epoch: 14 Global Step: 243430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:03,011-Speed 9273.38 
samples/sec Loss 4.4058 LearningRate 0.0073 Epoch: 14 Global Step: 243440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:04,118-Speed 9258.23 samples/sec Loss 4.4651 LearningRate 0.0073 Epoch: 14 Global Step: 243450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:05,228-Speed 9233.97 samples/sec Loss 4.3890 LearningRate 0.0073 Epoch: 14 Global Step: 243460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:06,308-Speed 9487.38 samples/sec Loss 4.4489 LearningRate 0.0073 Epoch: 14 Global Step: 243470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:07,374-Speed 9607.87 samples/sec Loss 4.4188 LearningRate 0.0073 Epoch: 14 Global Step: 243480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:08,440-Speed 9615.53 samples/sec Loss 4.5176 LearningRate 0.0073 Epoch: 14 Global Step: 243490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:09,550-Speed 9230.66 samples/sec Loss 4.5352 LearningRate 0.0073 Epoch: 14 Global Step: 243500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:10,660-Speed 9228.78 samples/sec Loss 4.4442 LearningRate 0.0073 Epoch: 14 Global Step: 243510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:11,792-Speed 9046.02 samples/sec Loss 4.4096 LearningRate 0.0073 Epoch: 14 Global Step: 243520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:12,897-Speed 9278.27 samples/sec Loss 4.4478 LearningRate 0.0073 Epoch: 14 Global Step: 243530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:14,061-Speed 8798.37 samples/sec Loss 4.4850 LearningRate 0.0073 Epoch: 14 Global Step: 243540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:15,144-Speed 9473.03 samples/sec Loss 4.4484 LearningRate 0.0073 Epoch: 14 Global Step: 243550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:16,194-Speed 9756.34 samples/sec Loss 4.5040 LearningRate 
0.0073 Epoch: 14 Global Step: 243560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:17,276-Speed 9466.72 samples/sec Loss 4.5364 LearningRate 0.0073 Epoch: 14 Global Step: 243570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:18,364-Speed 9420.73 samples/sec Loss 4.4567 LearningRate 0.0073 Epoch: 14 Global Step: 243580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:19,432-Speed 9593.24 samples/sec Loss 4.4446 LearningRate 0.0073 Epoch: 14 Global Step: 243590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:20,492-Speed 9661.93 samples/sec Loss 4.3916 LearningRate 0.0073 Epoch: 14 Global Step: 243600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:21,560-Speed 9598.90 samples/sec Loss 4.4877 LearningRate 0.0073 Epoch: 14 Global Step: 243610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:22,647-Speed 9427.79 samples/sec Loss 4.4909 LearningRate 0.0073 Epoch: 14 Global Step: 243620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:23,744-Speed 9336.35 samples/sec Loss 4.4024 LearningRate 0.0073 Epoch: 14 Global Step: 243630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:24,812-Speed 9588.45 samples/sec Loss 4.5237 LearningRate 0.0073 Epoch: 14 Global Step: 243640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:25,885-Speed 9549.60 samples/sec Loss 4.4134 LearningRate 0.0073 Epoch: 14 Global Step: 243650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:27,004-Speed 9161.44 samples/sec Loss 4.4148 LearningRate 0.0073 Epoch: 14 Global Step: 243660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:28,146-Speed 8966.15 samples/sec Loss 4.4295 LearningRate 0.0073 Epoch: 14 Global Step: 243670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:29,188-Speed 9836.66 samples/sec Loss 4.4100 LearningRate 0.0073 Epoch: 14 Global Step: 243680 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:30,248-Speed 9668.92 samples/sec Loss 4.3552 LearningRate 0.0073 Epoch: 14 Global Step: 243690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:31,313-Speed 9617.65 samples/sec Loss 4.4527 LearningRate 0.0073 Epoch: 14 Global Step: 243700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:32,423-Speed 9234.87 samples/sec Loss 4.4721 LearningRate 0.0073 Epoch: 14 Global Step: 243710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:33,517-Speed 9360.29 samples/sec Loss 4.4637 LearningRate 0.0073 Epoch: 14 Global Step: 243720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:34,640-Speed 9129.24 samples/sec Loss 4.4801 LearningRate 0.0073 Epoch: 14 Global Step: 243730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:35,708-Speed 9594.76 samples/sec Loss 4.4873 LearningRate 0.0073 Epoch: 14 Global Step: 243740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:36,821-Speed 9200.03 samples/sec Loss 4.4843 LearningRate 0.0073 Epoch: 14 Global Step: 243750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:37,917-Speed 9350.08 samples/sec Loss 4.4219 LearningRate 0.0073 Epoch: 14 Global Step: 243760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:39,018-Speed 9307.04 samples/sec Loss 4.4553 LearningRate 0.0073 Epoch: 14 Global Step: 243770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:40,078-Speed 9664.63 samples/sec Loss 4.4566 LearningRate 0.0073 Epoch: 14 Global Step: 243780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:41,171-Speed 9371.34 samples/sec Loss 4.3756 LearningRate 0.0073 Epoch: 14 Global Step: 243790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:42,253-Speed 9471.46 samples/sec Loss 4.6206 LearningRate 0.0073 Epoch: 14 Global Step: 243800 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 21:22:43,348-Speed 9450.34 samples/sec Loss 4.3953 LearningRate 0.0073 Epoch: 14 Global Step: 243810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:44,421-Speed 9545.98 samples/sec Loss 4.5068 LearningRate 0.0073 Epoch: 14 Global Step: 243820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:45,544-Speed 9126.73 samples/sec Loss 4.4576 LearningRate 0.0073 Epoch: 14 Global Step: 243830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:46,604-Speed 9668.94 samples/sec Loss 4.4532 LearningRate 0.0073 Epoch: 14 Global Step: 243840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:47,714-Speed 9228.53 samples/sec Loss 4.3757 LearningRate 0.0073 Epoch: 14 Global Step: 243850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:48,847-Speed 9047.02 samples/sec Loss 4.5126 LearningRate 0.0073 Epoch: 14 Global Step: 243860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:49,967-Speed 9142.45 samples/sec Loss 4.5194 LearningRate 0.0073 Epoch: 14 Global Step: 243870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:51,018-Speed 9753.93 samples/sec Loss 4.3375 LearningRate 0.0073 Epoch: 14 Global Step: 243880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:52,076-Speed 9688.40 samples/sec Loss 4.3649 LearningRate 0.0073 Epoch: 14 Global Step: 243890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:53,116-Speed 9849.31 samples/sec Loss 4.5461 LearningRate 0.0073 Epoch: 14 Global Step: 243900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:54,166-Speed 9759.08 samples/sec Loss 4.4234 LearningRate 0.0073 Epoch: 14 Global Step: 243910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:55,260-Speed 9365.89 samples/sec Loss 4.5047 LearningRate 0.0073 Epoch: 14 Global Step: 243920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:56,339-Speed 
9496.54 samples/sec Loss 4.3921 LearningRate 0.0073 Epoch: 14 Global Step: 243930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:22:57,420-Speed 9477.47 samples/sec Loss 4.4024 LearningRate 0.0072 Epoch: 14 Global Step: 243940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:58,470-Speed 9760.18 samples/sec Loss 4.3415 LearningRate 0.0072 Epoch: 14 Global Step: 243950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:22:59,521-Speed 9750.35 samples/sec Loss 4.4732 LearningRate 0.0072 Epoch: 14 Global Step: 243960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:23:00,564-Speed 9823.64 samples/sec Loss 4.4858 LearningRate 0.0072 Epoch: 14 Global Step: 243970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:23:01,628-Speed 9625.64 samples/sec Loss 4.5101 LearningRate 0.0072 Epoch: 14 Global Step: 243980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:23:02,682-Speed 9717.34 samples/sec Loss 4.4658 LearningRate 0.0072 Epoch: 14 Global Step: 243990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:23:03,761-Speed 9496.02 samples/sec Loss 4.4451 LearningRate 0.0072 Epoch: 14 Global Step: 244000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:23:25,835-[lfw][244000]XNorm: 7.556710 Training: 2022-04-11 21:23:25,836-[lfw][244000]Accuracy-Flip: 0.99650+-0.00252 Training: 2022-04-11 21:23:25,836-[lfw][244000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:23:51,317-[cfp_fp][244000]XNorm: 6.529520 Training: 2022-04-11 21:23:51,318-[cfp_fp][244000]Accuracy-Flip: 0.96971+-0.00878 Training: 2022-04-11 21:23:51,318-[cfp_fp][244000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:24:13,359-[agedb_30][244000]XNorm: 7.335563 Training: 2022-04-11 21:24:13,359-[agedb_30][244000]Accuracy-Flip: 0.97200+-0.00806 Training: 2022-04-11 21:24:13,360-[agedb_30][244000]Accuracy-Highest: 0.97250 Training: 2022-04-11 21:24:14,446-Speed 144.87 samples/sec 
Loss 4.5110 LearningRate 0.0072 Epoch: 14 Global Step: 244010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:15,511-Speed 9625.03 samples/sec Loss 4.4725 LearningRate 0.0072 Epoch: 14 Global Step: 244020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:16,617-Speed 9266.86 samples/sec Loss 4.4516 LearningRate 0.0072 Epoch: 14 Global Step: 244030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:17,698-Speed 9477.89 samples/sec Loss 4.3428 LearningRate 0.0072 Epoch: 14 Global Step: 244040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:18,773-Speed 9536.69 samples/sec Loss 4.4094 LearningRate 0.0072 Epoch: 14 Global Step: 244050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:19,871-Speed 9329.61 samples/sec Loss 4.4553 LearningRate 0.0072 Epoch: 14 Global Step: 244060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:20,955-Speed 9450.77 samples/sec Loss 4.4248 LearningRate 0.0072 Epoch: 14 Global Step: 244070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:22,047-Speed 9386.03 samples/sec Loss 4.4950 LearningRate 0.0072 Epoch: 14 Global Step: 244080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:23,171-Speed 9119.47 samples/sec Loss 4.3912 LearningRate 0.0072 Epoch: 14 Global Step: 244090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:24,251-Speed 9481.35 samples/sec Loss 4.5278 LearningRate 0.0072 Epoch: 14 Global Step: 244100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:25,345-Speed 9371.16 samples/sec Loss 4.4685 LearningRate 0.0072 Epoch: 14 Global Step: 244110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:26,406-Speed 9651.92 samples/sec Loss 4.3920 LearningRate 0.0072 Epoch: 14 Global Step: 244120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:27,515-Speed 9236.76 samples/sec Loss 4.4605 LearningRate 0.0072 Epoch: 
14 Global Step: 244130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:28,636-Speed 9143.67 samples/sec Loss 4.5644 LearningRate 0.0072 Epoch: 14 Global Step: 244140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:29,740-Speed 9282.04 samples/sec Loss 4.4802 LearningRate 0.0072 Epoch: 14 Global Step: 244150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:30,827-Speed 9420.60 samples/sec Loss 4.4181 LearningRate 0.0072 Epoch: 14 Global Step: 244160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:31,907-Speed 9491.06 samples/sec Loss 4.3918 LearningRate 0.0072 Epoch: 14 Global Step: 244170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:32,992-Speed 9439.54 samples/sec Loss 4.4951 LearningRate 0.0072 Epoch: 14 Global Step: 244180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:34,068-Speed 9523.74 samples/sec Loss 4.4518 LearningRate 0.0072 Epoch: 14 Global Step: 244190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:35,183-Speed 9190.31 samples/sec Loss 4.5082 LearningRate 0.0072 Epoch: 14 Global Step: 244200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:36,271-Speed 9419.04 samples/sec Loss 4.5365 LearningRate 0.0072 Epoch: 14 Global Step: 244210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:37,343-Speed 9561.42 samples/sec Loss 4.4208 LearningRate 0.0072 Epoch: 14 Global Step: 244220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:38,421-Speed 9509.95 samples/sec Loss 4.5336 LearningRate 0.0072 Epoch: 14 Global Step: 244230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:39,511-Speed 9395.95 samples/sec Loss 4.5129 LearningRate 0.0072 Epoch: 14 Global Step: 244240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:40,615-Speed 9280.15 samples/sec Loss 4.3692 LearningRate 0.0072 Epoch: 14 Global Step: 244250 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:41,681-Speed 9612.79 samples/sec Loss 4.4056 LearningRate 0.0072 Epoch: 14 Global Step: 244260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:42,755-Speed 9540.36 samples/sec Loss 4.3459 LearningRate 0.0072 Epoch: 14 Global Step: 244270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:43,856-Speed 9308.27 samples/sec Loss 4.4329 LearningRate 0.0072 Epoch: 14 Global Step: 244280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:44,909-Speed 9738.60 samples/sec Loss 4.4390 LearningRate 0.0072 Epoch: 14 Global Step: 244290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:45,974-Speed 9620.06 samples/sec Loss 4.4634 LearningRate 0.0072 Epoch: 14 Global Step: 244300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:47,042-Speed 9590.89 samples/sec Loss 4.4176 LearningRate 0.0072 Epoch: 14 Global Step: 244310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:48,114-Speed 9557.82 samples/sec Loss 4.4771 LearningRate 0.0072 Epoch: 14 Global Step: 244320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:49,218-Speed 9284.73 samples/sec Loss 4.4584 LearningRate 0.0072 Epoch: 14 Global Step: 244330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:24:50,320-Speed 9295.90 samples/sec Loss 4.4447 LearningRate 0.0072 Epoch: 14 Global Step: 244340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:51,418-Speed 9332.27 samples/sec Loss 4.5149 LearningRate 0.0072 Epoch: 14 Global Step: 244350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:52,506-Speed 9414.48 samples/sec Loss 4.3766 LearningRate 0.0072 Epoch: 14 Global Step: 244360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:53,598-Speed 9382.88 samples/sec Loss 4.4968 LearningRate 0.0072 Epoch: 14 Global Step: 244370 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 21:24:54,636-Speed 9870.81 samples/sec Loss 4.5259 LearningRate 0.0072 Epoch: 14 Global Step: 244380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:55,700-Speed 9631.62 samples/sec Loss 4.4680 LearningRate 0.0072 Epoch: 14 Global Step: 244390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:56,773-Speed 9556.39 samples/sec Loss 4.4425 LearningRate 0.0072 Epoch: 14 Global Step: 244400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:57,885-Speed 9210.25 samples/sec Loss 4.4330 LearningRate 0.0072 Epoch: 14 Global Step: 244410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:24:58,947-Speed 9649.65 samples/sec Loss 4.5018 LearningRate 0.0072 Epoch: 14 Global Step: 244420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:00,046-Speed 9321.82 samples/sec Loss 4.4699 LearningRate 0.0072 Epoch: 14 Global Step: 244430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:01,104-Speed 9685.55 samples/sec Loss 4.4686 LearningRate 0.0072 Epoch: 14 Global Step: 244440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:02,181-Speed 9510.34 samples/sec Loss 4.4491 LearningRate 0.0072 Epoch: 14 Global Step: 244450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:03,266-Speed 9437.30 samples/sec Loss 4.5450 LearningRate 0.0072 Epoch: 14 Global Step: 244460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:04,335-Speed 9592.53 samples/sec Loss 4.4423 LearningRate 0.0072 Epoch: 14 Global Step: 244470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:05,423-Speed 9412.43 samples/sec Loss 4.4959 LearningRate 0.0072 Epoch: 14 Global Step: 244480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:06,537-Speed 9204.61 samples/sec Loss 4.3885 LearningRate 0.0072 Epoch: 14 Global Step: 244490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
21:25:07,638-Speed 9298.43 samples/sec Loss 4.4534 LearningRate 0.0072 Epoch: 14 Global Step: 244500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:08,678-Speed 9859.65 samples/sec Loss 4.4593 LearningRate 0.0072 Epoch: 14 Global Step: 244510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:09,726-Speed 9771.11 samples/sec Loss 4.4258 LearningRate 0.0072 Epoch: 14 Global Step: 244520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:10,834-Speed 9246.25 samples/sec Loss 4.4136 LearningRate 0.0072 Epoch: 14 Global Step: 244530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:11,911-Speed 9514.00 samples/sec Loss 4.5405 LearningRate 0.0072 Epoch: 14 Global Step: 244540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:13,018-Speed 9255.57 samples/sec Loss 4.4798 LearningRate 0.0072 Epoch: 14 Global Step: 244550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:14,096-Speed 9499.76 samples/sec Loss 4.3626 LearningRate 0.0071 Epoch: 14 Global Step: 244560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:15,143-Speed 9791.50 samples/sec Loss 4.4664 LearningRate 0.0071 Epoch: 14 Global Step: 244570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:16,205-Speed 9657.81 samples/sec Loss 4.4605 LearningRate 0.0071 Epoch: 14 Global Step: 244580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:17,316-Speed 9215.03 samples/sec Loss 4.4837 LearningRate 0.0071 Epoch: 14 Global Step: 244590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:18,403-Speed 9428.45 samples/sec Loss 4.4446 LearningRate 0.0071 Epoch: 14 Global Step: 244600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:19,485-Speed 9468.75 samples/sec Loss 4.3584 LearningRate 0.0071 Epoch: 14 Global Step: 244610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:20,549-Speed 9633.57 samples/sec 
Loss 4.4461 LearningRate 0.0071 Epoch: 14 Global Step: 244620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:21,587-Speed 9875.34 samples/sec Loss 4.4486 LearningRate 0.0071 Epoch: 14 Global Step: 244630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:22,636-Speed 9763.84 samples/sec Loss 4.4834 LearningRate 0.0071 Epoch: 14 Global Step: 244640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:23,685-Speed 9763.14 samples/sec Loss 4.3021 LearningRate 0.0071 Epoch: 14 Global Step: 244650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:24,780-Speed 9358.83 samples/sec Loss 4.4329 LearningRate 0.0071 Epoch: 14 Global Step: 244660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:25,847-Speed 9601.93 samples/sec Loss 4.4281 LearningRate 0.0071 Epoch: 14 Global Step: 244670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:26,960-Speed 9203.87 samples/sec Loss 4.3277 LearningRate 0.0071 Epoch: 14 Global Step: 244680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:28,082-Speed 9130.46 samples/sec Loss 4.5035 LearningRate 0.0071 Epoch: 14 Global Step: 244690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:29,135-Speed 9736.30 samples/sec Loss 4.5148 LearningRate 0.0071 Epoch: 14 Global Step: 244700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:30,205-Speed 9572.28 samples/sec Loss 4.4934 LearningRate 0.0071 Epoch: 14 Global Step: 244710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:31,240-Speed 9902.51 samples/sec Loss 4.4371 LearningRate 0.0071 Epoch: 14 Global Step: 244720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:32,293-Speed 9729.88 samples/sec Loss 4.4750 LearningRate 0.0071 Epoch: 14 Global Step: 244730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:33,384-Speed 9393.03 samples/sec Loss 4.3941 LearningRate 0.0071 Epoch: 
14 Global Step: 244740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:34,479-Speed 9360.34 samples/sec Loss 4.4632 LearningRate 0.0071 Epoch: 14 Global Step: 244750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:35,526-Speed 9781.80 samples/sec Loss 4.3836 LearningRate 0.0071 Epoch: 14 Global Step: 244760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:36,550-Speed 10005.66 samples/sec Loss 4.4774 LearningRate 0.0071 Epoch: 14 Global Step: 244770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:37,597-Speed 9785.33 samples/sec Loss 4.4511 LearningRate 0.0071 Epoch: 14 Global Step: 244780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:38,681-Speed 9454.96 samples/sec Loss 4.4887 LearningRate 0.0071 Epoch: 14 Global Step: 244790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:39,760-Speed 9491.45 samples/sec Loss 4.4517 LearningRate 0.0071 Epoch: 14 Global Step: 244800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:40,821-Speed 9660.71 samples/sec Loss 4.4865 LearningRate 0.0071 Epoch: 14 Global Step: 244810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:41,869-Speed 9779.06 samples/sec Loss 4.3349 LearningRate 0.0071 Epoch: 14 Global Step: 244820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:42,982-Speed 9198.91 samples/sec Loss 4.5141 LearningRate 0.0071 Epoch: 14 Global Step: 244830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:44,046-Speed 9631.29 samples/sec Loss 4.5150 LearningRate 0.0071 Epoch: 14 Global Step: 244840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:45,117-Speed 9574.72 samples/sec Loss 4.4660 LearningRate 0.0071 Epoch: 14 Global Step: 244850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:46,159-Speed 9832.62 samples/sec Loss 4.4404 LearningRate 0.0071 Epoch: 14 Global Step: 244860 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:47,242-Speed 9463.57 samples/sec Loss 4.4043 LearningRate 0.0071 Epoch: 14 Global Step: 244870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:48,295-Speed 9726.01 samples/sec Loss 4.4654 LearningRate 0.0071 Epoch: 14 Global Step: 244880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:49,394-Speed 9326.64 samples/sec Loss 4.4642 LearningRate 0.0071 Epoch: 14 Global Step: 244890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:50,441-Speed 9783.10 samples/sec Loss 4.5490 LearningRate 0.0071 Epoch: 14 Global Step: 244900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:51,547-Speed 9265.29 samples/sec Loss 4.5813 LearningRate 0.0071 Epoch: 14 Global Step: 244910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:52,652-Speed 9267.56 samples/sec Loss 4.4612 LearningRate 0.0071 Epoch: 14 Global Step: 244920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:53,715-Speed 9643.17 samples/sec Loss 4.4512 LearningRate 0.0071 Epoch: 14 Global Step: 244930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:54,780-Speed 9624.15 samples/sec Loss 4.4484 LearningRate 0.0071 Epoch: 14 Global Step: 244940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:55,867-Speed 9419.16 samples/sec Loss 4.4552 LearningRate 0.0071 Epoch: 14 Global Step: 244950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:25:56,962-Speed 9359.14 samples/sec Loss 4.5471 LearningRate 0.0071 Epoch: 14 Global Step: 244960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:58,050-Speed 9422.83 samples/sec Loss 4.3149 LearningRate 0.0071 Epoch: 14 Global Step: 244970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:25:59,135-Speed 9435.63 samples/sec Loss 4.4013 LearningRate 0.0071 Epoch: 14 Global Step: 244980 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 21:26:00,215-Speed 9489.85 samples/sec Loss 4.4628 LearningRate 0.0071 Epoch: 14 Global Step: 244990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:01,260-Speed 9806.32 samples/sec Loss 4.5002 LearningRate 0.0071 Epoch: 14 Global Step: 245000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:02,410-Speed 8913.53 samples/sec Loss 4.5134 LearningRate 0.0071 Epoch: 14 Global Step: 245010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:03,474-Speed 9628.89 samples/sec Loss 4.5196 LearningRate 0.0071 Epoch: 14 Global Step: 245020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:04,540-Speed 9617.91 samples/sec Loss 4.4375 LearningRate 0.0071 Epoch: 14 Global Step: 245030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:05,638-Speed 9331.94 samples/sec Loss 4.4684 LearningRate 0.0071 Epoch: 14 Global Step: 245040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:06,732-Speed 9362.61 samples/sec Loss 4.4216 LearningRate 0.0071 Epoch: 14 Global Step: 245050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:07,850-Speed 9164.06 samples/sec Loss 4.4520 LearningRate 0.0071 Epoch: 14 Global Step: 245060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:08,933-Speed 9465.39 samples/sec Loss 4.3701 LearningRate 0.0071 Epoch: 14 Global Step: 245070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:09,991-Speed 9682.67 samples/sec Loss 4.4222 LearningRate 0.0071 Epoch: 14 Global Step: 245080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:11,089-Speed 9327.57 samples/sec Loss 4.4561 LearningRate 0.0071 Epoch: 14 Global Step: 245090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:12,145-Speed 9703.43 samples/sec Loss 4.4480 LearningRate 0.0071 Epoch: 14 Global Step: 245100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:13,235-Speed 
9403.43 samples/sec Loss 4.5126 LearningRate 0.0071 Epoch: 14 Global Step: 245110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:14,276-Speed 9835.51 samples/sec Loss 4.4377 LearningRate 0.0071 Epoch: 14 Global Step: 245120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:15,385-Speed 9247.85 samples/sec Loss 4.4371 LearningRate 0.0071 Epoch: 14 Global Step: 245130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:16,488-Speed 9285.83 samples/sec Loss 4.4952 LearningRate 0.0071 Epoch: 14 Global Step: 245140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:17,584-Speed 9346.42 samples/sec Loss 4.5583 LearningRate 0.0071 Epoch: 14 Global Step: 245150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:18,712-Speed 9082.95 samples/sec Loss 4.5449 LearningRate 0.0071 Epoch: 14 Global Step: 245160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:19,791-Speed 9495.91 samples/sec Loss 4.3942 LearningRate 0.0071 Epoch: 14 Global Step: 245170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:20,875-Speed 9457.85 samples/sec Loss 4.5140 LearningRate 0.0071 Epoch: 14 Global Step: 245180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:21,930-Speed 9708.64 samples/sec Loss 4.4029 LearningRate 0.0070 Epoch: 14 Global Step: 245190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:22,983-Speed 9734.37 samples/sec Loss 4.5208 LearningRate 0.0070 Epoch: 14 Global Step: 245200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:24,061-Speed 9506.98 samples/sec Loss 4.3742 LearningRate 0.0070 Epoch: 14 Global Step: 245210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:25,134-Speed 9543.73 samples/sec Loss 4.4377 LearningRate 0.0070 Epoch: 14 Global Step: 245220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:26,213-Speed 9498.05 samples/sec Loss 4.5665 
LearningRate 0.0070 Epoch: 14 Global Step: 245230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:27,299-Speed 9438.28 samples/sec Loss 4.4014 LearningRate 0.0070 Epoch: 14 Global Step: 245240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:28,387-Speed 9421.24 samples/sec Loss 4.4926 LearningRate 0.0070 Epoch: 14 Global Step: 245250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:29,495-Speed 9242.10 samples/sec Loss 4.4651 LearningRate 0.0070 Epoch: 14 Global Step: 245260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:30,559-Speed 9627.81 samples/sec Loss 4.4565 LearningRate 0.0070 Epoch: 14 Global Step: 245270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:31,635-Speed 9524.67 samples/sec Loss 4.4508 LearningRate 0.0070 Epoch: 14 Global Step: 245280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:32,708-Speed 9543.47 samples/sec Loss 4.4093 LearningRate 0.0070 Epoch: 14 Global Step: 245290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:33,835-Speed 9091.06 samples/sec Loss 4.4159 LearningRate 0.0070 Epoch: 14 Global Step: 245300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:34,926-Speed 9390.55 samples/sec Loss 4.4324 LearningRate 0.0070 Epoch: 14 Global Step: 245310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:35,995-Speed 9588.08 samples/sec Loss 4.4220 LearningRate 0.0070 Epoch: 14 Global Step: 245320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:37,085-Speed 9404.01 samples/sec Loss 4.5010 LearningRate 0.0070 Epoch: 14 Global Step: 245330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:38,164-Speed 9495.26 samples/sec Loss 4.4691 LearningRate 0.0070 Epoch: 14 Global Step: 245340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:39,241-Speed 9511.89 samples/sec Loss 4.4862 LearningRate 0.0070 Epoch: 14 
Global Step: 245350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:40,294-Speed 9739.63 samples/sec Loss 4.4621 LearningRate 0.0070 Epoch: 14 Global Step: 245360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:41,381-Speed 9424.62 samples/sec Loss 4.3955 LearningRate 0.0070 Epoch: 14 Global Step: 245370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:42,497-Speed 9178.77 samples/sec Loss 4.5362 LearningRate 0.0070 Epoch: 14 Global Step: 245380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:43,625-Speed 9084.89 samples/sec Loss 4.4757 LearningRate 0.0070 Epoch: 14 Global Step: 245390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:44,724-Speed 9322.68 samples/sec Loss 4.5167 LearningRate 0.0070 Epoch: 14 Global Step: 245400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:45,753-Speed 9955.94 samples/sec Loss 4.4843 LearningRate 0.0070 Epoch: 14 Global Step: 245410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:46,805-Speed 9738.42 samples/sec Loss 4.4703 LearningRate 0.0070 Epoch: 14 Global Step: 245420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:47,872-Speed 9602.23 samples/sec Loss 4.4697 LearningRate 0.0070 Epoch: 14 Global Step: 245430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:48,921-Speed 9762.06 samples/sec Loss 4.4906 LearningRate 0.0070 Epoch: 14 Global Step: 245440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:50,013-Speed 9386.83 samples/sec Loss 4.4403 LearningRate 0.0070 Epoch: 14 Global Step: 245450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:51,102-Speed 9409.83 samples/sec Loss 4.3939 LearningRate 0.0070 Epoch: 14 Global Step: 245460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:52,239-Speed 9013.88 samples/sec Loss 4.4282 LearningRate 0.0070 Epoch: 14 Global Step: 245470 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 21:26:53,386-Speed 8927.56 samples/sec Loss 4.5276 LearningRate 0.0070 Epoch: 14 Global Step: 245480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:54,454-Speed 9596.30 samples/sec Loss 4.4557 LearningRate 0.0070 Epoch: 14 Global Step: 245490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:55,526-Speed 9562.64 samples/sec Loss 4.4253 LearningRate 0.0070 Epoch: 14 Global Step: 245500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:56,653-Speed 9093.65 samples/sec Loss 4.4456 LearningRate 0.0070 Epoch: 14 Global Step: 245510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:26:57,698-Speed 9805.27 samples/sec Loss 4.4817 LearningRate 0.0070 Epoch: 14 Global Step: 245520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:58,764-Speed 9610.52 samples/sec Loss 4.4725 LearningRate 0.0070 Epoch: 14 Global Step: 245530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:26:59,815-Speed 9748.84 samples/sec Loss 4.4279 LearningRate 0.0070 Epoch: 14 Global Step: 245540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:00,885-Speed 9580.87 samples/sec Loss 4.5060 LearningRate 0.0070 Epoch: 14 Global Step: 245550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:01,983-Speed 9325.07 samples/sec Loss 4.4345 LearningRate 0.0070 Epoch: 14 Global Step: 245560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:03,042-Speed 9673.74 samples/sec Loss 4.4726 LearningRate 0.0070 Epoch: 14 Global Step: 245570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:04,148-Speed 9266.02 samples/sec Loss 4.4512 LearningRate 0.0070 Epoch: 14 Global Step: 245580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:05,225-Speed 9517.38 samples/sec Loss 4.4222 LearningRate 0.0070 Epoch: 14 Global Step: 245590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 21:27:06,321-Speed 9347.31 samples/sec Loss 4.4418 LearningRate 0.0070 Epoch: 14 Global Step: 245600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:07,431-Speed 9230.23 samples/sec Loss 4.5671 LearningRate 0.0070 Epoch: 14 Global Step: 245610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:08,532-Speed 9302.55 samples/sec Loss 4.5095 LearningRate 0.0070 Epoch: 14 Global Step: 245620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:09,582-Speed 9757.95 samples/sec Loss 4.4652 LearningRate 0.0070 Epoch: 14 Global Step: 245630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:10,625-Speed 9834.61 samples/sec Loss 4.3613 LearningRate 0.0070 Epoch: 14 Global Step: 245640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:11,704-Speed 9504.10 samples/sec Loss 4.4977 LearningRate 0.0070 Epoch: 14 Global Step: 245650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:12,755-Speed 9749.06 samples/sec Loss 4.3626 LearningRate 0.0070 Epoch: 14 Global Step: 245660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:13,831-Speed 9522.76 samples/sec Loss 4.4168 LearningRate 0.0070 Epoch: 14 Global Step: 245670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:14,928-Speed 9343.13 samples/sec Loss 4.3454 LearningRate 0.0070 Epoch: 14 Global Step: 245680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:15,993-Speed 9618.42 samples/sec Loss 4.3904 LearningRate 0.0070 Epoch: 14 Global Step: 245690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:17,098-Speed 9273.82 samples/sec Loss 4.5075 LearningRate 0.0070 Epoch: 14 Global Step: 245700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:18,221-Speed 9118.28 samples/sec Loss 4.4621 LearningRate 0.0070 Epoch: 14 Global Step: 245710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:19,302-Speed 9485.96 
samples/sec Loss 4.4194 LearningRate 0.0070 Epoch: 14 Global Step: 245720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:20,389-Speed 9427.69 samples/sec Loss 4.4041 LearningRate 0.0070 Epoch: 14 Global Step: 245730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:21,502-Speed 9200.91 samples/sec Loss 4.4144 LearningRate 0.0070 Epoch: 14 Global Step: 245740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:22,575-Speed 9548.06 samples/sec Loss 4.4441 LearningRate 0.0070 Epoch: 14 Global Step: 245750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:23,613-Speed 9875.46 samples/sec Loss 4.4898 LearningRate 0.0070 Epoch: 14 Global Step: 245760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:24,719-Speed 9261.15 samples/sec Loss 4.4492 LearningRate 0.0070 Epoch: 14 Global Step: 245770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:25,784-Speed 9619.00 samples/sec Loss 4.4604 LearningRate 0.0070 Epoch: 14 Global Step: 245780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:26,827-Speed 9830.06 samples/sec Loss 4.2958 LearningRate 0.0070 Epoch: 14 Global Step: 245790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:27,878-Speed 9747.99 samples/sec Loss 4.4157 LearningRate 0.0070 Epoch: 14 Global Step: 245800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:28,938-Speed 9664.11 samples/sec Loss 4.3863 LearningRate 0.0070 Epoch: 14 Global Step: 245810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:30,000-Speed 9651.52 samples/sec Loss 4.4425 LearningRate 0.0069 Epoch: 14 Global Step: 245820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:31,085-Speed 9448.31 samples/sec Loss 4.5262 LearningRate 0.0069 Epoch: 14 Global Step: 245830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:32,186-Speed 9304.02 samples/sec Loss 4.4484 LearningRate 
0.0069 Epoch: 14 Global Step: 245840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:33,260-Speed 9537.41 samples/sec Loss 4.4378 LearningRate 0.0069 Epoch: 14 Global Step: 245850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:34,335-Speed 9533.49 samples/sec Loss 4.4471 LearningRate 0.0069 Epoch: 14 Global Step: 245860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:35,420-Speed 9440.90 samples/sec Loss 4.3500 LearningRate 0.0069 Epoch: 14 Global Step: 245870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:36,519-Speed 9323.68 samples/sec Loss 4.4713 LearningRate 0.0069 Epoch: 14 Global Step: 245880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:37,608-Speed 9408.20 samples/sec Loss 4.4522 LearningRate 0.0069 Epoch: 14 Global Step: 245890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:27:38,698-Speed 9395.59 samples/sec Loss 4.4633 LearningRate 0.0069 Epoch: 14 Global Step: 245900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:39,750-Speed 9744.37 samples/sec Loss 4.5197 LearningRate 0.0069 Epoch: 14 Global Step: 245910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:40,806-Speed 9704.29 samples/sec Loss 4.5112 LearningRate 0.0069 Epoch: 14 Global Step: 245920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:41,882-Speed 9522.27 samples/sec Loss 4.3923 LearningRate 0.0069 Epoch: 14 Global Step: 245930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:42,974-Speed 9378.16 samples/sec Loss 4.4881 LearningRate 0.0069 Epoch: 14 Global Step: 245940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:44,047-Speed 9547.77 samples/sec Loss 4.4791 LearningRate 0.0069 Epoch: 14 Global Step: 245950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:45,131-Speed 9453.30 samples/sec Loss 4.5167 LearningRate 0.0069 Epoch: 14 Global Step: 245960 
Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:46,260-Speed 9072.27 samples/sec Loss 4.4952 LearningRate 0.0069 Epoch: 14 Global Step: 245970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:47,393-Speed 9050.20 samples/sec Loss 4.3798 LearningRate 0.0069 Epoch: 14 Global Step: 245980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:48,532-Speed 8992.58 samples/sec Loss 4.4837 LearningRate 0.0069 Epoch: 14 Global Step: 245990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:27:49,635-Speed 9291.96 samples/sec Loss 4.5163 LearningRate 0.0069 Epoch: 14 Global Step: 246000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 21:28:11,569-[lfw][246000]XNorm: 7.421774 Training: 2022-04-11 21:28:11,570-[lfw][246000]Accuracy-Flip: 0.99583+-0.00239 Training: 2022-04-11 21:28:11,570-[lfw][246000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:28:36,844-[cfp_fp][246000]XNorm: 6.422978 Training: 2022-04-11 21:28:36,845-[cfp_fp][246000]Accuracy-Flip: 0.97114+-0.00774 Training: 2022-04-11 21:28:36,846-[cfp_fp][246000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:28:58,676-[agedb_30][246000]XNorm: 7.163093 Training: 2022-04-11 21:28:58,676-[agedb_30][246000]Accuracy-Flip: 0.97167+-0.00882 Training: 2022-04-11 21:28:58,677-[agedb_30][246000]Accuracy-Highest: 0.97250 Training: 2022-04-11 21:28:59,762-Speed 146.02 samples/sec Loss 4.4916 LearningRate 0.0069 Epoch: 14 Global Step: 246010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:00,870-Speed 9247.07 samples/sec Loss 4.3457 LearningRate 0.0069 Epoch: 14 Global Step: 246020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:01,904-Speed 9910.93 samples/sec Loss 4.3838 LearningRate 0.0069 Epoch: 14 Global Step: 246030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:02,962-Speed 9679.41 samples/sec Loss 4.4333 LearningRate 0.0069 Epoch: 14 Global Step: 246040 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 21:29:04,045-Speed 9456.89 samples/sec Loss 4.4544 LearningRate 0.0069 Epoch: 14 Global Step: 246050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:05,101-Speed 9710.06 samples/sec Loss 4.5278 LearningRate 0.0069 Epoch: 14 Global Step: 246060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:06,202-Speed 9305.04 samples/sec Loss 4.3803 LearningRate 0.0069 Epoch: 14 Global Step: 246070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:07,309-Speed 9257.49 samples/sec Loss 4.3871 LearningRate 0.0069 Epoch: 14 Global Step: 246080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:08,395-Speed 9435.32 samples/sec Loss 4.4540 LearningRate 0.0069 Epoch: 14 Global Step: 246090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:09,469-Speed 9533.96 samples/sec Loss 4.3353 LearningRate 0.0069 Epoch: 14 Global Step: 246100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:10,550-Speed 9477.72 samples/sec Loss 4.4865 LearningRate 0.0069 Epoch: 14 Global Step: 246110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:11,632-Speed 9470.86 samples/sec Loss 4.4118 LearningRate 0.0069 Epoch: 14 Global Step: 246120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:12,713-Speed 9477.37 samples/sec Loss 4.4889 LearningRate 0.0069 Epoch: 14 Global Step: 246130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:13,779-Speed 9609.76 samples/sec Loss 4.4900 LearningRate 0.0069 Epoch: 14 Global Step: 246140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:14,878-Speed 9326.18 samples/sec Loss 4.4820 LearningRate 0.0069 Epoch: 14 Global Step: 246150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:15,964-Speed 9430.64 samples/sec Loss 4.4850 LearningRate 0.0069 Epoch: 14 Global Step: 246160 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 21:29:17,014-Speed 9760.01 samples/sec Loss 4.3963 LearningRate 0.0069 Epoch: 14 Global Step: 246170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:18,109-Speed 9360.56 samples/sec Loss 4.4965 LearningRate 0.0069 Epoch: 14 Global Step: 246180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:19,251-Speed 8973.50 samples/sec Loss 4.5159 LearningRate 0.0069 Epoch: 14 Global Step: 246190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:20,303-Speed 9748.69 samples/sec Loss 4.4273 LearningRate 0.0069 Epoch: 14 Global Step: 246200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:21,372-Speed 9582.02 samples/sec Loss 4.2959 LearningRate 0.0069 Epoch: 14 Global Step: 246210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:22,481-Speed 9237.25 samples/sec Loss 4.3841 LearningRate 0.0069 Epoch: 14 Global Step: 246220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:23,534-Speed 9731.14 samples/sec Loss 4.5603 LearningRate 0.0069 Epoch: 14 Global Step: 246230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:24,586-Speed 9741.53 samples/sec Loss 4.4487 LearningRate 0.0069 Epoch: 14 Global Step: 246240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:25,676-Speed 9394.43 samples/sec Loss 4.4381 LearningRate 0.0069 Epoch: 14 Global Step: 246250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:26,747-Speed 9570.55 samples/sec Loss 4.4769 LearningRate 0.0069 Epoch: 14 Global Step: 246260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:27,824-Speed 9511.86 samples/sec Loss 4.5333 LearningRate 0.0069 Epoch: 14 Global Step: 246270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:28,930-Speed 9262.29 samples/sec Loss 4.4558 LearningRate 0.0069 Epoch: 14 Global Step: 246280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:30,026-Speed 
9352.45 samples/sec Loss 4.5372 LearningRate 0.0069 Epoch: 14 Global Step: 246290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:31,149-Speed 9123.04 samples/sec Loss 4.3946 LearningRate 0.0069 Epoch: 14 Global Step: 246300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:32,201-Speed 9740.78 samples/sec Loss 4.4484 LearningRate 0.0069 Epoch: 14 Global Step: 246310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:33,281-Speed 9490.66 samples/sec Loss 4.4378 LearningRate 0.0069 Epoch: 14 Global Step: 246320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:34,334-Speed 9727.22 samples/sec Loss 4.4354 LearningRate 0.0069 Epoch: 14 Global Step: 246330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:35,412-Speed 9503.50 samples/sec Loss 4.4635 LearningRate 0.0069 Epoch: 14 Global Step: 246340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:36,460-Speed 9780.30 samples/sec Loss 4.4433 LearningRate 0.0069 Epoch: 14 Global Step: 246350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:37,527-Speed 9595.85 samples/sec Loss 4.4001 LearningRate 0.0069 Epoch: 14 Global Step: 246360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:38,580-Speed 9735.93 samples/sec Loss 4.3610 LearningRate 0.0069 Epoch: 14 Global Step: 246370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:39,614-Speed 9911.17 samples/sec Loss 4.4597 LearningRate 0.0069 Epoch: 14 Global Step: 246380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:40,651-Speed 9887.38 samples/sec Loss 4.4155 LearningRate 0.0069 Epoch: 14 Global Step: 246390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:41,765-Speed 9190.88 samples/sec Loss 4.5000 LearningRate 0.0069 Epoch: 14 Global Step: 246400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:42,854-Speed 9409.12 samples/sec Loss 4.3937 
LearningRate 0.0069 Epoch: 14 Global Step: 246410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:43,920-Speed 9610.10 samples/sec Loss 4.4474 LearningRate 0.0069 Epoch: 14 Global Step: 246420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:45,024-Speed 9279.31 samples/sec Loss 4.3843 LearningRate 0.0069 Epoch: 14 Global Step: 246430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:46,093-Speed 9587.47 samples/sec Loss 4.5301 LearningRate 0.0069 Epoch: 14 Global Step: 246440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:47,184-Speed 9389.44 samples/sec Loss 4.4670 LearningRate 0.0069 Epoch: 14 Global Step: 246450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:48,284-Speed 9318.71 samples/sec Loss 4.4492 LearningRate 0.0068 Epoch: 14 Global Step: 246460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:49,349-Speed 9615.26 samples/sec Loss 4.4175 LearningRate 0.0068 Epoch: 14 Global Step: 246470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:50,439-Speed 9400.00 samples/sec Loss 4.4748 LearningRate 0.0068 Epoch: 14 Global Step: 246480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:51,489-Speed 9759.33 samples/sec Loss 4.4353 LearningRate 0.0068 Epoch: 14 Global Step: 246490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:52,562-Speed 9552.88 samples/sec Loss 4.4392 LearningRate 0.0068 Epoch: 14 Global Step: 246500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:53,648-Speed 9433.14 samples/sec Loss 4.3366 LearningRate 0.0068 Epoch: 14 Global Step: 246510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:54,733-Speed 9444.34 samples/sec Loss 4.4073 LearningRate 0.0068 Epoch: 14 Global Step: 246520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:55,835-Speed 9299.95 samples/sec Loss 4.4194 LearningRate 0.0068 Epoch: 14 Global 
Step: 246530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:56,928-Speed 9369.80 samples/sec Loss 4.4722 LearningRate 0.0068 Epoch: 14 Global Step: 246540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:29:58,041-Speed 9213.25 samples/sec Loss 4.4142 LearningRate 0.0068 Epoch: 14 Global Step: 246550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:29:59,066-Speed 9988.88 samples/sec Loss 4.4256 LearningRate 0.0068 Epoch: 14 Global Step: 246560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:00,137-Speed 9570.17 samples/sec Loss 4.4041 LearningRate 0.0068 Epoch: 14 Global Step: 246570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:01,230-Speed 9368.86 samples/sec Loss 4.4293 LearningRate 0.0068 Epoch: 14 Global Step: 246580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:02,327-Speed 9342.41 samples/sec Loss 4.3716 LearningRate 0.0068 Epoch: 14 Global Step: 246590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:03,436-Speed 9244.33 samples/sec Loss 4.4904 LearningRate 0.0068 Epoch: 14 Global Step: 246600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:04,524-Speed 9413.94 samples/sec Loss 4.5036 LearningRate 0.0068 Epoch: 14 Global Step: 246610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:05,601-Speed 9519.44 samples/sec Loss 4.4534 LearningRate 0.0068 Epoch: 14 Global Step: 246620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:06,692-Speed 9388.82 samples/sec Loss 4.3544 LearningRate 0.0068 Epoch: 14 Global Step: 246630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:07,857-Speed 8792.73 samples/sec Loss 4.3996 LearningRate 0.0068 Epoch: 14 Global Step: 246640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:08,963-Speed 9263.02 samples/sec Loss 4.4357 LearningRate 0.0068 Epoch: 14 Global Step: 246650 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 21:30:10,086-Speed 9127.09 samples/sec Loss 4.3893 LearningRate 0.0068 Epoch: 14 Global Step: 246660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:11,191-Speed 9274.63 samples/sec Loss 4.4958 LearningRate 0.0068 Epoch: 14 Global Step: 246670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:12,343-Speed 8894.79 samples/sec Loss 4.5662 LearningRate 0.0068 Epoch: 14 Global Step: 246680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:13,437-Speed 9364.32 samples/sec Loss 4.4130 LearningRate 0.0068 Epoch: 14 Global Step: 246690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:14,491-Speed 9724.90 samples/sec Loss 4.4026 LearningRate 0.0068 Epoch: 14 Global Step: 246700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:15,555-Speed 9624.94 samples/sec Loss 4.4091 LearningRate 0.0068 Epoch: 14 Global Step: 246710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:16,611-Speed 9709.92 samples/sec Loss 4.4232 LearningRate 0.0068 Epoch: 14 Global Step: 246720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:17,693-Speed 9468.47 samples/sec Loss 4.4258 LearningRate 0.0068 Epoch: 14 Global Step: 246730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:18,757-Speed 9624.30 samples/sec Loss 4.3744 LearningRate 0.0068 Epoch: 14 Global Step: 246740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:19,879-Speed 9132.55 samples/sec Loss 4.5099 LearningRate 0.0068 Epoch: 14 Global Step: 246750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:20,962-Speed 9458.27 samples/sec Loss 4.4948 LearningRate 0.0068 Epoch: 14 Global Step: 246760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:22,107-Speed 8955.29 samples/sec Loss 4.3724 LearningRate 0.0068 Epoch: 14 Global Step: 246770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 21:30:23,216-Speed 9241.56 samples/sec Loss 4.5073 LearningRate 0.0068 Epoch: 14 Global Step: 246780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:24,320-Speed 9275.57 samples/sec Loss 4.4259 LearningRate 0.0068 Epoch: 14 Global Step: 246790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:25,425-Speed 9278.46 samples/sec Loss 4.4859 LearningRate 0.0068 Epoch: 14 Global Step: 246800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:26,514-Speed 9402.45 samples/sec Loss 4.5532 LearningRate 0.0068 Epoch: 14 Global Step: 246810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:27,562-Speed 9775.75 samples/sec Loss 4.4768 LearningRate 0.0068 Epoch: 14 Global Step: 246820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:28,645-Speed 9462.45 samples/sec Loss 4.4693 LearningRate 0.0068 Epoch: 14 Global Step: 246830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:29,741-Speed 9351.71 samples/sec Loss 4.4427 LearningRate 0.0068 Epoch: 14 Global Step: 246840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:30,797-Speed 9698.56 samples/sec Loss 4.4764 LearningRate 0.0068 Epoch: 14 Global Step: 246850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:31,854-Speed 9695.34 samples/sec Loss 4.5067 LearningRate 0.0068 Epoch: 14 Global Step: 246860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:32,932-Speed 9507.58 samples/sec Loss 4.4648 LearningRate 0.0068 Epoch: 14 Global Step: 246870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:34,016-Speed 9460.24 samples/sec Loss 4.4785 LearningRate 0.0068 Epoch: 14 Global Step: 246880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:35,128-Speed 9211.44 samples/sec Loss 4.4296 LearningRate 0.0068 Epoch: 14 Global Step: 246890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:36,249-Speed 9141.89 
samples/sec Loss 4.3752 LearningRate 0.0068 Epoch: 14 Global Step: 246900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:37,326-Speed 9511.85 samples/sec Loss 4.4850 LearningRate 0.0068 Epoch: 14 Global Step: 246910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 21:30:38,367-Speed 9845.84 samples/sec Loss 4.4293 LearningRate 0.0068 Epoch: 14 Global Step: 246920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:39,426-Speed 9678.68 samples/sec Loss 4.4694 LearningRate 0.0068 Epoch: 14 Global Step: 246930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:40,483-Speed 9690.79 samples/sec Loss 4.4766 LearningRate 0.0068 Epoch: 14 Global Step: 246940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:41,551-Speed 9595.60 samples/sec Loss 4.4591 LearningRate 0.0068 Epoch: 14 Global Step: 246950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:42,640-Speed 9406.20 samples/sec Loss 4.4763 LearningRate 0.0068 Epoch: 14 Global Step: 246960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:43,789-Speed 8911.93 samples/sec Loss 4.5325 LearningRate 0.0068 Epoch: 14 Global Step: 246970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:44,883-Speed 9369.55 samples/sec Loss 4.5154 LearningRate 0.0068 Epoch: 14 Global Step: 246980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:45,987-Speed 9279.30 samples/sec Loss 4.3916 LearningRate 0.0068 Epoch: 14 Global Step: 246990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:47,062-Speed 9529.66 samples/sec Loss 4.4254 LearningRate 0.0068 Epoch: 14 Global Step: 247000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:48,145-Speed 9460.98 samples/sec Loss 4.4043 LearningRate 0.0068 Epoch: 14 Global Step: 247010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:49,243-Speed 9327.59 samples/sec Loss 4.5010 
LearningRate 0.0068 Epoch: 14 Global Step: 247020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:50,293-Speed 9765.49 samples/sec Loss 4.5163 LearningRate 0.0068 Epoch: 14 Global Step: 247030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:51,385-Speed 9387.01 samples/sec Loss 4.4213 LearningRate 0.0068 Epoch: 14 Global Step: 247040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:52,470-Speed 9448.88 samples/sec Loss 4.4005 LearningRate 0.0068 Epoch: 14 Global Step: 247050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:53,547-Speed 9519.47 samples/sec Loss 4.4281 LearningRate 0.0068 Epoch: 14 Global Step: 247060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 21:30:54,632-Speed 9443.49 samples/sec Loss 4.4223 LearningRate 0.0068 Epoch: 14 Global Step: 247070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:30:55,707-Speed 9529.26 samples/sec Loss 4.3786 LearningRate 0.0068 Epoch: 14 Global Step: 247080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:30:56,768-Speed 9657.65 samples/sec Loss 4.4418 LearningRate 0.0068 Epoch: 14 Global Step: 247090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:30:57,870-Speed 9294.42 samples/sec Loss 4.3726 LearningRate 0.0067 Epoch: 14 Global Step: 247100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:30:58,945-Speed 9528.90 samples/sec Loss 4.5020 LearningRate 0.0067 Epoch: 14 Global Step: 247110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:00,050-Speed 9277.24 samples/sec Loss 4.5815 LearningRate 0.0067 Epoch: 14 Global Step: 247120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:01,122-Speed 9556.61 samples/sec Loss 4.3731 LearningRate 0.0067 Epoch: 14 Global Step: 247130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:02,207-Speed 9443.89 samples/sec Loss 4.5220 LearningRate 0.0067 Epoch: 14 
Global Step: 247140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:03,280-Speed 9548.99 samples/sec Loss 4.4829 LearningRate 0.0067 Epoch: 14 Global Step: 247150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:04,354-Speed 9537.99 samples/sec Loss 4.5029 LearningRate 0.0067 Epoch: 14 Global Step: 247160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:05,414-Speed 9669.46 samples/sec Loss 4.4660 LearningRate 0.0067 Epoch: 14 Global Step: 247170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:06,518-Speed 9281.53 samples/sec Loss 4.4861 LearningRate 0.0067 Epoch: 14 Global Step: 247180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:07,595-Speed 9512.70 samples/sec Loss 4.4712 LearningRate 0.0067 Epoch: 14 Global Step: 247190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:08,661-Speed 9612.03 samples/sec Loss 4.4967 LearningRate 0.0067 Epoch: 14 Global Step: 247200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:09,775-Speed 9197.32 samples/sec Loss 4.4704 LearningRate 0.0067 Epoch: 14 Global Step: 247210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:10,855-Speed 9485.95 samples/sec Loss 4.4270 LearningRate 0.0067 Epoch: 14 Global Step: 247220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:11,973-Speed 9166.00 samples/sec Loss 4.4563 LearningRate 0.0067 Epoch: 14 Global Step: 247230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:13,084-Speed 9224.02 samples/sec Loss 4.5118 LearningRate 0.0067 Epoch: 14 Global Step: 247240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:14,212-Speed 9082.88 samples/sec Loss 4.3977 LearningRate 0.0067 Epoch: 14 Global Step: 247250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:15,292-Speed 9483.23 samples/sec Loss 4.4588 LearningRate 0.0067 Epoch: 14 Global Step: 247260 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 21:31:16,385-Speed 9373.63 samples/sec Loss 4.3880 LearningRate 0.0067 Epoch: 14 Global Step: 247270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:17,439-Speed 9721.27 samples/sec Loss 4.4963 LearningRate 0.0067 Epoch: 14 Global Step: 247280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:18,489-Speed 9756.20 samples/sec Loss 4.4539 LearningRate 0.0067 Epoch: 14 Global Step: 247290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:19,567-Speed 9510.06 samples/sec Loss 4.5018 LearningRate 0.0067 Epoch: 14 Global Step: 247300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:20,649-Speed 9462.93 samples/sec Loss 4.5268 LearningRate 0.0067 Epoch: 14 Global Step: 247310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:21,765-Speed 9186.10 samples/sec Loss 4.4245 LearningRate 0.0067 Epoch: 14 Global Step: 247320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:22,879-Speed 9201.67 samples/sec Loss 4.4071 LearningRate 0.0067 Epoch: 14 Global Step: 247330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:23,988-Speed 9235.00 samples/sec Loss 4.4121 LearningRate 0.0067 Epoch: 14 Global Step: 247340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:25,023-Speed 9895.97 samples/sec Loss 4.5459 LearningRate 0.0067 Epoch: 14 Global Step: 247350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:26,073-Speed 9763.83 samples/sec Loss 4.5407 LearningRate 0.0067 Epoch: 14 Global Step: 247360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:27,165-Speed 9378.46 samples/sec Loss 4.3382 LearningRate 0.0067 Epoch: 14 Global Step: 247370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:28,283-Speed 9167.19 samples/sec Loss 4.4054 LearningRate 0.0067 Epoch: 14 Global Step: 247380 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 21:31:29,416-Speed 9038.99 samples/sec Loss 4.4383 LearningRate 0.0067 Epoch: 14 Global Step: 247390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:30,514-Speed 9339.58 samples/sec Loss 4.4522 LearningRate 0.0067 Epoch: 14 Global Step: 247400 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-04-11 21:31:31,630-Speed 9181.30 samples/sec Loss 4.4551 LearningRate 0.0067 Epoch: 14 Global Step: 247410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:32,692-Speed 9647.83 samples/sec Loss 4.4270 LearningRate 0.0067 Epoch: 14 Global Step: 247420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:33,789-Speed 9339.62 samples/sec Loss 4.4902 LearningRate 0.0067 Epoch: 14 Global Step: 247430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:34,861-Speed 9555.24 samples/sec Loss 4.4046 LearningRate 0.0067 Epoch: 14 Global Step: 247440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:35,955-Speed 9364.87 samples/sec Loss 4.5236 LearningRate 0.0067 Epoch: 14 Global Step: 247450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:37,017-Speed 9649.66 samples/sec Loss 4.4291 LearningRate 0.0067 Epoch: 14 Global Step: 247460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:38,077-Speed 9661.31 samples/sec Loss 4.5323 LearningRate 0.0067 Epoch: 14 Global Step: 247470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:39,133-Speed 9709.31 samples/sec Loss 4.4828 LearningRate 0.0067 Epoch: 14 Global Step: 247480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:40,204-Speed 9570.11 samples/sec Loss 4.4530 LearningRate 0.0067 Epoch: 14 Global Step: 247490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:41,274-Speed 9577.54 samples/sec Loss 4.3989 LearningRate 0.0067 Epoch: 14 Global Step: 247500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
21:31:42,313-Speed 9861.98 samples/sec Loss 4.4588 LearningRate 0.0067 Epoch: 14 Global Step: 247510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:43,396-Speed 9453.93 samples/sec Loss 4.4068 LearningRate 0.0067 Epoch: 14 Global Step: 247520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:44,462-Speed 9615.34 samples/sec Loss 4.3775 LearningRate 0.0067 Epoch: 14 Global Step: 247530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:45,621-Speed 8836.10 samples/sec Loss 4.6051 LearningRate 0.0067 Epoch: 14 Global Step: 247540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:46,684-Speed 9638.27 samples/sec Loss 4.4515 LearningRate 0.0067 Epoch: 14 Global Step: 247550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:47,766-Speed 9476.92 samples/sec Loss 4.2657 LearningRate 0.0067 Epoch: 14 Global Step: 247560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:48,834-Speed 9590.54 samples/sec Loss 4.4732 LearningRate 0.0067 Epoch: 14 Global Step: 247570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:49,887-Speed 9734.57 samples/sec Loss 4.4567 LearningRate 0.0067 Epoch: 14 Global Step: 247580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:50,916-Speed 9953.33 samples/sec Loss 4.5225 LearningRate 0.0067 Epoch: 14 Global Step: 247590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:51,979-Speed 9638.09 samples/sec Loss 4.3780 LearningRate 0.0067 Epoch: 14 Global Step: 247600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:31:53,088-Speed 9240.63 samples/sec Loss 4.3358 LearningRate 0.0067 Epoch: 14 Global Step: 247610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:54,151-Speed 9641.20 samples/sec Loss 4.4226 LearningRate 0.0067 Epoch: 14 Global Step: 247620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:55,267-Speed 9179.37 samples/sec 
Loss 4.4350 LearningRate 0.0067 Epoch: 14 Global Step: 247630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:56,347-Speed 9487.61 samples/sec Loss 4.4311 LearningRate 0.0067 Epoch: 14 Global Step: 247640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:57,439-Speed 9381.19 samples/sec Loss 4.5929 LearningRate 0.0067 Epoch: 14 Global Step: 247650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:58,540-Speed 9311.59 samples/sec Loss 4.4372 LearningRate 0.0067 Epoch: 14 Global Step: 247660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:31:59,680-Speed 8980.03 samples/sec Loss 4.4614 LearningRate 0.0067 Epoch: 14 Global Step: 247670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:00,814-Speed 9036.26 samples/sec Loss 4.5480 LearningRate 0.0067 Epoch: 14 Global Step: 247680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:01,913-Speed 9319.92 samples/sec Loss 4.4972 LearningRate 0.0067 Epoch: 14 Global Step: 247690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:03,021-Speed 9248.12 samples/sec Loss 4.5207 LearningRate 0.0067 Epoch: 14 Global Step: 247700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:04,109-Speed 9422.90 samples/sec Loss 4.3861 LearningRate 0.0067 Epoch: 14 Global Step: 247710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:05,243-Speed 9034.56 samples/sec Loss 4.4641 LearningRate 0.0067 Epoch: 14 Global Step: 247720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:06,339-Speed 9351.74 samples/sec Loss 4.4445 LearningRate 0.0067 Epoch: 14 Global Step: 247730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:07,423-Speed 9450.96 samples/sec Loss 4.4224 LearningRate 0.0066 Epoch: 14 Global Step: 247740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:08,499-Speed 9523.59 samples/sec Loss 4.4072 LearningRate 0.0066 
Epoch: 14 Global Step: 247750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:09,599-Speed 9314.74 samples/sec Loss 4.4491 LearningRate 0.0066 Epoch: 14 Global Step: 247760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:10,701-Speed 9300.26 samples/sec Loss 4.4013 LearningRate 0.0066 Epoch: 14 Global Step: 247770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:11,796-Speed 9357.40 samples/sec Loss 4.4885 LearningRate 0.0066 Epoch: 14 Global Step: 247780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:12,874-Speed 9499.64 samples/sec Loss 4.4887 LearningRate 0.0066 Epoch: 14 Global Step: 247790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:13,937-Speed 9645.90 samples/sec Loss 4.4715 LearningRate 0.0066 Epoch: 14 Global Step: 247800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:14,981-Speed 9812.17 samples/sec Loss 4.4543 LearningRate 0.0066 Epoch: 14 Global Step: 247810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:16,032-Speed 9761.20 samples/sec Loss 4.4751 LearningRate 0.0066 Epoch: 14 Global Step: 247820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:17,133-Speed 9301.47 samples/sec Loss 4.4659 LearningRate 0.0066 Epoch: 14 Global Step: 247830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:18,185-Speed 9745.22 samples/sec Loss 4.4166 LearningRate 0.0066 Epoch: 14 Global Step: 247840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:19,257-Speed 9557.35 samples/sec Loss 4.4781 LearningRate 0.0066 Epoch: 14 Global Step: 247850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:20,380-Speed 9125.27 samples/sec Loss 4.4721 LearningRate 0.0066 Epoch: 14 Global Step: 247860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:21,455-Speed 9524.26 samples/sec Loss 4.4047 LearningRate 0.0066 Epoch: 14 Global Step: 247870 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:22,560-Speed 9273.01 samples/sec Loss 4.4482 LearningRate 0.0066 Epoch: 14 Global Step: 247880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:23,659-Speed 9327.84 samples/sec Loss 4.4244 LearningRate 0.0066 Epoch: 14 Global Step: 247890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:24,756-Speed 9341.11 samples/sec Loss 4.3668 LearningRate 0.0066 Epoch: 14 Global Step: 247900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:32:25,829-Speed 9546.98 samples/sec Loss 4.3897 LearningRate 0.0066 Epoch: 14 Global Step: 247910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:26,945-Speed 9183.94 samples/sec Loss 4.4428 LearningRate 0.0066 Epoch: 14 Global Step: 247920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:28,065-Speed 9151.53 samples/sec Loss 4.4575 LearningRate 0.0066 Epoch: 14 Global Step: 247930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:29,188-Speed 9120.33 samples/sec Loss 4.4654 LearningRate 0.0066 Epoch: 14 Global Step: 247940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:30,303-Speed 9188.85 samples/sec Loss 4.3813 LearningRate 0.0066 Epoch: 14 Global Step: 247950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:31,431-Speed 9081.84 samples/sec Loss 4.4997 LearningRate 0.0066 Epoch: 14 Global Step: 247960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:32,528-Speed 9340.81 samples/sec Loss 4.4408 LearningRate 0.0066 Epoch: 14 Global Step: 247970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:33,614-Speed 9437.80 samples/sec Loss 4.4398 LearningRate 0.0066 Epoch: 14 Global Step: 247980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:32:34,686-Speed 9555.51 samples/sec Loss 4.4027 LearningRate 0.0066 Epoch: 14 Global Step: 247990 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 21:32:35,770-Speed 9451.35 samples/sec Loss 4.3953 LearningRate 0.0066 Epoch: 14 Global Step: 248000 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 21:32:57,638-[lfw][248000]XNorm: 7.377807
Training: 2022-04-11 21:32:57,638-[lfw][248000]Accuracy-Flip: 0.99600+-0.00281
Training: 2022-04-11 21:32:57,639-[lfw][248000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:33:22,930-[cfp_fp][248000]XNorm: 6.355365
Training: 2022-04-11 21:33:22,931-[cfp_fp][248000]Accuracy-Flip: 0.96800+-0.00980
Training: 2022-04-11 21:33:22,931-[cfp_fp][248000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:33:44,762-[agedb_30][248000]XNorm: 7.191690
Training: 2022-04-11 21:33:44,763-[agedb_30][248000]Accuracy-Flip: 0.97350+-0.00828
Training: 2022-04-11 21:33:44,763-[agedb_30][248000]Accuracy-Highest: 0.97350
Training: 2022-04-11 21:33:45,872-Speed 146.08 samples/sec Loss 4.4540 LearningRate 0.0066 Epoch: 14 Global Step: 248010 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-11 21:33:46,948-Speed 9521.21 samples/sec Loss 4.4630 LearningRate 0.0066 Epoch: 14 Global Step: 248020 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-11 21:33:48,045-Speed 9343.85 samples/sec Loss 4.4350 LearningRate 0.0066 Epoch: 14 Global Step: 248030 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-11 21:33:49,158-Speed 9205.35 samples/sec Loss 4.4011 LearningRate 0.0066 Epoch: 14 Global Step: 248040 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 21:33:50,248-Speed 9395.67 samples/sec Loss 4.4670 LearningRate 0.0066 Epoch: 14 Global Step: 248050 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 21:33:51,349-Speed 9305.68 samples/sec Loss 4.4717 LearningRate 0.0066 Epoch: 14 Global Step: 248060 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 21:33:52,439-Speed 9405.33 samples/sec Loss 4.4463 LearningRate 0.0066 Epoch: 14 Global Step: 248070 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11
21:33:53,489-Speed 9754.86 samples/sec Loss 4.4349 LearningRate 0.0066 Epoch: 14 Global Step: 248080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:33:54,582-Speed 9378.72 samples/sec Loss 4.4074 LearningRate 0.0066 Epoch: 14 Global Step: 248090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:33:55,683-Speed 9305.46 samples/sec Loss 4.4078 LearningRate 0.0066 Epoch: 14 Global Step: 248100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:33:56,795-Speed 9210.90 samples/sec Loss 4.4617 LearningRate 0.0066 Epoch: 14 Global Step: 248110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:33:57,869-Speed 9534.70 samples/sec Loss 4.3979 LearningRate 0.0066 Epoch: 14 Global Step: 248120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:33:58,947-Speed 9511.03 samples/sec Loss 4.4293 LearningRate 0.0066 Epoch: 14 Global Step: 248130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:00,045-Speed 9326.39 samples/sec Loss 4.4536 LearningRate 0.0066 Epoch: 14 Global Step: 248140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:01,113-Speed 9592.96 samples/sec Loss 4.4265 LearningRate 0.0066 Epoch: 14 Global Step: 248150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:02,174-Speed 9659.72 samples/sec Loss 4.5242 LearningRate 0.0066 Epoch: 14 Global Step: 248160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:03,274-Speed 9312.96 samples/sec Loss 4.5184 LearningRate 0.0066 Epoch: 14 Global Step: 248170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:04,389-Speed 9193.89 samples/sec Loss 4.5075 LearningRate 0.0066 Epoch: 14 Global Step: 248180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:05,504-Speed 9185.26 samples/sec Loss 4.4505 LearningRate 0.0066 Epoch: 14 Global Step: 248190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:06,559-Speed 9709.53 samples/sec 
Loss 4.4970 LearningRate 0.0066 Epoch: 14 Global Step: 248200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:07,654-Speed 9360.17 samples/sec Loss 4.4709 LearningRate 0.0066 Epoch: 14 Global Step: 248210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:08,772-Speed 9168.45 samples/sec Loss 4.4398 LearningRate 0.0066 Epoch: 14 Global Step: 248220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:09,870-Speed 9329.95 samples/sec Loss 4.4085 LearningRate 0.0066 Epoch: 14 Global Step: 248230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:10,946-Speed 9522.06 samples/sec Loss 4.3575 LearningRate 0.0066 Epoch: 14 Global Step: 248240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:12,070-Speed 9115.82 samples/sec Loss 4.4376 LearningRate 0.0066 Epoch: 14 Global Step: 248250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:13,163-Speed 9376.72 samples/sec Loss 4.5176 LearningRate 0.0066 Epoch: 14 Global Step: 248260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:14,237-Speed 9537.88 samples/sec Loss 4.4801 LearningRate 0.0066 Epoch: 14 Global Step: 248270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:15,328-Speed 9390.13 samples/sec Loss 4.5333 LearningRate 0.0066 Epoch: 14 Global Step: 248280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:16,404-Speed 9525.61 samples/sec Loss 4.4361 LearningRate 0.0066 Epoch: 14 Global Step: 248290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:17,521-Speed 9170.31 samples/sec Loss 4.5095 LearningRate 0.0066 Epoch: 14 Global Step: 248300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:18,636-Speed 9193.09 samples/sec Loss 4.4474 LearningRate 0.0066 Epoch: 14 Global Step: 248310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:19,715-Speed 9496.44 samples/sec Loss 4.4859 LearningRate 0.0066 Epoch: 14 
Global Step: 248320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:20,771-Speed 9704.55 samples/sec Loss 4.4244 LearningRate 0.0066 Epoch: 14 Global Step: 248330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:21,873-Speed 9302.58 samples/sec Loss 4.4889 LearningRate 0.0066 Epoch: 14 Global Step: 248340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:22,947-Speed 9532.04 samples/sec Loss 4.3984 LearningRate 0.0066 Epoch: 14 Global Step: 248350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:24,061-Speed 9204.50 samples/sec Loss 4.4375 LearningRate 0.0066 Epoch: 14 Global Step: 248360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:25,159-Speed 9330.26 samples/sec Loss 4.5056 LearningRate 0.0066 Epoch: 14 Global Step: 248370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:26,222-Speed 9633.08 samples/sec Loss 4.4489 LearningRate 0.0066 Epoch: 14 Global Step: 248380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:27,301-Speed 9496.36 samples/sec Loss 4.4358 LearningRate 0.0065 Epoch: 14 Global Step: 248390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:28,397-Speed 9351.13 samples/sec Loss 4.4648 LearningRate 0.0065 Epoch: 14 Global Step: 248400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:29,458-Speed 9653.79 samples/sec Loss 4.4314 LearningRate 0.0065 Epoch: 14 Global Step: 248410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:30,579-Speed 9139.80 samples/sec Loss 4.4412 LearningRate 0.0065 Epoch: 14 Global Step: 248420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:31,708-Speed 9078.98 samples/sec Loss 4.3827 LearningRate 0.0065 Epoch: 14 Global Step: 248430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:32,799-Speed 9393.13 samples/sec Loss 4.3911 LearningRate 0.0065 Epoch: 14 Global Step: 248440 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 21:34:33,881-Speed 9480.74 samples/sec Loss 4.4652 LearningRate 0.0065 Epoch: 14 Global Step: 248450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:34,982-Speed 9303.15 samples/sec Loss 4.4497 LearningRate 0.0065 Epoch: 14 Global Step: 248460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:36,104-Speed 9127.98 samples/sec Loss 4.4596 LearningRate 0.0065 Epoch: 14 Global Step: 248470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:37,171-Speed 9608.66 samples/sec Loss 4.4496 LearningRate 0.0065 Epoch: 14 Global Step: 248480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:38,227-Speed 9697.84 samples/sec Loss 4.4096 LearningRate 0.0065 Epoch: 14 Global Step: 248490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:39,279-Speed 9740.89 samples/sec Loss 4.5496 LearningRate 0.0065 Epoch: 14 Global Step: 248500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:40,343-Speed 9633.91 samples/sec Loss 4.4728 LearningRate 0.0065 Epoch: 14 Global Step: 248510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:41,427-Speed 9450.02 samples/sec Loss 4.5480 LearningRate 0.0065 Epoch: 14 Global Step: 248520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:42,502-Speed 9527.04 samples/sec Loss 4.4710 LearningRate 0.0065 Epoch: 14 Global Step: 248530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:43,578-Speed 9525.63 samples/sec Loss 4.3678 LearningRate 0.0065 Epoch: 14 Global Step: 248540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:44,690-Speed 9211.76 samples/sec Loss 4.5416 LearningRate 0.0065 Epoch: 14 Global Step: 248550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:45,789-Speed 9334.74 samples/sec Loss 4.5371 LearningRate 0.0065 Epoch: 14 Global Step: 248560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 21:34:46,848-Speed 9675.47 samples/sec Loss 4.4859 LearningRate 0.0065 Epoch: 14 Global Step: 248570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:47,931-Speed 9455.27 samples/sec Loss 4.4285 LearningRate 0.0065 Epoch: 14 Global Step: 248580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:49,085-Speed 8877.40 samples/sec Loss 4.3999 LearningRate 0.0065 Epoch: 14 Global Step: 248590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:50,139-Speed 9725.67 samples/sec Loss 4.4834 LearningRate 0.0065 Epoch: 14 Global Step: 248600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:51,208-Speed 9589.22 samples/sec Loss 4.4723 LearningRate 0.0065 Epoch: 14 Global Step: 248610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:52,333-Speed 9108.23 samples/sec Loss 4.4541 LearningRate 0.0065 Epoch: 14 Global Step: 248620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:53,401-Speed 9591.67 samples/sec Loss 4.4288 LearningRate 0.0065 Epoch: 14 Global Step: 248630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:54,496-Speed 9351.42 samples/sec Loss 4.3826 LearningRate 0.0065 Epoch: 14 Global Step: 248640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:55,633-Speed 9016.01 samples/sec Loss 4.4258 LearningRate 0.0065 Epoch: 14 Global Step: 248650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:56,735-Speed 9292.50 samples/sec Loss 4.4593 LearningRate 0.0065 Epoch: 14 Global Step: 248660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:34:57,858-Speed 9132.05 samples/sec Loss 4.4984 LearningRate 0.0065 Epoch: 14 Global Step: 248670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:34:58,913-Speed 9706.97 samples/sec Loss 4.4785 LearningRate 0.0065 Epoch: 14 Global Step: 248680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:35:00,005-Speed 9382.60 
samples/sec Loss 4.4823 LearningRate 0.0065 Epoch: 14 Global Step: 248690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:01,125-Speed 9149.85 samples/sec Loss 4.5059 LearningRate 0.0065 Epoch: 14 Global Step: 248700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:02,217-Speed 9380.14 samples/sec Loss 4.3256 LearningRate 0.0065 Epoch: 14 Global Step: 248710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:03,368-Speed 8904.85 samples/sec Loss 4.4706 LearningRate 0.0065 Epoch: 14 Global Step: 248720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:04,466-Speed 9331.37 samples/sec Loss 4.4293 LearningRate 0.0065 Epoch: 14 Global Step: 248730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:05,562-Speed 9349.27 samples/sec Loss 4.4588 LearningRate 0.0065 Epoch: 14 Global Step: 248740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:06,625-Speed 9636.37 samples/sec Loss 4.4325 LearningRate 0.0065 Epoch: 14 Global Step: 248750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:07,669-Speed 9809.39 samples/sec Loss 4.3976 LearningRate 0.0065 Epoch: 14 Global Step: 248760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:08,772-Speed 9294.68 samples/sec Loss 4.3965 LearningRate 0.0065 Epoch: 14 Global Step: 248770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:09,841-Speed 9588.09 samples/sec Loss 4.4420 LearningRate 0.0065 Epoch: 14 Global Step: 248780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:10,910-Speed 9586.12 samples/sec Loss 4.3875 LearningRate 0.0065 Epoch: 14 Global Step: 248790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:11,983-Speed 9549.66 samples/sec Loss 4.5206 LearningRate 0.0065 Epoch: 14 Global Step: 248800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:13,021-Speed 9870.71 samples/sec Loss 4.4422 LearningRate 0.0065 
Epoch: 14 Global Step: 248810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:14,108-Speed 9429.90 samples/sec Loss 4.4732 LearningRate 0.0065 Epoch: 14 Global Step: 248820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:15,157-Speed 9767.74 samples/sec Loss 4.4271 LearningRate 0.0065 Epoch: 14 Global Step: 248830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:16,203-Speed 9799.96 samples/sec Loss 4.4012 LearningRate 0.0065 Epoch: 14 Global Step: 248840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:17,280-Speed 9513.25 samples/sec Loss 4.5663 LearningRate 0.0065 Epoch: 14 Global Step: 248850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:18,402-Speed 9126.69 samples/sec Loss 4.3996 LearningRate 0.0065 Epoch: 14 Global Step: 248860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:19,511-Speed 9235.81 samples/sec Loss 4.5020 LearningRate 0.0065 Epoch: 14 Global Step: 248870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:20,583-Speed 9561.76 samples/sec Loss 4.4330 LearningRate 0.0065 Epoch: 14 Global Step: 248880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:21,662-Speed 9492.42 samples/sec Loss 4.4856 LearningRate 0.0065 Epoch: 14 Global Step: 248890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:35:22,728-Speed 9612.79 samples/sec Loss 4.4159 LearningRate 0.0065 Epoch: 14 Global Step: 248900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:35:23,800-Speed 9559.78 samples/sec Loss 4.4958 LearningRate 0.0065 Epoch: 14 Global Step: 248910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:35:24,857-Speed 9686.79 samples/sec Loss 4.5128 LearningRate 0.0065 Epoch: 14 Global Step: 248920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:25,941-Speed 9455.12 samples/sec Loss 4.4449 LearningRate 0.0065 Epoch: 14 Global Step: 248930 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:27,055-Speed 9200.49 samples/sec Loss 4.3992 LearningRate 0.0065 Epoch: 14 Global Step: 248940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:28,152-Speed 9343.47 samples/sec Loss 4.4234 LearningRate 0.0065 Epoch: 14 Global Step: 248950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:29,199-Speed 9782.77 samples/sec Loss 4.5084 LearningRate 0.0065 Epoch: 14 Global Step: 248960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:30,286-Speed 9431.51 samples/sec Loss 4.4446 LearningRate 0.0065 Epoch: 14 Global Step: 248970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:31,373-Speed 9423.57 samples/sec Loss 4.4391 LearningRate 0.0065 Epoch: 14 Global Step: 248980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:32,488-Speed 9186.44 samples/sec Loss 4.4667 LearningRate 0.0065 Epoch: 14 Global Step: 248990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:33,554-Speed 9611.71 samples/sec Loss 4.4823 LearningRate 0.0065 Epoch: 14 Global Step: 249000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:34,657-Speed 9286.44 samples/sec Loss 4.3338 LearningRate 0.0065 Epoch: 14 Global Step: 249010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:35,774-Speed 9175.79 samples/sec Loss 4.4783 LearningRate 0.0065 Epoch: 14 Global Step: 249020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:35:36,862-Speed 9417.38 samples/sec Loss 4.4075 LearningRate 0.0065 Epoch: 14 Global Step: 249030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:35:37,919-Speed 9695.18 samples/sec Loss 4.4636 LearningRate 0.0065 Epoch: 14 Global Step: 249040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:39,020-Speed 9308.00 samples/sec Loss 4.4000 LearningRate 0.0064 Epoch: 14 Global Step: 249050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 21:35:40,091-Speed 9566.16 samples/sec Loss 4.4736 LearningRate 0.0064 Epoch: 14 Global Step: 249060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:41,188-Speed 9334.21 samples/sec Loss 4.5353 LearningRate 0.0064 Epoch: 14 Global Step: 249070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:42,298-Speed 9235.11 samples/sec Loss 4.3922 LearningRate 0.0064 Epoch: 14 Global Step: 249080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:43,390-Speed 9384.56 samples/sec Loss 4.5090 LearningRate 0.0064 Epoch: 14 Global Step: 249090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:44,432-Speed 9829.27 samples/sec Loss 4.4021 LearningRate 0.0064 Epoch: 14 Global Step: 249100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:45,513-Speed 9481.48 samples/sec Loss 4.5009 LearningRate 0.0064 Epoch: 14 Global Step: 249110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:46,627-Speed 9201.58 samples/sec Loss 4.4660 LearningRate 0.0064 Epoch: 14 Global Step: 249120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:47,723-Speed 9347.89 samples/sec Loss 4.4449 LearningRate 0.0064 Epoch: 14 Global Step: 249130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:48,861-Speed 8999.96 samples/sec Loss 4.3950 LearningRate 0.0064 Epoch: 14 Global Step: 249140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:35:49,905-Speed 9816.95 samples/sec Loss 4.4353 LearningRate 0.0064 Epoch: 14 Global Step: 249150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:50,973-Speed 9596.89 samples/sec Loss 4.4668 LearningRate 0.0064 Epoch: 14 Global Step: 249160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:52,107-Speed 9031.93 samples/sec Loss 4.4191 LearningRate 0.0064 Epoch: 14 Global Step: 249170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:53,173-Speed 9608.24 
samples/sec Loss 4.2824 LearningRate 0.0064 Epoch: 14 Global Step: 249180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:54,225-Speed 9745.25 samples/sec Loss 4.4494 LearningRate 0.0064 Epoch: 14 Global Step: 249190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:55,338-Speed 9203.77 samples/sec Loss 4.4797 LearningRate 0.0064 Epoch: 14 Global Step: 249200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:56,464-Speed 9096.81 samples/sec Loss 4.4069 LearningRate 0.0064 Epoch: 14 Global Step: 249210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:57,532-Speed 9598.45 samples/sec Loss 4.4044 LearningRate 0.0064 Epoch: 14 Global Step: 249220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:58,597-Speed 9615.49 samples/sec Loss 4.4151 LearningRate 0.0064 Epoch: 14 Global Step: 249230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:35:59,656-Speed 9683.33 samples/sec Loss 4.4488 LearningRate 0.0064 Epoch: 14 Global Step: 249240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:00,715-Speed 9671.55 samples/sec Loss 4.5099 LearningRate 0.0064 Epoch: 14 Global Step: 249250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:01,785-Speed 9578.74 samples/sec Loss 4.4531 LearningRate 0.0064 Epoch: 14 Global Step: 249260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:02,899-Speed 9201.76 samples/sec Loss 4.4893 LearningRate 0.0064 Epoch: 14 Global Step: 249270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:03,996-Speed 9334.84 samples/sec Loss 4.4551 LearningRate 0.0064 Epoch: 14 Global Step: 249280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:05,123-Speed 9094.80 samples/sec Loss 4.4811 LearningRate 0.0064 Epoch: 14 Global Step: 249290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:06,197-Speed 9541.77 samples/sec Loss 4.4064 LearningRate 
0.0064 Epoch: 14 Global Step: 249300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:07,248-Speed 9745.16 samples/sec Loss 4.4592 LearningRate 0.0064 Epoch: 14 Global Step: 249310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:08,304-Speed 9702.29 samples/sec Loss 4.4038 LearningRate 0.0064 Epoch: 14 Global Step: 249320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:09,418-Speed 9203.85 samples/sec Loss 4.4059 LearningRate 0.0064 Epoch: 14 Global Step: 249330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:10,522-Speed 9274.42 samples/sec Loss 4.3713 LearningRate 0.0064 Epoch: 14 Global Step: 249340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:11,625-Speed 9288.53 samples/sec Loss 4.5061 LearningRate 0.0064 Epoch: 14 Global Step: 249350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:12,751-Speed 9104.85 samples/sec Loss 4.4836 LearningRate 0.0064 Epoch: 14 Global Step: 249360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:13,807-Speed 9699.01 samples/sec Loss 4.5065 LearningRate 0.0064 Epoch: 14 Global Step: 249370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:14,846-Speed 9858.66 samples/sec Loss 4.4674 LearningRate 0.0064 Epoch: 14 Global Step: 249380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:15,946-Speed 9322.68 samples/sec Loss 4.4141 LearningRate 0.0064 Epoch: 14 Global Step: 249390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:17,039-Speed 9371.44 samples/sec Loss 4.4026 LearningRate 0.0064 Epoch: 14 Global Step: 249400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:18,168-Speed 9075.54 samples/sec Loss 4.5242 LearningRate 0.0064 Epoch: 14 Global Step: 249410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:19,248-Speed 9483.10 samples/sec Loss 4.5057 LearningRate 0.0064 Epoch: 14 Global Step: 249420 
Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:20,305-Speed 9698.00 samples/sec Loss 4.4137 LearningRate 0.0064 Epoch: 14 Global Step: 249430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:21,351-Speed 9794.07 samples/sec Loss 4.4319 LearningRate 0.0064 Epoch: 14 Global Step: 249440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:22,415-Speed 9631.91 samples/sec Loss 4.4891 LearningRate 0.0064 Epoch: 14 Global Step: 249450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:23,489-Speed 9540.80 samples/sec Loss 4.3428 LearningRate 0.0064 Epoch: 14 Global Step: 249460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:24,561-Speed 9558.98 samples/sec Loss 4.3802 LearningRate 0.0064 Epoch: 14 Global Step: 249470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:25,648-Speed 9428.57 samples/sec Loss 4.4059 LearningRate 0.0064 Epoch: 14 Global Step: 249480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:26,723-Speed 9527.52 samples/sec Loss 4.3775 LearningRate 0.0064 Epoch: 14 Global Step: 249490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:27,787-Speed 9623.70 samples/sec Loss 4.4437 LearningRate 0.0064 Epoch: 14 Global Step: 249500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:28,893-Speed 9267.03 samples/sec Loss 4.4975 LearningRate 0.0064 Epoch: 14 Global Step: 249510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:29,965-Speed 9562.80 samples/sec Loss 4.3579 LearningRate 0.0064 Epoch: 14 Global Step: 249520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:31,029-Speed 9630.39 samples/sec Loss 4.4161 LearningRate 0.0064 Epoch: 14 Global Step: 249530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:32,125-Speed 9347.20 samples/sec Loss 4.5017 LearningRate 0.0064 Epoch: 14 Global Step: 249540 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 21:36:33,183-Speed 9677.49 samples/sec Loss 4.4790 LearningRate 0.0064 Epoch: 14 Global Step: 249550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:34,236-Speed 9737.08 samples/sec Loss 4.5556 LearningRate 0.0064 Epoch: 14 Global Step: 249560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:35,336-Speed 9310.91 samples/sec Loss 4.5091 LearningRate 0.0064 Epoch: 14 Global Step: 249570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:36,397-Speed 9654.07 samples/sec Loss 4.4040 LearningRate 0.0064 Epoch: 14 Global Step: 249580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:37,453-Speed 9703.93 samples/sec Loss 4.4376 LearningRate 0.0064 Epoch: 14 Global Step: 249590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:38,573-Speed 9153.53 samples/sec Loss 4.4561 LearningRate 0.0064 Epoch: 14 Global Step: 249600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:39,655-Speed 9474.73 samples/sec Loss 4.5395 LearningRate 0.0064 Epoch: 14 Global Step: 249610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:40,723-Speed 9595.30 samples/sec Loss 4.4378 LearningRate 0.0064 Epoch: 14 Global Step: 249620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:41,787-Speed 9629.64 samples/sec Loss 4.4597 LearningRate 0.0064 Epoch: 14 Global Step: 249630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:42,893-Speed 9260.29 samples/sec Loss 4.4758 LearningRate 0.0064 Epoch: 14 Global Step: 249640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:44,005-Speed 9214.24 samples/sec Loss 4.4195 LearningRate 0.0064 Epoch: 14 Global Step: 249650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:45,072-Speed 9607.66 samples/sec Loss 4.3638 LearningRate 0.0064 Epoch: 14 Global Step: 249660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
21:36:46,156-Speed 9451.22 samples/sec Loss 4.4654 LearningRate 0.0064 Epoch: 14 Global Step: 249670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:47,267-Speed 9226.51 samples/sec Loss 4.4705 LearningRate 0.0064 Epoch: 14 Global Step: 249680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:48,301-Speed 9910.15 samples/sec Loss 4.3656 LearningRate 0.0064 Epoch: 14 Global Step: 249690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:49,435-Speed 9032.23 samples/sec Loss 4.3874 LearningRate 0.0064 Epoch: 14 Global Step: 249700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:50,548-Speed 9205.88 samples/sec Loss 4.4336 LearningRate 0.0063 Epoch: 14 Global Step: 249710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:36:51,616-Speed 9591.99 samples/sec Loss 4.5154 LearningRate 0.0063 Epoch: 14 Global Step: 249720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:52,686-Speed 9572.38 samples/sec Loss 4.4798 LearningRate 0.0063 Epoch: 14 Global Step: 249730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:53,776-Speed 9396.88 samples/sec Loss 4.4403 LearningRate 0.0063 Epoch: 14 Global Step: 249740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:54,933-Speed 8858.87 samples/sec Loss 4.5231 LearningRate 0.0063 Epoch: 14 Global Step: 249750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:55,988-Speed 9710.71 samples/sec Loss 4.3819 LearningRate 0.0063 Epoch: 14 Global Step: 249760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:57,059-Speed 9563.73 samples/sec Loss 4.4750 LearningRate 0.0063 Epoch: 14 Global Step: 249770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:58,091-Speed 9934.58 samples/sec Loss 4.4370 LearningRate 0.0063 Epoch: 14 Global Step: 249780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:36:59,120-Speed 9958.43 
samples/sec Loss 4.4496 LearningRate 0.0063 Epoch: 14 Global Step: 249790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:00,225-Speed 9276.88 samples/sec Loss 4.3742 LearningRate 0.0063 Epoch: 14 Global Step: 249800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:01,285-Speed 9659.80 samples/sec Loss 4.5376 LearningRate 0.0063 Epoch: 14 Global Step: 249810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:02,363-Speed 9506.71 samples/sec Loss 4.3668 LearningRate 0.0063 Epoch: 14 Global Step: 249820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:03,425-Speed 9644.31 samples/sec Loss 4.4156 LearningRate 0.0063 Epoch: 14 Global Step: 249830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:04,540-Speed 9190.67 samples/sec Loss 4.4819 LearningRate 0.0063 Epoch: 14 Global Step: 249840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:05,641-Speed 9305.22 samples/sec Loss 4.3679 LearningRate 0.0063 Epoch: 14 Global Step: 249850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:06,692-Speed 9752.62 samples/sec Loss 4.4129 LearningRate 0.0063 Epoch: 14 Global Step: 249860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:07,759-Speed 9600.78 samples/sec Loss 4.4897 LearningRate 0.0063 Epoch: 14 Global Step: 249870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:08,848-Speed 9413.32 samples/sec Loss 4.3985 LearningRate 0.0063 Epoch: 14 Global Step: 249880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:09,928-Speed 9481.07 samples/sec Loss 4.4317 LearningRate 0.0063 Epoch: 14 Global Step: 249890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:10,985-Speed 9698.58 samples/sec Loss 4.4115 LearningRate 0.0063 Epoch: 14 Global Step: 249900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:12,076-Speed 9387.70 samples/sec Loss 4.3899 LearningRate 
0.0063 Epoch: 14 Global Step: 249910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:13,139-Speed 9639.24 samples/sec Loss 4.4865 LearningRate 0.0063 Epoch: 14 Global Step: 249920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:37:14,221-Speed 9464.35 samples/sec Loss 4.4083 LearningRate 0.0063 Epoch: 14 Global Step: 249930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:15,294-Speed 9555.83 samples/sec Loss 4.4873 LearningRate 0.0063 Epoch: 14 Global Step: 249940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:16,388-Speed 9364.66 samples/sec Loss 4.3430 LearningRate 0.0063 Epoch: 14 Global Step: 249950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:17,465-Speed 9514.76 samples/sec Loss 4.4572 LearningRate 0.0063 Epoch: 14 Global Step: 249960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:18,559-Speed 9367.60 samples/sec Loss 4.4778 LearningRate 0.0063 Epoch: 14 Global Step: 249970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:19,672-Speed 9202.68 samples/sec Loss 4.4608 LearningRate 0.0063 Epoch: 14 Global Step: 249980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:20,775-Speed 9293.06 samples/sec Loss 4.4632 LearningRate 0.0063 Epoch: 14 Global Step: 249990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:21,858-Speed 9459.69 samples/sec Loss 4.4505 LearningRate 0.0063 Epoch: 14 Global Step: 250000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:37:43,946-[lfw][250000]XNorm: 7.486033 Training: 2022-04-11 21:37:43,947-[lfw][250000]Accuracy-Flip: 0.99617+-0.00224 Training: 2022-04-11 21:37:43,947-[lfw][250000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:38:09,429-[cfp_fp][250000]XNorm: 6.442418 Training: 2022-04-11 21:38:09,430-[cfp_fp][250000]Accuracy-Flip: 0.96886+-0.00983 Training: 2022-04-11 21:38:09,430-[cfp_fp][250000]Accuracy-Highest: 0.97143 
Training: 2022-04-11 21:38:31,399-[agedb_30][250000]XNorm: 7.234231 Training: 2022-04-11 21:38:31,400-[agedb_30][250000]Accuracy-Flip: 0.97100+-0.00867 Training: 2022-04-11 21:38:31,400-[agedb_30][250000]Accuracy-Highest: 0.97350 Training: 2022-04-11 21:38:32,485-Speed 144.99 samples/sec Loss 4.4628 LearningRate 0.0063 Epoch: 14 Global Step: 250010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:33,603-Speed 9166.87 samples/sec Loss 4.3672 LearningRate 0.0063 Epoch: 14 Global Step: 250020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:34,700-Speed 9340.65 samples/sec Loss 4.4093 LearningRate 0.0063 Epoch: 14 Global Step: 250030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:35,791-Speed 9389.31 samples/sec Loss 4.4830 LearningRate 0.0063 Epoch: 14 Global Step: 250040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:36,879-Speed 9423.16 samples/sec Loss 4.4056 LearningRate 0.0063 Epoch: 14 Global Step: 250050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:37,951-Speed 9560.92 samples/sec Loss 4.3609 LearningRate 0.0063 Epoch: 14 Global Step: 250060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:39,037-Speed 9532.92 samples/sec Loss 4.4189 LearningRate 0.0063 Epoch: 14 Global Step: 250070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:40,157-Speed 9151.48 samples/sec Loss 4.4376 LearningRate 0.0063 Epoch: 14 Global Step: 250080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:41,240-Speed 9460.10 samples/sec Loss 4.4340 LearningRate 0.0063 Epoch: 14 Global Step: 250090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:42,307-Speed 9601.20 samples/sec Loss 4.4443 LearningRate 0.0063 Epoch: 14 Global Step: 250100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:43,393-Speed 9431.83 samples/sec Loss 4.4459 LearningRate 0.0063 Epoch: 14 Global Step: 250110 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:44,496-Speed 9297.45 samples/sec Loss 4.4210 LearningRate 0.0063 Epoch: 14 Global Step: 250120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:45,573-Speed 9506.75 samples/sec Loss 4.4598 LearningRate 0.0063 Epoch: 14 Global Step: 250130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:46,666-Speed 9376.79 samples/sec Loss 4.5550 LearningRate 0.0063 Epoch: 14 Global Step: 250140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:38:47,749-Speed 9463.64 samples/sec Loss 4.5029 LearningRate 0.0063 Epoch: 14 Global Step: 250150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:48,876-Speed 9093.33 samples/sec Loss 4.4675 LearningRate 0.0063 Epoch: 14 Global Step: 250160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:49,956-Speed 9482.25 samples/sec Loss 4.3748 LearningRate 0.0063 Epoch: 14 Global Step: 250170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:51,020-Speed 9629.54 samples/sec Loss 4.3865 LearningRate 0.0063 Epoch: 14 Global Step: 250180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:52,087-Speed 9602.34 samples/sec Loss 4.4796 LearningRate 0.0063 Epoch: 14 Global Step: 250190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:53,167-Speed 9488.76 samples/sec Loss 4.4347 LearningRate 0.0063 Epoch: 14 Global Step: 250200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:54,269-Speed 9294.09 samples/sec Loss 4.3875 LearningRate 0.0063 Epoch: 14 Global Step: 250210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:55,364-Speed 9357.93 samples/sec Loss 4.4620 LearningRate 0.0063 Epoch: 14 Global Step: 250220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:56,424-Speed 9669.28 samples/sec Loss 4.5197 LearningRate 0.0063 Epoch: 14 Global Step: 250230 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 21:38:57,521-Speed 9339.18 samples/sec Loss 4.3506 LearningRate 0.0063 Epoch: 14 Global Step: 250240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:38:58,621-Speed 9316.91 samples/sec Loss 4.3607 LearningRate 0.0063 Epoch: 14 Global Step: 250250 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-04-11 21:38:59,730-Speed 9236.59 samples/sec Loss 4.4107 LearningRate 0.0063 Epoch: 14 Global Step: 250260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:00,868-Speed 9001.52 samples/sec Loss 4.4246 LearningRate 0.0063 Epoch: 14 Global Step: 250270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:02,028-Speed 8832.77 samples/sec Loss 4.3925 LearningRate 0.0063 Epoch: 14 Global Step: 250280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:03,123-Speed 9354.97 samples/sec Loss 4.4575 LearningRate 0.0063 Epoch: 14 Global Step: 250290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:04,199-Speed 9527.17 samples/sec Loss 4.4048 LearningRate 0.0063 Epoch: 14 Global Step: 250300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:05,257-Speed 9683.42 samples/sec Loss 4.4294 LearningRate 0.0063 Epoch: 14 Global Step: 250310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:06,336-Speed 9500.49 samples/sec Loss 4.3876 LearningRate 0.0063 Epoch: 14 Global Step: 250320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:07,409-Speed 9563.68 samples/sec Loss 4.4252 LearningRate 0.0063 Epoch: 14 Global Step: 250330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:08,495-Speed 9438.74 samples/sec Loss 4.3780 LearningRate 0.0063 Epoch: 14 Global Step: 250340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:09,649-Speed 8875.96 samples/sec Loss 4.5819 LearningRate 0.0063 Epoch: 14 Global Step: 250350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
21:39:10,922-Speed 8047.76 samples/sec Loss 4.3836 LearningRate 0.0063 Epoch: 14 Global Step: 250360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:51,369-Speed 253.18 samples/sec Loss 4.1669 LearningRate 0.0062 Epoch: 15 Global Step: 250370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:52,765-Speed 7342.38 samples/sec Loss 3.8133 LearningRate 0.0062 Epoch: 15 Global Step: 250380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:54,341-Speed 6503.48 samples/sec Loss 3.7623 LearningRate 0.0062 Epoch: 15 Global Step: 250390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:55,418-Speed 9505.32 samples/sec Loss 3.8757 LearningRate 0.0062 Epoch: 15 Global Step: 250400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:56,735-Speed 7781.74 samples/sec Loss 3.8940 LearningRate 0.0062 Epoch: 15 Global Step: 250410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:57,911-Speed 8716.76 samples/sec Loss 3.8622 LearningRate 0.0062 Epoch: 15 Global Step: 250420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:39:59,234-Speed 7745.97 samples/sec Loss 3.8714 LearningRate 0.0062 Epoch: 15 Global Step: 250430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:00,301-Speed 9599.45 samples/sec Loss 3.8332 LearningRate 0.0062 Epoch: 15 Global Step: 250440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:01,358-Speed 9698.08 samples/sec Loss 3.7128 LearningRate 0.0062 Epoch: 15 Global Step: 250450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:02,411-Speed 9729.62 samples/sec Loss 3.9174 LearningRate 0.0062 Epoch: 15 Global Step: 250460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:03,488-Speed 9514.05 samples/sec Loss 3.8739 LearningRate 0.0062 Epoch: 15 Global Step: 250470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:04,711-Speed 8373.71 
samples/sec Loss 3.8378 LearningRate 0.0062 Epoch: 15 Global Step: 250480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:05,817-Speed 9264.98 samples/sec Loss 3.8023 LearningRate 0.0062 Epoch: 15 Global Step: 250490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:06,915-Speed 9332.33 samples/sec Loss 3.8052 LearningRate 0.0062 Epoch: 15 Global Step: 250500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:08,032-Speed 9171.95 samples/sec Loss 3.8868 LearningRate 0.0062 Epoch: 15 Global Step: 250510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:09,146-Speed 9195.19 samples/sec Loss 3.7932 LearningRate 0.0062 Epoch: 15 Global Step: 250520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:10,260-Speed 9196.19 samples/sec Loss 3.8325 LearningRate 0.0062 Epoch: 15 Global Step: 250530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:11,866-Speed 6378.59 samples/sec Loss 3.8100 LearningRate 0.0062 Epoch: 15 Global Step: 250540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:12,996-Speed 9074.45 samples/sec Loss 3.8991 LearningRate 0.0062 Epoch: 15 Global Step: 250550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:14,109-Speed 9199.39 samples/sec Loss 3.7682 LearningRate 0.0062 Epoch: 15 Global Step: 250560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:15,406-Speed 7899.97 samples/sec Loss 3.8060 LearningRate 0.0062 Epoch: 15 Global Step: 250570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:16,867-Speed 7012.24 samples/sec Loss 3.8766 LearningRate 0.0062 Epoch: 15 Global Step: 250580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:17,933-Speed 9616.81 samples/sec Loss 3.8671 LearningRate 0.0062 Epoch: 15 Global Step: 250590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:19,253-Speed 7757.56 samples/sec Loss 3.9153 LearningRate 
0.0062 Epoch: 15 Global Step: 250600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:20,363-Speed 9230.53 samples/sec Loss 3.8881 LearningRate 0.0062 Epoch: 15 Global Step: 250610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:21,714-Speed 7585.15 samples/sec Loss 3.8672 LearningRate 0.0062 Epoch: 15 Global Step: 250620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:22,850-Speed 9017.93 samples/sec Loss 3.9059 LearningRate 0.0062 Epoch: 15 Global Step: 250630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:23,937-Speed 9422.38 samples/sec Loss 3.8423 LearningRate 0.0062 Epoch: 15 Global Step: 250640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:25,038-Speed 9307.37 samples/sec Loss 3.9063 LearningRate 0.0062 Epoch: 15 Global Step: 250650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:26,397-Speed 7544.28 samples/sec Loss 3.8625 LearningRate 0.0062 Epoch: 15 Global Step: 250660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:27,486-Speed 9404.78 samples/sec Loss 3.8226 LearningRate 0.0062 Epoch: 15 Global Step: 250670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:28,557-Speed 9568.50 samples/sec Loss 3.8300 LearningRate 0.0062 Epoch: 15 Global Step: 250680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:29,658-Speed 9304.25 samples/sec Loss 3.9295 LearningRate 0.0062 Epoch: 15 Global Step: 250690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:30,755-Speed 9340.77 samples/sec Loss 3.8676 LearningRate 0.0062 Epoch: 15 Global Step: 250700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:31,869-Speed 9198.70 samples/sec Loss 3.8102 LearningRate 0.0062 Epoch: 15 Global Step: 250710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:32,964-Speed 9356.03 samples/sec Loss 3.8540 LearningRate 0.0062 Epoch: 15 Global Step: 
250720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:34,062-Speed 9333.38 samples/sec Loss 3.8438 LearningRate 0.0062 Epoch: 15 Global Step: 250730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:35,173-Speed 9230.32 samples/sec Loss 3.9134 LearningRate 0.0062 Epoch: 15 Global Step: 250740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:36,319-Speed 8933.18 samples/sec Loss 3.8988 LearningRate 0.0062 Epoch: 15 Global Step: 250750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:37,423-Speed 9285.34 samples/sec Loss 3.8336 LearningRate 0.0062 Epoch: 15 Global Step: 250760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:38,475-Speed 9736.57 samples/sec Loss 3.8254 LearningRate 0.0062 Epoch: 15 Global Step: 250770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:39,504-Speed 9955.54 samples/sec Loss 4.0018 LearningRate 0.0062 Epoch: 15 Global Step: 250780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:40,596-Speed 9387.00 samples/sec Loss 3.9697 LearningRate 0.0062 Epoch: 15 Global Step: 250790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:41,691-Speed 9358.10 samples/sec Loss 3.8183 LearningRate 0.0062 Epoch: 15 Global Step: 250800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:42,792-Speed 9302.44 samples/sec Loss 3.8452 LearningRate 0.0062 Epoch: 15 Global Step: 250810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:43,878-Speed 9436.88 samples/sec Loss 3.9248 LearningRate 0.0062 Epoch: 15 Global Step: 250820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:44,925-Speed 9785.72 samples/sec Loss 3.9222 LearningRate 0.0062 Epoch: 15 Global Step: 250830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:45,996-Speed 9565.27 samples/sec Loss 3.8697 LearningRate 0.0062 Epoch: 15 Global Step: 250840 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 21:40:47,112-Speed 9179.63 samples/sec Loss 3.7737 LearningRate 0.0062 Epoch: 15 Global Step: 250850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:48,204-Speed 9391.26 samples/sec Loss 3.8630 LearningRate 0.0062 Epoch: 15 Global Step: 250860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:49,367-Speed 8803.79 samples/sec Loss 3.8933 LearningRate 0.0062 Epoch: 15 Global Step: 250870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:40:50,406-Speed 9865.57 samples/sec Loss 3.8990 LearningRate 0.0062 Epoch: 15 Global Step: 250880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:51,476-Speed 9577.37 samples/sec Loss 3.8947 LearningRate 0.0062 Epoch: 15 Global Step: 250890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:52,580-Speed 9283.29 samples/sec Loss 3.8628 LearningRate 0.0062 Epoch: 15 Global Step: 250900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:53,673-Speed 9368.63 samples/sec Loss 3.8765 LearningRate 0.0062 Epoch: 15 Global Step: 250910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:54,744-Speed 9569.66 samples/sec Loss 3.8759 LearningRate 0.0062 Epoch: 15 Global Step: 250920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:55,814-Speed 9577.61 samples/sec Loss 3.9128 LearningRate 0.0062 Epoch: 15 Global Step: 250930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:56,917-Speed 9294.64 samples/sec Loss 3.9926 LearningRate 0.0062 Epoch: 15 Global Step: 250940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:57,993-Speed 9522.52 samples/sec Loss 3.8868 LearningRate 0.0062 Epoch: 15 Global Step: 250950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:40:59,098-Speed 9267.69 samples/sec Loss 3.8674 LearningRate 0.0062 Epoch: 15 Global Step: 250960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 21:41:00,175-Speed 9514.63 samples/sec Loss 3.8054 LearningRate 0.0062 Epoch: 15 Global Step: 250970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:01,249-Speed 9538.90 samples/sec Loss 3.9095 LearningRate 0.0062 Epoch: 15 Global Step: 250980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:02,315-Speed 9614.42 samples/sec Loss 3.8964 LearningRate 0.0062 Epoch: 15 Global Step: 250990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:03,403-Speed 9414.89 samples/sec Loss 3.8031 LearningRate 0.0062 Epoch: 15 Global Step: 251000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:04,465-Speed 9651.78 samples/sec Loss 3.8981 LearningRate 0.0062 Epoch: 15 Global Step: 251010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:05,573-Speed 9248.80 samples/sec Loss 3.9565 LearningRate 0.0062 Epoch: 15 Global Step: 251020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:06,676-Speed 9282.37 samples/sec Loss 3.9303 LearningRate 0.0062 Epoch: 15 Global Step: 251030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:07,787-Speed 9227.52 samples/sec Loss 3.7801 LearningRate 0.0061 Epoch: 15 Global Step: 251040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:08,827-Speed 9844.09 samples/sec Loss 3.8990 LearningRate 0.0061 Epoch: 15 Global Step: 251050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:09,898-Speed 9568.26 samples/sec Loss 3.8667 LearningRate 0.0061 Epoch: 15 Global Step: 251060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:10,977-Speed 9499.13 samples/sec Loss 3.8695 LearningRate 0.0061 Epoch: 15 Global Step: 251070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:12,057-Speed 9488.65 samples/sec Loss 3.8887 LearningRate 0.0061 Epoch: 15 Global Step: 251080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:13,121-Speed 9633.25 
samples/sec Loss 3.8632 LearningRate 0.0061 Epoch: 15 Global Step: 251090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:14,199-Speed 9507.35 samples/sec Loss 4.0023 LearningRate 0.0061 Epoch: 15 Global Step: 251100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:15,308-Speed 9237.70 samples/sec Loss 3.8567 LearningRate 0.0061 Epoch: 15 Global Step: 251110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:16,413-Speed 9271.33 samples/sec Loss 3.9375 LearningRate 0.0061 Epoch: 15 Global Step: 251120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:17,695-Speed 7991.77 samples/sec Loss 3.9138 LearningRate 0.0061 Epoch: 15 Global Step: 251130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:18,791-Speed 9352.83 samples/sec Loss 3.8975 LearningRate 0.0061 Epoch: 15 Global Step: 251140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:19,913-Speed 9131.88 samples/sec Loss 3.8661 LearningRate 0.0061 Epoch: 15 Global Step: 251150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:21,028-Speed 9186.81 samples/sec Loss 3.8483 LearningRate 0.0061 Epoch: 15 Global Step: 251160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:22,096-Speed 9594.94 samples/sec Loss 3.9844 LearningRate 0.0061 Epoch: 15 Global Step: 251170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:23,189-Speed 9367.67 samples/sec Loss 3.9495 LearningRate 0.0061 Epoch: 15 Global Step: 251180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:24,292-Speed 9295.22 samples/sec Loss 3.9083 LearningRate 0.0061 Epoch: 15 Global Step: 251190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:25,408-Speed 9173.37 samples/sec Loss 3.8476 LearningRate 0.0061 Epoch: 15 Global Step: 251200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:26,476-Speed 9598.44 samples/sec Loss 3.9473 
LearningRate 0.0061 Epoch: 15 Global Step: 251210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:27,522-Speed 9797.05 samples/sec Loss 3.8992 LearningRate 0.0061 Epoch: 15 Global Step: 251220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:28,671-Speed 8917.21 samples/sec Loss 3.8811 LearningRate 0.0061 Epoch: 15 Global Step: 251230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:29,765-Speed 9366.39 samples/sec Loss 3.8771 LearningRate 0.0061 Epoch: 15 Global Step: 251240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:30,842-Speed 9509.44 samples/sec Loss 3.8759 LearningRate 0.0061 Epoch: 15 Global Step: 251250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:31,923-Speed 9484.48 samples/sec Loss 3.9629 LearningRate 0.0061 Epoch: 15 Global Step: 251260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:33,014-Speed 9391.48 samples/sec Loss 3.8998 LearningRate 0.0061 Epoch: 15 Global Step: 251270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:34,086-Speed 9556.10 samples/sec Loss 3.8512 LearningRate 0.0061 Epoch: 15 Global Step: 251280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:35,152-Speed 9609.87 samples/sec Loss 3.8934 LearningRate 0.0061 Epoch: 15 Global Step: 251290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:36,257-Speed 9276.61 samples/sec Loss 3.8477 LearningRate 0.0061 Epoch: 15 Global Step: 251300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:37,335-Speed 9499.11 samples/sec Loss 3.8947 LearningRate 0.0061 Epoch: 15 Global Step: 251310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:38,415-Speed 9489.25 samples/sec Loss 4.0516 LearningRate 0.0061 Epoch: 15 Global Step: 251320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:39,501-Speed 9436.87 samples/sec Loss 3.9461 LearningRate 0.0061 Epoch: 15 Global 
Step: 251330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:40,564-Speed 9638.95 samples/sec Loss 3.9541 LearningRate 0.0061 Epoch: 15 Global Step: 251340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:41,634-Speed 9568.60 samples/sec Loss 3.8910 LearningRate 0.0061 Epoch: 15 Global Step: 251350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:42,695-Speed 9659.46 samples/sec Loss 3.9678 LearningRate 0.0061 Epoch: 15 Global Step: 251360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:43,757-Speed 9646.08 samples/sec Loss 3.9563 LearningRate 0.0061 Epoch: 15 Global Step: 251370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:44,841-Speed 9453.60 samples/sec Loss 3.8793 LearningRate 0.0061 Epoch: 15 Global Step: 251380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:45,930-Speed 9410.94 samples/sec Loss 3.9301 LearningRate 0.0061 Epoch: 15 Global Step: 251390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:47,039-Speed 9244.05 samples/sec Loss 3.9045 LearningRate 0.0061 Epoch: 15 Global Step: 251400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:48,127-Speed 9411.85 samples/sec Loss 3.9790 LearningRate 0.0061 Epoch: 15 Global Step: 251410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:49,223-Speed 9350.19 samples/sec Loss 3.8799 LearningRate 0.0061 Epoch: 15 Global Step: 251420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:50,321-Speed 9337.10 samples/sec Loss 3.9304 LearningRate 0.0061 Epoch: 15 Global Step: 251430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:51,368-Speed 9788.76 samples/sec Loss 3.9798 LearningRate 0.0061 Epoch: 15 Global Step: 251440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:52,418-Speed 9755.21 samples/sec Loss 3.7706 LearningRate 0.0061 Epoch: 15 Global Step: 251450 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 21:41:53,487-Speed 9578.46 samples/sec Loss 3.9605 LearningRate 0.0061 Epoch: 15 Global Step: 251460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:41:54,536-Speed 9774.52 samples/sec Loss 4.0331 LearningRate 0.0061 Epoch: 15 Global Step: 251470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:55,648-Speed 9209.46 samples/sec Loss 3.9411 LearningRate 0.0061 Epoch: 15 Global Step: 251480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:56,753-Speed 9273.94 samples/sec Loss 3.8990 LearningRate 0.0061 Epoch: 15 Global Step: 251490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:57,822-Speed 9582.95 samples/sec Loss 3.9756 LearningRate 0.0061 Epoch: 15 Global Step: 251500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:41:58,932-Speed 9231.84 samples/sec Loss 3.8906 LearningRate 0.0061 Epoch: 15 Global Step: 251510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:00,065-Speed 9043.57 samples/sec Loss 3.9717 LearningRate 0.0061 Epoch: 15 Global Step: 251520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:01,176-Speed 9221.42 samples/sec Loss 3.8907 LearningRate 0.0061 Epoch: 15 Global Step: 251530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:02,257-Speed 9479.51 samples/sec Loss 3.9342 LearningRate 0.0061 Epoch: 15 Global Step: 251540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:03,304-Speed 9793.99 samples/sec Loss 3.9635 LearningRate 0.0061 Epoch: 15 Global Step: 251550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:04,374-Speed 9569.99 samples/sec Loss 3.9958 LearningRate 0.0061 Epoch: 15 Global Step: 251560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:05,438-Speed 9627.87 samples/sec Loss 4.0152 LearningRate 0.0061 Epoch: 15 Global Step: 251570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 21:42:06,516-Speed 9510.58 samples/sec Loss 3.8695 LearningRate 0.0061 Epoch: 15 Global Step: 251580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:07,582-Speed 9607.44 samples/sec Loss 3.9427 LearningRate 0.0061 Epoch: 15 Global Step: 251590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:08,676-Speed 9369.92 samples/sec Loss 3.9217 LearningRate 0.0061 Epoch: 15 Global Step: 251600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:09,821-Speed 8945.45 samples/sec Loss 4.0150 LearningRate 0.0061 Epoch: 15 Global Step: 251610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:10,905-Speed 9452.38 samples/sec Loss 3.7965 LearningRate 0.0061 Epoch: 15 Global Step: 251620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:11,945-Speed 9859.66 samples/sec Loss 3.9516 LearningRate 0.0061 Epoch: 15 Global Step: 251630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:13,008-Speed 9633.10 samples/sec Loss 3.8813 LearningRate 0.0061 Epoch: 15 Global Step: 251640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:14,077-Speed 9585.35 samples/sec Loss 3.9319 LearningRate 0.0061 Epoch: 15 Global Step: 251650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:15,116-Speed 9861.84 samples/sec Loss 3.9595 LearningRate 0.0061 Epoch: 15 Global Step: 251660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:16,198-Speed 9477.09 samples/sec Loss 3.9349 LearningRate 0.0061 Epoch: 15 Global Step: 251670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:17,295-Speed 9340.34 samples/sec Loss 3.8778 LearningRate 0.0061 Epoch: 15 Global Step: 251680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:18,383-Speed 9417.37 samples/sec Loss 3.8696 LearningRate 0.0061 Epoch: 15 Global Step: 251690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:19,439-Speed 9706.04 
samples/sec Loss 3.9822 LearningRate 0.0061 Epoch: 15 Global Step: 251700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:20,547-Speed 9249.93 samples/sec Loss 3.9657 LearningRate 0.0061 Epoch: 15 Global Step: 251710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:21,666-Speed 9152.69 samples/sec Loss 3.9625 LearningRate 0.0060 Epoch: 15 Global Step: 251720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:22,766-Speed 9316.05 samples/sec Loss 4.0009 LearningRate 0.0060 Epoch: 15 Global Step: 251730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:23,865-Speed 9320.39 samples/sec Loss 3.9512 LearningRate 0.0060 Epoch: 15 Global Step: 251740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:24,937-Speed 9559.48 samples/sec Loss 3.9495 LearningRate 0.0060 Epoch: 15 Global Step: 251750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:26,011-Speed 9542.41 samples/sec Loss 3.9296 LearningRate 0.0060 Epoch: 15 Global Step: 251760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:27,113-Speed 9302.03 samples/sec Loss 3.9180 LearningRate 0.0060 Epoch: 15 Global Step: 251770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:28,228-Speed 9184.81 samples/sec Loss 3.9405 LearningRate 0.0060 Epoch: 15 Global Step: 251780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:29,313-Speed 9449.90 samples/sec Loss 3.9194 LearningRate 0.0060 Epoch: 15 Global Step: 251790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:30,410-Speed 9338.07 samples/sec Loss 3.8892 LearningRate 0.0060 Epoch: 15 Global Step: 251800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:31,485-Speed 9534.02 samples/sec Loss 3.9613 LearningRate 0.0060 Epoch: 15 Global Step: 251810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:32,561-Speed 9523.45 samples/sec Loss 3.9012 
LearningRate 0.0060 Epoch: 15 Global Step: 251820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:33,639-Speed 9503.75 samples/sec Loss 3.8886 LearningRate 0.0060 Epoch: 15 Global Step: 251830 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-04-11 21:42:34,700-Speed 9661.65 samples/sec Loss 4.0110 LearningRate 0.0060 Epoch: 15 Global Step: 251840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:35,763-Speed 9639.20 samples/sec Loss 3.8862 LearningRate 0.0060 Epoch: 15 Global Step: 251850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:36,863-Speed 9309.96 samples/sec Loss 4.0117 LearningRate 0.0060 Epoch: 15 Global Step: 251860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:37,983-Speed 9149.24 samples/sec Loss 3.8901 LearningRate 0.0060 Epoch: 15 Global Step: 251870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:39,098-Speed 9188.04 samples/sec Loss 3.9247 LearningRate 0.0060 Epoch: 15 Global Step: 251880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:40,176-Speed 9508.03 samples/sec Loss 3.9117 LearningRate 0.0060 Epoch: 15 Global Step: 251890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:41,223-Speed 9779.09 samples/sec Loss 3.9937 LearningRate 0.0060 Epoch: 15 Global Step: 251900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:42,317-Speed 9374.13 samples/sec Loss 3.9467 LearningRate 0.0060 Epoch: 15 Global Step: 251910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:43,419-Speed 9300.02 samples/sec Loss 3.8922 LearningRate 0.0060 Epoch: 15 Global Step: 251920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:44,479-Speed 9661.00 samples/sec Loss 3.9806 LearningRate 0.0060 Epoch: 15 Global Step: 251930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:45,540-Speed 9664.10 samples/sec Loss 4.0174 LearningRate 0.0060 Epoch: 15 
Global Step: 251940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:46,605-Speed 9616.56 samples/sec Loss 3.9694 LearningRate 0.0060 Epoch: 15 Global Step: 251950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:47,704-Speed 9328.69 samples/sec Loss 3.9666 LearningRate 0.0060 Epoch: 15 Global Step: 251960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:42:48,775-Speed 9563.87 samples/sec Loss 3.9656 LearningRate 0.0060 Epoch: 15 Global Step: 251970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:49,893-Speed 9165.64 samples/sec Loss 3.9132 LearningRate 0.0060 Epoch: 15 Global Step: 251980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:50,942-Speed 9765.75 samples/sec Loss 3.8673 LearningRate 0.0060 Epoch: 15 Global Step: 251990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:42:52,065-Speed 9119.51 samples/sec Loss 4.0172 LearningRate 0.0060 Epoch: 15 Global Step: 252000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:43:14,054-[lfw][252000]XNorm: 7.405152 Training: 2022-04-11 21:43:14,055-[lfw][252000]Accuracy-Flip: 0.99583+-0.00261 Training: 2022-04-11 21:43:14,055-[lfw][252000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:43:39,449-[cfp_fp][252000]XNorm: 6.373097 Training: 2022-04-11 21:43:39,450-[cfp_fp][252000]Accuracy-Flip: 0.97043+-0.00706 Training: 2022-04-11 21:43:39,450-[cfp_fp][252000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:44:01,317-[agedb_30][252000]XNorm: 7.182551 Training: 2022-04-11 21:44:01,318-[agedb_30][252000]Accuracy-Flip: 0.97050+-0.00823 Training: 2022-04-11 21:44:01,318-[agedb_30][252000]Accuracy-Highest: 0.97350 Training: 2022-04-11 21:44:02,401-Speed 145.59 samples/sec Loss 3.9166 LearningRate 0.0060 Epoch: 15 Global Step: 252010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:03,442-Speed 9841.96 samples/sec Loss 3.9233 LearningRate 0.0060 Epoch: 15 Global Step: 252020 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:04,499-Speed 9698.22 samples/sec Loss 3.9180 LearningRate 0.0060 Epoch: 15 Global Step: 252030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:05,572-Speed 9543.24 samples/sec Loss 3.8353 LearningRate 0.0060 Epoch: 15 Global Step: 252040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:06,631-Speed 9670.95 samples/sec Loss 3.9005 LearningRate 0.0060 Epoch: 15 Global Step: 252050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:07,703-Speed 9562.24 samples/sec Loss 3.9602 LearningRate 0.0060 Epoch: 15 Global Step: 252060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:08,778-Speed 9529.77 samples/sec Loss 3.9447 LearningRate 0.0060 Epoch: 15 Global Step: 252070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:09,878-Speed 9318.43 samples/sec Loss 3.9406 LearningRate 0.0060 Epoch: 15 Global Step: 252080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:10,947-Speed 9586.69 samples/sec Loss 4.0464 LearningRate 0.0060 Epoch: 15 Global Step: 252090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:12,012-Speed 9618.45 samples/sec Loss 3.9437 LearningRate 0.0060 Epoch: 15 Global Step: 252100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:13,084-Speed 9562.62 samples/sec Loss 3.9924 LearningRate 0.0060 Epoch: 15 Global Step: 252110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:14,141-Speed 9690.31 samples/sec Loss 4.0180 LearningRate 0.0060 Epoch: 15 Global Step: 252120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:15,230-Speed 9409.10 samples/sec Loss 3.9511 LearningRate 0.0060 Epoch: 15 Global Step: 252130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:16,322-Speed 9385.26 samples/sec Loss 3.9452 LearningRate 0.0060 Epoch: 15 Global Step: 252140 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 21:44:17,396-Speed 9536.21 samples/sec Loss 3.9027 LearningRate 0.0060 Epoch: 15 Global Step: 252150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:18,532-Speed 9020.96 samples/sec Loss 3.9219 LearningRate 0.0060 Epoch: 15 Global Step: 252160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:19,636-Speed 9279.70 samples/sec Loss 4.0150 LearningRate 0.0060 Epoch: 15 Global Step: 252170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:20,716-Speed 9486.43 samples/sec Loss 3.9406 LearningRate 0.0060 Epoch: 15 Global Step: 252180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:21,848-Speed 9055.52 samples/sec Loss 3.8659 LearningRate 0.0060 Epoch: 15 Global Step: 252190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:23,174-Speed 7724.42 samples/sec Loss 3.9285 LearningRate 0.0060 Epoch: 15 Global Step: 252200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:24,327-Speed 8886.34 samples/sec Loss 3.9242 LearningRate 0.0060 Epoch: 15 Global Step: 252210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:25,434-Speed 9258.04 samples/sec Loss 3.9176 LearningRate 0.0060 Epoch: 15 Global Step: 252220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:26,565-Speed 9060.25 samples/sec Loss 4.0158 LearningRate 0.0060 Epoch: 15 Global Step: 252230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:27,629-Speed 9626.79 samples/sec Loss 3.9774 LearningRate 0.0060 Epoch: 15 Global Step: 252240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:28,748-Speed 9157.11 samples/sec Loss 3.9704 LearningRate 0.0060 Epoch: 15 Global Step: 252250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:29,879-Speed 9064.15 samples/sec Loss 3.9325 LearningRate 0.0060 Epoch: 15 Global Step: 252260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
21:44:30,991-Speed 9211.17 samples/sec Loss 3.9287 LearningRate 0.0060 Epoch: 15 Global Step: 252270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:32,071-Speed 9488.25 samples/sec Loss 4.0526 LearningRate 0.0060 Epoch: 15 Global Step: 252280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:33,158-Speed 9434.20 samples/sec Loss 3.9120 LearningRate 0.0060 Epoch: 15 Global Step: 252290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:34,237-Speed 9489.45 samples/sec Loss 3.9114 LearningRate 0.0060 Epoch: 15 Global Step: 252300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:35,348-Speed 9222.35 samples/sec Loss 3.9401 LearningRate 0.0060 Epoch: 15 Global Step: 252310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:36,413-Speed 9622.72 samples/sec Loss 4.0149 LearningRate 0.0060 Epoch: 15 Global Step: 252320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:37,494-Speed 9481.56 samples/sec Loss 3.9715 LearningRate 0.0060 Epoch: 15 Global Step: 252330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:38,557-Speed 9635.02 samples/sec Loss 3.9754 LearningRate 0.0060 Epoch: 15 Global Step: 252340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:39,652-Speed 9358.49 samples/sec Loss 3.9724 LearningRate 0.0060 Epoch: 15 Global Step: 252350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:40,718-Speed 9607.37 samples/sec Loss 3.9141 LearningRate 0.0060 Epoch: 15 Global Step: 252360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:41,759-Speed 9842.40 samples/sec Loss 4.0262 LearningRate 0.0060 Epoch: 15 Global Step: 252370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:42,831-Speed 9558.95 samples/sec Loss 3.8561 LearningRate 0.0060 Epoch: 15 Global Step: 252380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:43,900-Speed 9585.46 samples/sec 
Loss 3.8763 LearningRate 0.0060 Epoch: 15 Global Step: 252390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:45,068-Speed 8767.50 samples/sec Loss 3.9969 LearningRate 0.0059 Epoch: 15 Global Step: 252400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:46,202-Speed 9041.53 samples/sec Loss 3.8975 LearningRate 0.0059 Epoch: 15 Global Step: 252410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:47,304-Speed 9301.39 samples/sec Loss 3.8930 LearningRate 0.0059 Epoch: 15 Global Step: 252420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:48,387-Speed 9456.97 samples/sec Loss 4.0688 LearningRate 0.0059 Epoch: 15 Global Step: 252430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:49,490-Speed 9294.78 samples/sec Loss 4.0711 LearningRate 0.0059 Epoch: 15 Global Step: 252440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:50,548-Speed 9685.08 samples/sec Loss 3.9199 LearningRate 0.0059 Epoch: 15 Global Step: 252450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:51,577-Speed 9956.69 samples/sec Loss 3.9661 LearningRate 0.0059 Epoch: 15 Global Step: 252460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:52,698-Speed 9137.97 samples/sec Loss 3.9095 LearningRate 0.0059 Epoch: 15 Global Step: 252470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:53,787-Speed 9408.14 samples/sec Loss 3.9033 LearningRate 0.0059 Epoch: 15 Global Step: 252480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:54,874-Speed 9424.63 samples/sec Loss 3.9900 LearningRate 0.0059 Epoch: 15 Global Step: 252490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:55,967-Speed 9378.35 samples/sec Loss 4.0315 LearningRate 0.0059 Epoch: 15 Global Step: 252500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:57,079-Speed 9214.32 samples/sec Loss 4.0365 LearningRate 0.0059 Epoch: 
15 Global Step: 252510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:44:58,163-Speed 9449.30 samples/sec Loss 3.9354 LearningRate 0.0059 Epoch: 15 Global Step: 252520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:44:59,235-Speed 9560.08 samples/sec Loss 3.9650 LearningRate 0.0059 Epoch: 15 Global Step: 252530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:00,321-Speed 9429.70 samples/sec Loss 4.0360 LearningRate 0.0059 Epoch: 15 Global Step: 252540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:01,429-Speed 9245.94 samples/sec Loss 3.9604 LearningRate 0.0059 Epoch: 15 Global Step: 252550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:02,529-Speed 9319.62 samples/sec Loss 4.0170 LearningRate 0.0059 Epoch: 15 Global Step: 252560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:03,605-Speed 9520.45 samples/sec Loss 3.9403 LearningRate 0.0059 Epoch: 15 Global Step: 252570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:04,684-Speed 9494.11 samples/sec Loss 3.9369 LearningRate 0.0059 Epoch: 15 Global Step: 252580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:05,750-Speed 9613.74 samples/sec Loss 3.9473 LearningRate 0.0059 Epoch: 15 Global Step: 252590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:06,810-Speed 9669.30 samples/sec Loss 3.9748 LearningRate 0.0059 Epoch: 15 Global Step: 252600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:07,906-Speed 9348.60 samples/sec Loss 3.9300 LearningRate 0.0059 Epoch: 15 Global Step: 252610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:08,943-Speed 9882.85 samples/sec Loss 3.9576 LearningRate 0.0059 Epoch: 15 Global Step: 252620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:10,033-Speed 9399.18 samples/sec Loss 3.9101 LearningRate 0.0059 Epoch: 15 Global Step: 252630 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:11,092-Speed 9669.97 samples/sec Loss 4.0240 LearningRate 0.0059 Epoch: 15 Global Step: 252640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:12,178-Speed 9437.99 samples/sec Loss 4.1584 LearningRate 0.0059 Epoch: 15 Global Step: 252650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:13,273-Speed 9352.68 samples/sec Loss 4.0030 LearningRate 0.0059 Epoch: 15 Global Step: 252660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:14,337-Speed 9634.97 samples/sec Loss 4.0307 LearningRate 0.0059 Epoch: 15 Global Step: 252670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:15,469-Speed 9053.58 samples/sec Loss 3.9777 LearningRate 0.0059 Epoch: 15 Global Step: 252680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:16,568-Speed 9319.81 samples/sec Loss 4.0219 LearningRate 0.0059 Epoch: 15 Global Step: 252690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:17,649-Speed 9479.83 samples/sec Loss 3.9661 LearningRate 0.0059 Epoch: 15 Global Step: 252700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:18,718-Speed 9584.55 samples/sec Loss 4.0529 LearningRate 0.0059 Epoch: 15 Global Step: 252710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:19,783-Speed 9620.04 samples/sec Loss 4.0512 LearningRate 0.0059 Epoch: 15 Global Step: 252720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:20,887-Speed 9280.61 samples/sec Loss 3.9319 LearningRate 0.0059 Epoch: 15 Global Step: 252730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:21,961-Speed 9539.46 samples/sec Loss 4.0060 LearningRate 0.0059 Epoch: 15 Global Step: 252740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:23,085-Speed 9117.27 samples/sec Loss 3.9377 LearningRate 0.0059 Epoch: 15 Global Step: 252750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 21:45:24,192-Speed 9249.69 samples/sec Loss 3.9905 LearningRate 0.0059 Epoch: 15 Global Step: 252760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:25,308-Speed 9187.86 samples/sec Loss 4.0457 LearningRate 0.0059 Epoch: 15 Global Step: 252770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:26,413-Speed 9277.95 samples/sec Loss 4.0390 LearningRate 0.0059 Epoch: 15 Global Step: 252780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:27,489-Speed 9519.75 samples/sec Loss 4.0180 LearningRate 0.0059 Epoch: 15 Global Step: 252790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:28,609-Speed 9147.12 samples/sec Loss 4.0034 LearningRate 0.0059 Epoch: 15 Global Step: 252800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:29,696-Speed 9426.11 samples/sec Loss 3.9426 LearningRate 0.0059 Epoch: 15 Global Step: 252810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:30,790-Speed 9359.66 samples/sec Loss 3.8598 LearningRate 0.0059 Epoch: 15 Global Step: 252820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:31,906-Speed 9184.89 samples/sec Loss 4.1346 LearningRate 0.0059 Epoch: 15 Global Step: 252830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:33,018-Speed 9212.35 samples/sec Loss 4.0150 LearningRate 0.0059 Epoch: 15 Global Step: 252840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:34,093-Speed 9534.11 samples/sec Loss 3.9678 LearningRate 0.0059 Epoch: 15 Global Step: 252850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:35,168-Speed 9534.27 samples/sec Loss 3.9953 LearningRate 0.0059 Epoch: 15 Global Step: 252860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:36,222-Speed 9720.50 samples/sec Loss 3.9988 LearningRate 0.0059 Epoch: 15 Global Step: 252870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:37,282-Speed 
9661.24 samples/sec Loss 4.1124 LearningRate 0.0059 Epoch: 15 Global Step: 252880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:38,377-Speed 9361.01 samples/sec Loss 3.9967 LearningRate 0.0059 Epoch: 15 Global Step: 252890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:39,459-Speed 9465.01 samples/sec Loss 3.9989 LearningRate 0.0059 Epoch: 15 Global Step: 252900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:40,568-Speed 9245.53 samples/sec Loss 4.0217 LearningRate 0.0059 Epoch: 15 Global Step: 252910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:41,660-Speed 9374.89 samples/sec Loss 4.0654 LearningRate 0.0059 Epoch: 15 Global Step: 252920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:42,743-Speed 9462.81 samples/sec Loss 4.0323 LearningRate 0.0059 Epoch: 15 Global Step: 252930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:43,820-Speed 9518.14 samples/sec Loss 3.9808 LearningRate 0.0059 Epoch: 15 Global Step: 252940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:44,904-Speed 9455.97 samples/sec Loss 3.9089 LearningRate 0.0059 Epoch: 15 Global Step: 252950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:45,994-Speed 9399.50 samples/sec Loss 4.0635 LearningRate 0.0059 Epoch: 15 Global Step: 252960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:47,041-Speed 9782.39 samples/sec Loss 3.9888 LearningRate 0.0059 Epoch: 15 Global Step: 252970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:48,117-Speed 9519.81 samples/sec Loss 3.9788 LearningRate 0.0059 Epoch: 15 Global Step: 252980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:49,204-Speed 9424.52 samples/sec Loss 4.0412 LearningRate 0.0059 Epoch: 15 Global Step: 252990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:50,262-Speed 9688.67 samples/sec Loss 3.9576 
LearningRate 0.0059 Epoch: 15 Global Step: 253000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:51,356-Speed 9370.27 samples/sec Loss 4.0545 LearningRate 0.0059 Epoch: 15 Global Step: 253010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:52,450-Speed 9358.51 samples/sec Loss 3.9835 LearningRate 0.0059 Epoch: 15 Global Step: 253020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:53,545-Speed 9359.85 samples/sec Loss 3.9759 LearningRate 0.0059 Epoch: 15 Global Step: 253030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:54,662-Speed 9176.39 samples/sec Loss 4.0260 LearningRate 0.0059 Epoch: 15 Global Step: 253040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:55,786-Speed 9109.75 samples/sec Loss 4.0583 LearningRate 0.0059 Epoch: 15 Global Step: 253050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:45:56,888-Speed 9296.60 samples/sec Loss 3.8905 LearningRate 0.0059 Epoch: 15 Global Step: 253060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:57,990-Speed 9299.12 samples/sec Loss 4.0004 LearningRate 0.0059 Epoch: 15 Global Step: 253070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:45:59,058-Speed 9592.59 samples/sec Loss 4.0605 LearningRate 0.0058 Epoch: 15 Global Step: 253080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:00,177-Speed 9162.04 samples/sec Loss 4.0022 LearningRate 0.0058 Epoch: 15 Global Step: 253090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:01,330-Speed 8882.57 samples/sec Loss 3.9245 LearningRate 0.0058 Epoch: 15 Global Step: 253100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:02,433-Speed 9296.44 samples/sec Loss 4.0093 LearningRate 0.0058 Epoch: 15 Global Step: 253110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:03,527-Speed 9365.99 samples/sec Loss 3.9875 LearningRate 0.0058 Epoch: 15 Global 
Step: 253120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:04,610-Speed 9455.80 samples/sec Loss 3.9623 LearningRate 0.0058 Epoch: 15 Global Step: 253130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:05,679-Speed 9587.06 samples/sec Loss 4.0540 LearningRate 0.0058 Epoch: 15 Global Step: 253140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:06,759-Speed 9487.50 samples/sec Loss 3.9629 LearningRate 0.0058 Epoch: 15 Global Step: 253150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:07,823-Speed 9634.86 samples/sec Loss 3.9522 LearningRate 0.0058 Epoch: 15 Global Step: 253160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:08,888-Speed 9623.05 samples/sec Loss 3.9207 LearningRate 0.0058 Epoch: 15 Global Step: 253170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:09,948-Speed 9665.36 samples/sec Loss 4.0675 LearningRate 0.0058 Epoch: 15 Global Step: 253180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:11,022-Speed 9540.89 samples/sec Loss 4.0415 LearningRate 0.0058 Epoch: 15 Global Step: 253190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:12,087-Speed 9612.82 samples/sec Loss 3.9971 LearningRate 0.0058 Epoch: 15 Global Step: 253200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:13,175-Speed 9423.22 samples/sec Loss 3.9770 LearningRate 0.0058 Epoch: 15 Global Step: 253210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:14,236-Speed 9652.66 samples/sec Loss 4.0101 LearningRate 0.0058 Epoch: 15 Global Step: 253220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:15,321-Speed 9446.51 samples/sec Loss 4.1014 LearningRate 0.0058 Epoch: 15 Global Step: 253230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:16,402-Speed 9473.17 samples/sec Loss 3.9737 LearningRate 0.0058 Epoch: 15 Global Step: 253240 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 21:46:17,514-Speed 9213.76 samples/sec Loss 4.0012 LearningRate 0.0058 Epoch: 15 Global Step: 253250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:18,590-Speed 9530.65 samples/sec Loss 4.0672 LearningRate 0.0058 Epoch: 15 Global Step: 253260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:19,680-Speed 9402.99 samples/sec Loss 4.0504 LearningRate 0.0058 Epoch: 15 Global Step: 253270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:20,752-Speed 9553.64 samples/sec Loss 4.0348 LearningRate 0.0058 Epoch: 15 Global Step: 253280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:21,903-Speed 8909.21 samples/sec Loss 4.1066 LearningRate 0.0058 Epoch: 15 Global Step: 253290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:22,947-Speed 9806.54 samples/sec Loss 4.0426 LearningRate 0.0058 Epoch: 15 Global Step: 253300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:24,032-Speed 9445.65 samples/sec Loss 4.0111 LearningRate 0.0058 Epoch: 15 Global Step: 253310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:25,071-Speed 9866.28 samples/sec Loss 4.0503 LearningRate 0.0058 Epoch: 15 Global Step: 253320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:26,154-Speed 9454.84 samples/sec Loss 4.0375 LearningRate 0.0058 Epoch: 15 Global Step: 253330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:27,250-Speed 9351.38 samples/sec Loss 3.9773 LearningRate 0.0058 Epoch: 15 Global Step: 253340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:28,324-Speed 9541.64 samples/sec Loss 3.9855 LearningRate 0.0058 Epoch: 15 Global Step: 253350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:29,383-Speed 9671.78 samples/sec Loss 4.0406 LearningRate 0.0058 Epoch: 15 Global Step: 253360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 21:46:30,459-Speed 9521.33 samples/sec Loss 4.0325 LearningRate 0.0058 Epoch: 15 Global Step: 253370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:31,526-Speed 9609.25 samples/sec Loss 4.0651 LearningRate 0.0058 Epoch: 15 Global Step: 253380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:32,585-Speed 9674.25 samples/sec Loss 3.9687 LearningRate 0.0058 Epoch: 15 Global Step: 253390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:33,665-Speed 9484.33 samples/sec Loss 3.9778 LearningRate 0.0058 Epoch: 15 Global Step: 253400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:34,744-Speed 9492.00 samples/sec Loss 3.9847 LearningRate 0.0058 Epoch: 15 Global Step: 253410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:35,832-Speed 9421.34 samples/sec Loss 3.9146 LearningRate 0.0058 Epoch: 15 Global Step: 253420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:36,946-Speed 9196.02 samples/sec Loss 3.9923 LearningRate 0.0058 Epoch: 15 Global Step: 253430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:38,033-Speed 9429.06 samples/sec Loss 3.9806 LearningRate 0.0058 Epoch: 15 Global Step: 253440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:39,165-Speed 9054.88 samples/sec Loss 4.0181 LearningRate 0.0058 Epoch: 15 Global Step: 253450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:40,240-Speed 9527.87 samples/sec Loss 3.9311 LearningRate 0.0058 Epoch: 15 Global Step: 253460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:41,320-Speed 9482.54 samples/sec Loss 3.9764 LearningRate 0.0058 Epoch: 15 Global Step: 253470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:42,380-Speed 9670.91 samples/sec Loss 3.9573 LearningRate 0.0058 Epoch: 15 Global Step: 253480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:43,440-Speed 9670.02 
samples/sec Loss 4.0689 LearningRate 0.0058 Epoch: 15 Global Step: 253490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:44,480-Speed 9848.15 samples/sec Loss 4.0852 LearningRate 0.0058 Epoch: 15 Global Step: 253500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:45,565-Speed 9439.40 samples/sec Loss 3.9906 LearningRate 0.0058 Epoch: 15 Global Step: 253510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:46,637-Speed 9561.83 samples/sec Loss 4.0474 LearningRate 0.0058 Epoch: 15 Global Step: 253520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:47,737-Speed 9318.32 samples/sec Loss 4.0220 LearningRate 0.0058 Epoch: 15 Global Step: 253530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:48,845-Speed 9243.13 samples/sec Loss 3.9805 LearningRate 0.0058 Epoch: 15 Global Step: 253540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:49,897-Speed 9741.14 samples/sec Loss 4.0684 LearningRate 0.0058 Epoch: 15 Global Step: 253550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:50,967-Speed 9577.92 samples/sec Loss 3.9560 LearningRate 0.0058 Epoch: 15 Global Step: 253560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:52,061-Speed 9367.23 samples/sec Loss 3.9983 LearningRate 0.0058 Epoch: 15 Global Step: 253570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:53,200-Speed 8990.01 samples/sec Loss 4.0273 LearningRate 0.0058 Epoch: 15 Global Step: 253580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:54,272-Speed 9556.44 samples/sec Loss 4.1266 LearningRate 0.0058 Epoch: 15 Global Step: 253590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:55,350-Speed 9506.62 samples/sec Loss 4.0938 LearningRate 0.0058 Epoch: 15 Global Step: 253600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:56,426-Speed 9525.14 samples/sec Loss 4.1231 LearningRate 
0.0058 Epoch: 15 Global Step: 253610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:57,488-Speed 9652.09 samples/sec Loss 4.0369 LearningRate 0.0058 Epoch: 15 Global Step: 253620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:46:58,559-Speed 9564.64 samples/sec Loss 3.9875 LearningRate 0.0058 Epoch: 15 Global Step: 253630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:46:59,664-Speed 9270.97 samples/sec Loss 4.0810 LearningRate 0.0058 Epoch: 15 Global Step: 253640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:00,740-Speed 9523.40 samples/sec Loss 3.9775 LearningRate 0.0058 Epoch: 15 Global Step: 253650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:01,807-Speed 9611.54 samples/sec Loss 4.0556 LearningRate 0.0058 Epoch: 15 Global Step: 253660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:02,908-Speed 9304.45 samples/sec Loss 4.1119 LearningRate 0.0058 Epoch: 15 Global Step: 253670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:04,008-Speed 9315.23 samples/sec Loss 4.0964 LearningRate 0.0058 Epoch: 15 Global Step: 253680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:05,099-Speed 9388.47 samples/sec Loss 4.0098 LearningRate 0.0058 Epoch: 15 Global Step: 253690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:06,165-Speed 9612.33 samples/sec Loss 4.0156 LearningRate 0.0058 Epoch: 15 Global Step: 253700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:07,267-Speed 9291.30 samples/sec Loss 4.0229 LearningRate 0.0058 Epoch: 15 Global Step: 253710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:08,384-Speed 9179.43 samples/sec Loss 4.0446 LearningRate 0.0058 Epoch: 15 Global Step: 253720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:09,439-Speed 9708.26 samples/sec Loss 4.0570 LearningRate 0.0058 Epoch: 15 Global Step: 253730 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:47:10,542-Speed 9289.03 samples/sec Loss 4.0999 LearningRate 0.0058 Epoch: 15 Global Step: 253740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:47:11,605-Speed 9644.99 samples/sec Loss 4.0957 LearningRate 0.0058 Epoch: 15 Global Step: 253750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:12,682-Speed 9507.09 samples/sec Loss 3.9709 LearningRate 0.0058 Epoch: 15 Global Step: 253760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:13,730-Speed 9780.13 samples/sec Loss 3.9800 LearningRate 0.0058 Epoch: 15 Global Step: 253770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:14,803-Speed 9550.55 samples/sec Loss 3.9447 LearningRate 0.0057 Epoch: 15 Global Step: 253780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:15,906-Speed 9284.73 samples/sec Loss 4.0018 LearningRate 0.0057 Epoch: 15 Global Step: 253790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:17,021-Speed 9189.16 samples/sec Loss 4.0394 LearningRate 0.0057 Epoch: 15 Global Step: 253800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:18,142-Speed 9145.25 samples/sec Loss 4.0005 LearningRate 0.0057 Epoch: 15 Global Step: 253810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:19,229-Speed 9428.78 samples/sec Loss 4.0013 LearningRate 0.0057 Epoch: 15 Global Step: 253820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:20,334-Speed 9270.16 samples/sec Loss 4.0077 LearningRate 0.0057 Epoch: 15 Global Step: 253830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:21,439-Speed 9275.82 samples/sec Loss 4.0956 LearningRate 0.0057 Epoch: 15 Global Step: 253840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:22,508-Speed 9580.59 samples/sec Loss 4.0018 LearningRate 0.0057 Epoch: 15 Global Step: 253850 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 21:47:23,557-Speed 9769.27 samples/sec Loss 4.1339 LearningRate 0.0057 Epoch: 15 Global Step: 253860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:24,622-Speed 9619.44 samples/sec Loss 3.9881 LearningRate 0.0057 Epoch: 15 Global Step: 253870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:25,683-Speed 9658.73 samples/sec Loss 4.1044 LearningRate 0.0057 Epoch: 15 Global Step: 253880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:26,710-Speed 9969.55 samples/sec Loss 4.0065 LearningRate 0.0057 Epoch: 15 Global Step: 253890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:27,786-Speed 9530.36 samples/sec Loss 4.0251 LearningRate 0.0057 Epoch: 15 Global Step: 253900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:28,868-Speed 9465.63 samples/sec Loss 4.0574 LearningRate 0.0057 Epoch: 15 Global Step: 253910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:29,988-Speed 9146.71 samples/sec Loss 4.0095 LearningRate 0.0057 Epoch: 15 Global Step: 253920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:31,103-Speed 9188.57 samples/sec Loss 3.9835 LearningRate 0.0057 Epoch: 15 Global Step: 253930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:32,207-Speed 9280.36 samples/sec Loss 4.1119 LearningRate 0.0057 Epoch: 15 Global Step: 253940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:33,303-Speed 9355.96 samples/sec Loss 4.0401 LearningRate 0.0057 Epoch: 15 Global Step: 253950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:47:34,348-Speed 9801.03 samples/sec Loss 4.0861 LearningRate 0.0057 Epoch: 15 Global Step: 253960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:47:35,487-Speed 8994.76 samples/sec Loss 4.0180 LearningRate 0.0057 Epoch: 15 Global Step: 253970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:47:36,626-Speed 
8996.96 samples/sec Loss 4.0180 LearningRate 0.0057 Epoch: 15 Global Step: 253980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:47:37,738-Speed 9215.50 samples/sec Loss 4.0331 LearningRate 0.0057 Epoch: 15 Global Step: 253990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:47:38,855-Speed 9174.17 samples/sec Loss 4.0187 LearningRate 0.0057 Epoch: 15 Global Step: 254000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:01,030-[lfw][254000]XNorm: 7.178992 Training: 2022-04-11 21:48:01,031-[lfw][254000]Accuracy-Flip: 0.99667+-0.00269 Training: 2022-04-11 21:48:01,031-[lfw][254000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:48:26,713-[cfp_fp][254000]XNorm: 6.200063 Training: 2022-04-11 21:48:26,713-[cfp_fp][254000]Accuracy-Flip: 0.96957+-0.00777 Training: 2022-04-11 21:48:26,714-[cfp_fp][254000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:48:48,834-[agedb_30][254000]XNorm: 6.969766 Training: 2022-04-11 21:48:48,835-[agedb_30][254000]Accuracy-Flip: 0.97133+-0.00859 Training: 2022-04-11 21:48:48,835-[agedb_30][254000]Accuracy-Highest: 0.97350 Training: 2022-04-11 21:48:49,914-Speed 144.11 samples/sec Loss 4.0010 LearningRate 0.0057 Epoch: 15 Global Step: 254010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:50,983-Speed 9590.72 samples/sec Loss 3.9976 LearningRate 0.0057 Epoch: 15 Global Step: 254020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:52,033-Speed 9755.71 samples/sec Loss 4.0115 LearningRate 0.0057 Epoch: 15 Global Step: 254030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:53,118-Speed 9440.07 samples/sec Loss 4.0696 LearningRate 0.0057 Epoch: 15 Global Step: 254040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:54,219-Speed 9313.99 samples/sec Loss 4.0402 LearningRate 0.0057 Epoch: 15 Global Step: 254050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:55,299-Speed 9486.08 samples/sec 
Loss 4.0563 LearningRate 0.0057 Epoch: 15 Global Step: 254060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:56,402-Speed 9295.25 samples/sec Loss 4.0530 LearningRate 0.0057 Epoch: 15 Global Step: 254070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:57,466-Speed 9625.04 samples/sec Loss 4.0024 LearningRate 0.0057 Epoch: 15 Global Step: 254080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:58,536-Speed 9575.29 samples/sec Loss 4.1104 LearningRate 0.0057 Epoch: 15 Global Step: 254090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:48:59,602-Speed 9608.98 samples/sec Loss 4.0649 LearningRate 0.0057 Epoch: 15 Global Step: 254100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:00,662-Speed 9665.55 samples/sec Loss 4.0320 LearningRate 0.0057 Epoch: 15 Global Step: 254110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:01,782-Speed 9154.08 samples/sec Loss 4.0532 LearningRate 0.0057 Epoch: 15 Global Step: 254120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:02,841-Speed 9668.69 samples/sec Loss 4.0254 LearningRate 0.0057 Epoch: 15 Global Step: 254130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:03,933-Speed 9384.23 samples/sec Loss 4.0722 LearningRate 0.0057 Epoch: 15 Global Step: 254140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:05,009-Speed 9526.76 samples/sec Loss 4.0077 LearningRate 0.0057 Epoch: 15 Global Step: 254150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:06,046-Speed 9880.92 samples/sec Loss 4.0843 LearningRate 0.0057 Epoch: 15 Global Step: 254160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:07,104-Speed 9678.47 samples/sec Loss 4.0782 LearningRate 0.0057 Epoch: 15 Global Step: 254170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:08,181-Speed 9523.31 samples/sec Loss 3.9757 LearningRate 0.0057 
Epoch: 15 Global Step: 254180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:09,312-Speed 9053.70 samples/sec Loss 4.0362 LearningRate 0.0057 Epoch: 15 Global Step: 254190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:10,437-Speed 9104.72 samples/sec Loss 4.1374 LearningRate 0.0057 Epoch: 15 Global Step: 254200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:11,543-Speed 9265.35 samples/sec Loss 4.0418 LearningRate 0.0057 Epoch: 15 Global Step: 254210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:12,638-Speed 9359.59 samples/sec Loss 4.1229 LearningRate 0.0057 Epoch: 15 Global Step: 254220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:13,719-Speed 9475.43 samples/sec Loss 3.9737 LearningRate 0.0057 Epoch: 15 Global Step: 254230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:14,797-Speed 9502.68 samples/sec Loss 4.0803 LearningRate 0.0057 Epoch: 15 Global Step: 254240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:15,838-Speed 9842.87 samples/sec Loss 4.0889 LearningRate 0.0057 Epoch: 15 Global Step: 254250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:16,890-Speed 9743.48 samples/sec Loss 4.0930 LearningRate 0.0057 Epoch: 15 Global Step: 254260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:18,019-Speed 9078.38 samples/sec Loss 4.0546 LearningRate 0.0057 Epoch: 15 Global Step: 254270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:19,117-Speed 9329.86 samples/sec Loss 4.0956 LearningRate 0.0057 Epoch: 15 Global Step: 254280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:20,195-Speed 9509.99 samples/sec Loss 4.1120 LearningRate 0.0057 Epoch: 15 Global Step: 254290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:21,294-Speed 9326.84 samples/sec Loss 4.0699 LearningRate 0.0057 Epoch: 15 Global Step: 254300 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:22,407-Speed 9200.87 samples/sec Loss 4.0464 LearningRate 0.0057 Epoch: 15 Global Step: 254310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:23,534-Speed 9089.78 samples/sec Loss 4.1153 LearningRate 0.0057 Epoch: 15 Global Step: 254320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:24,635-Speed 9307.54 samples/sec Loss 4.1022 LearningRate 0.0057 Epoch: 15 Global Step: 254330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:25,697-Speed 9644.29 samples/sec Loss 4.0263 LearningRate 0.0057 Epoch: 15 Global Step: 254340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:26,775-Speed 9509.67 samples/sec Loss 4.0117 LearningRate 0.0057 Epoch: 15 Global Step: 254350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:27,878-Speed 9286.15 samples/sec Loss 3.9683 LearningRate 0.0057 Epoch: 15 Global Step: 254360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:29,045-Speed 8780.90 samples/sec Loss 4.1636 LearningRate 0.0057 Epoch: 15 Global Step: 254370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:30,092-Speed 9785.04 samples/sec Loss 4.0437 LearningRate 0.0057 Epoch: 15 Global Step: 254380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:31,134-Speed 9833.56 samples/sec Loss 4.0535 LearningRate 0.0057 Epoch: 15 Global Step: 254390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:32,233-Speed 9325.10 samples/sec Loss 4.1194 LearningRate 0.0057 Epoch: 15 Global Step: 254400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:33,364-Speed 9060.49 samples/sec Loss 4.0262 LearningRate 0.0057 Epoch: 15 Global Step: 254410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:34,485-Speed 9134.98 samples/sec Loss 4.0570 LearningRate 0.0057 Epoch: 15 Global Step: 254420 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 21:49:35,608-Speed 9123.65 samples/sec Loss 4.0883 LearningRate 0.0057 Epoch: 15 Global Step: 254430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:36,744-Speed 9018.65 samples/sec Loss 4.0353 LearningRate 0.0057 Epoch: 15 Global Step: 254440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:37,811-Speed 9608.29 samples/sec Loss 4.1185 LearningRate 0.0057 Epoch: 15 Global Step: 254450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:38,946-Speed 9036.19 samples/sec Loss 4.1100 LearningRate 0.0057 Epoch: 15 Global Step: 254460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:40,055-Speed 9242.79 samples/sec Loss 4.1221 LearningRate 0.0057 Epoch: 15 Global Step: 254470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:41,153-Speed 9326.89 samples/sec Loss 4.0424 LearningRate 0.0056 Epoch: 15 Global Step: 254480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:42,252-Speed 9323.98 samples/sec Loss 4.0242 LearningRate 0.0056 Epoch: 15 Global Step: 254490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:43,338-Speed 9434.67 samples/sec Loss 3.9973 LearningRate 0.0056 Epoch: 15 Global Step: 254500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:44,388-Speed 9755.71 samples/sec Loss 4.1003 LearningRate 0.0056 Epoch: 15 Global Step: 254510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:45,502-Speed 9200.90 samples/sec Loss 4.1129 LearningRate 0.0056 Epoch: 15 Global Step: 254520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:46,591-Speed 9403.55 samples/sec Loss 4.2118 LearningRate 0.0056 Epoch: 15 Global Step: 254530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:47,665-Speed 9544.66 samples/sec Loss 4.0488 LearningRate 0.0056 Epoch: 15 Global Step: 254540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:48,747-Speed 
9463.28 samples/sec Loss 4.0643 LearningRate 0.0056 Epoch: 15 Global Step: 254550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:49,830-Speed 9464.05 samples/sec Loss 4.0215 LearningRate 0.0056 Epoch: 15 Global Step: 254560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:50,982-Speed 8895.45 samples/sec Loss 4.1097 LearningRate 0.0056 Epoch: 15 Global Step: 254570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:52,132-Speed 8913.16 samples/sec Loss 4.0260 LearningRate 0.0056 Epoch: 15 Global Step: 254580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:49:53,210-Speed 9499.50 samples/sec Loss 4.0430 LearningRate 0.0056 Epoch: 15 Global Step: 254590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:54,326-Speed 9184.33 samples/sec Loss 4.0018 LearningRate 0.0056 Epoch: 15 Global Step: 254600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:55,432-Speed 9256.92 samples/sec Loss 3.9662 LearningRate 0.0056 Epoch: 15 Global Step: 254610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:56,553-Speed 9146.53 samples/sec Loss 4.0753 LearningRate 0.0056 Epoch: 15 Global Step: 254620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:57,592-Speed 9861.83 samples/sec Loss 3.9917 LearningRate 0.0056 Epoch: 15 Global Step: 254630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:58,659-Speed 9599.65 samples/sec Loss 4.0549 LearningRate 0.0056 Epoch: 15 Global Step: 254640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:49:59,756-Speed 9340.85 samples/sec Loss 4.0616 LearningRate 0.0056 Epoch: 15 Global Step: 254650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:00,851-Speed 9354.81 samples/sec Loss 4.0367 LearningRate 0.0056 Epoch: 15 Global Step: 254660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:01,949-Speed 9338.60 samples/sec Loss 4.1253 
LearningRate 0.0056 Epoch: 15 Global Step: 254670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:03,031-Speed 9466.73 samples/sec Loss 4.0886 LearningRate 0.0056 Epoch: 15 Global Step: 254680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:04,118-Speed 9421.62 samples/sec Loss 4.1597 LearningRate 0.0056 Epoch: 15 Global Step: 254690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:05,190-Speed 9560.24 samples/sec Loss 4.0826 LearningRate 0.0056 Epoch: 15 Global Step: 254700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:06,263-Speed 9546.97 samples/sec Loss 4.1409 LearningRate 0.0056 Epoch: 15 Global Step: 254710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:07,370-Speed 9258.09 samples/sec Loss 4.0425 LearningRate 0.0056 Epoch: 15 Global Step: 254720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:08,538-Speed 8773.21 samples/sec Loss 4.0572 LearningRate 0.0056 Epoch: 15 Global Step: 254730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:09,637-Speed 9319.65 samples/sec Loss 4.0956 LearningRate 0.0056 Epoch: 15 Global Step: 254740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:10,706-Speed 9583.35 samples/sec Loss 4.0161 LearningRate 0.0056 Epoch: 15 Global Step: 254750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:11,806-Speed 9314.07 samples/sec Loss 3.9533 LearningRate 0.0056 Epoch: 15 Global Step: 254760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:12,886-Speed 9486.83 samples/sec Loss 4.1109 LearningRate 0.0056 Epoch: 15 Global Step: 254770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:13,997-Speed 9228.58 samples/sec Loss 4.0382 LearningRate 0.0056 Epoch: 15 Global Step: 254780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:15,091-Speed 9365.25 samples/sec Loss 4.0343 LearningRate 0.0056 Epoch: 15 Global Step: 
254790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:16,208-Speed 9173.60 samples/sec Loss 4.1738 LearningRate 0.0056 Epoch: 15 Global Step: 254800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:17,303-Speed 9362.70 samples/sec Loss 4.0692 LearningRate 0.0056 Epoch: 15 Global Step: 254810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:18,366-Speed 9637.90 samples/sec Loss 4.1364 LearningRate 0.0056 Epoch: 15 Global Step: 254820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:19,463-Speed 9332.82 samples/sec Loss 4.0113 LearningRate 0.0056 Epoch: 15 Global Step: 254830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:20,532-Speed 9591.68 samples/sec Loss 3.9734 LearningRate 0.0056 Epoch: 15 Global Step: 254840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:21,592-Speed 9663.22 samples/sec Loss 4.0540 LearningRate 0.0056 Epoch: 15 Global Step: 254850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:22,700-Speed 9242.91 samples/sec Loss 4.0610 LearningRate 0.0056 Epoch: 15 Global Step: 254860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:23,786-Speed 9440.39 samples/sec Loss 4.0353 LearningRate 0.0056 Epoch: 15 Global Step: 254870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:24,824-Speed 9870.77 samples/sec Loss 4.0191 LearningRate 0.0056 Epoch: 15 Global Step: 254880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:25,940-Speed 9179.10 samples/sec Loss 4.1371 LearningRate 0.0056 Epoch: 15 Global Step: 254890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:27,041-Speed 9306.87 samples/sec Loss 4.0879 LearningRate 0.0056 Epoch: 15 Global Step: 254900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:28,135-Speed 9365.03 samples/sec Loss 4.0105 LearningRate 0.0056 Epoch: 15 Global Step: 254910 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 21:50:29,189-Speed 9724.66 samples/sec Loss 4.0944 LearningRate 0.0056 Epoch: 15 Global Step: 254920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:30,268-Speed 9494.79 samples/sec Loss 4.1194 LearningRate 0.0056 Epoch: 15 Global Step: 254930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:31,337-Speed 9583.50 samples/sec Loss 4.0829 LearningRate 0.0056 Epoch: 15 Global Step: 254940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:32,448-Speed 9224.13 samples/sec Loss 4.1139 LearningRate 0.0056 Epoch: 15 Global Step: 254950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:33,558-Speed 9229.36 samples/sec Loss 4.1346 LearningRate 0.0056 Epoch: 15 Global Step: 254960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:34,634-Speed 9524.37 samples/sec Loss 4.0722 LearningRate 0.0056 Epoch: 15 Global Step: 254970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:35,709-Speed 9533.76 samples/sec Loss 4.0559 LearningRate 0.0056 Epoch: 15 Global Step: 254980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:36,759-Speed 9763.05 samples/sec Loss 4.0497 LearningRate 0.0056 Epoch: 15 Global Step: 254990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:37,838-Speed 9490.56 samples/sec Loss 4.0905 LearningRate 0.0056 Epoch: 15 Global Step: 255000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:38,905-Speed 9606.46 samples/sec Loss 4.0457 LearningRate 0.0056 Epoch: 15 Global Step: 255010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:39,978-Speed 9546.24 samples/sec Loss 4.0923 LearningRate 0.0056 Epoch: 15 Global Step: 255020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:41,047-Speed 9585.37 samples/sec Loss 4.1259 LearningRate 0.0056 Epoch: 15 Global Step: 255030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
21:50:42,132-Speed 9443.16 samples/sec Loss 4.0537 LearningRate 0.0056 Epoch: 15 Global Step: 255040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:43,244-Speed 9212.79 samples/sec Loss 3.9883 LearningRate 0.0056 Epoch: 15 Global Step: 255050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:44,338-Speed 9370.99 samples/sec Loss 4.0848 LearningRate 0.0056 Epoch: 15 Global Step: 255060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:45,417-Speed 9488.52 samples/sec Loss 4.0680 LearningRate 0.0056 Epoch: 15 Global Step: 255070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:46,553-Speed 9022.09 samples/sec Loss 4.0705 LearningRate 0.0056 Epoch: 15 Global Step: 255080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:47,670-Speed 9173.02 samples/sec Loss 4.1061 LearningRate 0.0056 Epoch: 15 Global Step: 255090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:48,707-Speed 9886.94 samples/sec Loss 3.9456 LearningRate 0.0056 Epoch: 15 Global Step: 255100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:49,787-Speed 9485.02 samples/sec Loss 4.0476 LearningRate 0.0056 Epoch: 15 Global Step: 255110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:50,881-Speed 9364.58 samples/sec Loss 3.9901 LearningRate 0.0056 Epoch: 15 Global Step: 255120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:51,991-Speed 9225.40 samples/sec Loss 4.0074 LearningRate 0.0056 Epoch: 15 Global Step: 255130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:53,083-Speed 9388.48 samples/sec Loss 4.0516 LearningRate 0.0056 Epoch: 15 Global Step: 255140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:54,202-Speed 9154.49 samples/sec Loss 4.0743 LearningRate 0.0056 Epoch: 15 Global Step: 255150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:50:55,280-Speed 9506.54 
samples/sec Loss 4.1404 LearningRate 0.0056 Epoch: 15 Global Step: 255160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:56,378-Speed 9329.46 samples/sec Loss 4.0446 LearningRate 0.0056 Epoch: 15 Global Step: 255170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:57,475-Speed 9342.98 samples/sec Loss 4.0133 LearningRate 0.0055 Epoch: 15 Global Step: 255180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:58,576-Speed 9305.34 samples/sec Loss 4.0313 LearningRate 0.0055 Epoch: 15 Global Step: 255190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:50:59,668-Speed 9376.39 samples/sec Loss 4.1053 LearningRate 0.0055 Epoch: 15 Global Step: 255200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:00,802-Speed 9038.53 samples/sec Loss 4.0819 LearningRate 0.0055 Epoch: 15 Global Step: 255210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:01,916-Speed 9192.05 samples/sec Loss 4.0624 LearningRate 0.0055 Epoch: 15 Global Step: 255220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:03,002-Speed 9437.53 samples/sec Loss 4.0822 LearningRate 0.0055 Epoch: 15 Global Step: 255230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:04,077-Speed 9529.04 samples/sec Loss 4.1109 LearningRate 0.0055 Epoch: 15 Global Step: 255240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:05,119-Speed 9831.76 samples/sec Loss 4.0839 LearningRate 0.0055 Epoch: 15 Global Step: 255250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:06,218-Speed 9328.84 samples/sec Loss 4.0836 LearningRate 0.0055 Epoch: 15 Global Step: 255260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:07,304-Speed 9443.13 samples/sec Loss 4.0731 LearningRate 0.0055 Epoch: 15 Global Step: 255270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:08,403-Speed 9317.37 samples/sec Loss 4.0385 LearningRate 
0.0055 Epoch: 15 Global Step: 255280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:09,493-Speed 9402.08 samples/sec Loss 4.0194 LearningRate 0.0055 Epoch: 15 Global Step: 255290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:10,598-Speed 9274.13 samples/sec Loss 4.0369 LearningRate 0.0055 Epoch: 15 Global Step: 255300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:11,681-Speed 9463.84 samples/sec Loss 4.0601 LearningRate 0.0055 Epoch: 15 Global Step: 255310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:12,797-Speed 9178.58 samples/sec Loss 3.9944 LearningRate 0.0055 Epoch: 15 Global Step: 255320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:13,915-Speed 9162.02 samples/sec Loss 4.0864 LearningRate 0.0055 Epoch: 15 Global Step: 255330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:14,976-Speed 9658.86 samples/sec Loss 4.1201 LearningRate 0.0055 Epoch: 15 Global Step: 255340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:16,050-Speed 9543.30 samples/sec Loss 4.1278 LearningRate 0.0055 Epoch: 15 Global Step: 255350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:17,139-Speed 9409.42 samples/sec Loss 4.0912 LearningRate 0.0055 Epoch: 15 Global Step: 255360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:18,218-Speed 9494.36 samples/sec Loss 4.0895 LearningRate 0.0055 Epoch: 15 Global Step: 255370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:19,353-Speed 9027.29 samples/sec Loss 4.0584 LearningRate 0.0055 Epoch: 15 Global Step: 255380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:20,452-Speed 9328.31 samples/sec Loss 4.1158 LearningRate 0.0055 Epoch: 15 Global Step: 255390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:21,539-Speed 9425.38 samples/sec Loss 4.0959 LearningRate 0.0055 Epoch: 15 Global Step: 255400 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:22,624-Speed 9439.52 samples/sec Loss 4.0795 LearningRate 0.0055 Epoch: 15 Global Step: 255410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:23,711-Speed 9423.50 samples/sec Loss 4.1310 LearningRate 0.0055 Epoch: 15 Global Step: 255420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:24,773-Speed 9654.46 samples/sec Loss 4.1033 LearningRate 0.0055 Epoch: 15 Global Step: 255430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:25,826-Speed 9731.73 samples/sec Loss 4.0356 LearningRate 0.0055 Epoch: 15 Global Step: 255440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:26,905-Speed 9499.06 samples/sec Loss 4.1455 LearningRate 0.0055 Epoch: 15 Global Step: 255450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:27,996-Speed 9387.73 samples/sec Loss 4.1277 LearningRate 0.0055 Epoch: 15 Global Step: 255460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:29,087-Speed 9387.47 samples/sec Loss 4.0895 LearningRate 0.0055 Epoch: 15 Global Step: 255470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:30,162-Speed 9529.23 samples/sec Loss 4.1000 LearningRate 0.0055 Epoch: 15 Global Step: 255480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:31,270-Speed 9258.37 samples/sec Loss 4.0484 LearningRate 0.0055 Epoch: 15 Global Step: 255490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:32,329-Speed 9668.18 samples/sec Loss 4.0790 LearningRate 0.0055 Epoch: 15 Global Step: 255500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:33,381-Speed 9744.09 samples/sec Loss 4.1156 LearningRate 0.0055 Epoch: 15 Global Step: 255510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:34,467-Speed 9433.66 samples/sec Loss 4.1185 LearningRate 0.0055 Epoch: 15 Global Step: 255520 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-04-11 21:51:35,571-Speed 9275.72 samples/sec Loss 4.0124 LearningRate 0.0055 Epoch: 15 Global Step: 255530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:36,718-Speed 8935.79 samples/sec Loss 4.0106 LearningRate 0.0055 Epoch: 15 Global Step: 255540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:37,795-Speed 9509.82 samples/sec Loss 4.0340 LearningRate 0.0055 Epoch: 15 Global Step: 255550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:38,863-Speed 9594.09 samples/sec Loss 4.0405 LearningRate 0.0055 Epoch: 15 Global Step: 255560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:39,929-Speed 9614.37 samples/sec Loss 4.0958 LearningRate 0.0055 Epoch: 15 Global Step: 255570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:41,027-Speed 9325.44 samples/sec Loss 4.1180 LearningRate 0.0055 Epoch: 15 Global Step: 255580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:42,111-Speed 9455.84 samples/sec Loss 4.1482 LearningRate 0.0055 Epoch: 15 Global Step: 255590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:43,273-Speed 8818.97 samples/sec Loss 4.1529 LearningRate 0.0055 Epoch: 15 Global Step: 255600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:44,342-Speed 9591.74 samples/sec Loss 4.0309 LearningRate 0.0055 Epoch: 15 Global Step: 255610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:45,387-Speed 9804.95 samples/sec Loss 4.0490 LearningRate 0.0055 Epoch: 15 Global Step: 255620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:46,436-Speed 9759.79 samples/sec Loss 4.1106 LearningRate 0.0055 Epoch: 15 Global Step: 255630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:47,481-Speed 9810.39 samples/sec Loss 4.0768 LearningRate 0.0055 Epoch: 15 Global Step: 255640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
21:51:48,591-Speed 9231.40 samples/sec Loss 4.1646 LearningRate 0.0055 Epoch: 15 Global Step: 255650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:49,680-Speed 9408.45 samples/sec Loss 4.0729 LearningRate 0.0055 Epoch: 15 Global Step: 255660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:50,802-Speed 9128.74 samples/sec Loss 4.1489 LearningRate 0.0055 Epoch: 15 Global Step: 255670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:51,938-Speed 9019.80 samples/sec Loss 4.0726 LearningRate 0.0055 Epoch: 15 Global Step: 255680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:53,073-Speed 9031.17 samples/sec Loss 4.0787 LearningRate 0.0055 Epoch: 15 Global Step: 255690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:54,172-Speed 9319.94 samples/sec Loss 4.0956 LearningRate 0.0055 Epoch: 15 Global Step: 255700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:55,258-Speed 9431.39 samples/sec Loss 4.1315 LearningRate 0.0055 Epoch: 15 Global Step: 255710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:51:56,306-Speed 9774.27 samples/sec Loss 4.0709 LearningRate 0.0055 Epoch: 15 Global Step: 255720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:57,413-Speed 9259.49 samples/sec Loss 4.0938 LearningRate 0.0055 Epoch: 15 Global Step: 255730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:58,500-Speed 9427.14 samples/sec Loss 4.1244 LearningRate 0.0055 Epoch: 15 Global Step: 255740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:51:59,581-Speed 9476.78 samples/sec Loss 4.0477 LearningRate 0.0055 Epoch: 15 Global Step: 255750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:00,693-Speed 9215.86 samples/sec Loss 4.1257 LearningRate 0.0055 Epoch: 15 Global Step: 255760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:01,790-Speed 9337.69 samples/sec 
Loss 4.1879 LearningRate 0.0055 Epoch: 15 Global Step: 255770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:02,916-Speed 9106.73 samples/sec Loss 4.0775 LearningRate 0.0055 Epoch: 15 Global Step: 255780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:03,974-Speed 9681.72 samples/sec Loss 4.0973 LearningRate 0.0055 Epoch: 15 Global Step: 255790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:05,053-Speed 9492.14 samples/sec Loss 4.0600 LearningRate 0.0055 Epoch: 15 Global Step: 255800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:06,146-Speed 9379.04 samples/sec Loss 4.1090 LearningRate 0.0055 Epoch: 15 Global Step: 255810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:07,212-Speed 9612.56 samples/sec Loss 4.0952 LearningRate 0.0055 Epoch: 15 Global Step: 255820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:08,330-Speed 9163.83 samples/sec Loss 4.0753 LearningRate 0.0055 Epoch: 15 Global Step: 255830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:09,460-Speed 9062.65 samples/sec Loss 4.0720 LearningRate 0.0055 Epoch: 15 Global Step: 255840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:10,531-Speed 9566.35 samples/sec Loss 4.1092 LearningRate 0.0055 Epoch: 15 Global Step: 255850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:11,565-Speed 9910.86 samples/sec Loss 4.1272 LearningRate 0.0055 Epoch: 15 Global Step: 255860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:12,644-Speed 9508.50 samples/sec Loss 4.1561 LearningRate 0.0055 Epoch: 15 Global Step: 255870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:13,693-Speed 9765.11 samples/sec Loss 4.0716 LearningRate 0.0055 Epoch: 15 Global Step: 255880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:14,759-Speed 9616.33 samples/sec Loss 4.1506 LearningRate 0.0054 Epoch: 
15 Global Step: 255890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:15,817-Speed 9680.16 samples/sec Loss 4.1093 LearningRate 0.0054 Epoch: 15 Global Step: 255900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:16,862-Speed 9802.15 samples/sec Loss 4.0323 LearningRate 0.0054 Epoch: 15 Global Step: 255910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:17,924-Speed 9647.13 samples/sec Loss 4.0837 LearningRate 0.0054 Epoch: 15 Global Step: 255920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:19,069-Speed 8955.85 samples/sec Loss 4.0639 LearningRate 0.0054 Epoch: 15 Global Step: 255930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:20,148-Speed 9498.20 samples/sec Loss 4.0606 LearningRate 0.0054 Epoch: 15 Global Step: 255940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:52:21,256-Speed 9242.74 samples/sec Loss 4.0136 LearningRate 0.0054 Epoch: 15 Global Step: 255950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:22,355-Speed 9323.95 samples/sec Loss 4.0532 LearningRate 0.0054 Epoch: 15 Global Step: 255960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:23,407-Speed 9738.17 samples/sec Loss 4.1183 LearningRate 0.0054 Epoch: 15 Global Step: 255970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:24,500-Speed 9377.28 samples/sec Loss 4.1127 LearningRate 0.0054 Epoch: 15 Global Step: 255980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:25,587-Speed 9430.18 samples/sec Loss 4.0632 LearningRate 0.0054 Epoch: 15 Global Step: 255990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:26,702-Speed 9187.37 samples/sec Loss 4.0786 LearningRate 0.0054 Epoch: 15 Global Step: 256000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:52:48,792-[lfw][256000]XNorm: 7.242381 Training: 2022-04-11 21:52:48,793-[lfw][256000]Accuracy-Flip: 
0.99617+-0.00269 Training: 2022-04-11 21:52:48,793-[lfw][256000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:53:14,304-[cfp_fp][256000]XNorm: 6.275827 Training: 2022-04-11 21:53:14,305-[cfp_fp][256000]Accuracy-Flip: 0.97086+-0.00547 Training: 2022-04-11 21:53:14,305-[cfp_fp][256000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:53:36,314-[agedb_30][256000]XNorm: 7.043263 Training: 2022-04-11 21:53:36,315-[agedb_30][256000]Accuracy-Flip: 0.96883+-0.00928 Training: 2022-04-11 21:53:36,315-[agedb_30][256000]Accuracy-Highest: 0.97350 Training: 2022-04-11 21:53:37,414-Speed 144.81 samples/sec Loss 4.2174 LearningRate 0.0054 Epoch: 15 Global Step: 256010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:53:38,519-Speed 9268.58 samples/sec Loss 4.2231 LearningRate 0.0054 Epoch: 15 Global Step: 256020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:39,625-Speed 9263.86 samples/sec Loss 4.0869 LearningRate 0.0054 Epoch: 15 Global Step: 256030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:40,702-Speed 9512.48 samples/sec Loss 4.1004 LearningRate 0.0054 Epoch: 15 Global Step: 256040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:41,792-Speed 9408.27 samples/sec Loss 4.2157 LearningRate 0.0054 Epoch: 15 Global Step: 256050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:42,919-Speed 9094.07 samples/sec Loss 4.0989 LearningRate 0.0054 Epoch: 15 Global Step: 256060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:44,008-Speed 9401.94 samples/sec Loss 4.1158 LearningRate 0.0054 Epoch: 15 Global Step: 256070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:45,087-Speed 9495.06 samples/sec Loss 4.0374 LearningRate 0.0054 Epoch: 15 Global Step: 256080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:46,162-Speed 9531.29 samples/sec Loss 4.0555 LearningRate 0.0054 Epoch: 15 Global Step: 256090 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 21:53:47,291-Speed 9076.07 samples/sec Loss 4.0849 LearningRate 0.0054 Epoch: 15 Global Step: 256100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:48,362-Speed 9569.89 samples/sec Loss 3.9945 LearningRate 0.0054 Epoch: 15 Global Step: 256110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:49,442-Speed 9492.75 samples/sec Loss 4.1681 LearningRate 0.0054 Epoch: 15 Global Step: 256120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:53:50,552-Speed 9224.88 samples/sec Loss 4.1580 LearningRate 0.0054 Epoch: 15 Global Step: 256130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:53:51,632-Speed 9494.95 samples/sec Loss 4.0572 LearningRate 0.0054 Epoch: 15 Global Step: 256140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:53:52,708-Speed 9515.82 samples/sec Loss 4.0756 LearningRate 0.0054 Epoch: 15 Global Step: 256150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:53:53,772-Speed 9632.14 samples/sec Loss 4.0487 LearningRate 0.0054 Epoch: 15 Global Step: 256160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:54,832-Speed 9665.13 samples/sec Loss 4.1041 LearningRate 0.0054 Epoch: 15 Global Step: 256170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:55,975-Speed 8962.89 samples/sec Loss 4.1260 LearningRate 0.0054 Epoch: 15 Global Step: 256180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:57,084-Speed 9235.22 samples/sec Loss 4.0549 LearningRate 0.0054 Epoch: 15 Global Step: 256190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:58,202-Speed 9171.39 samples/sec Loss 4.1090 LearningRate 0.0054 Epoch: 15 Global Step: 256200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:53:59,276-Speed 9539.66 samples/sec Loss 4.0891 LearningRate 0.0054 Epoch: 15 Global Step: 256210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
21:54:00,403-Speed 9084.51 samples/sec Loss 4.1196 LearningRate 0.0054 Epoch: 15 Global Step: 256220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:01,499-Speed 9354.14 samples/sec Loss 4.0947 LearningRate 0.0054 Epoch: 15 Global Step: 256230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:02,547-Speed 9775.25 samples/sec Loss 4.1913 LearningRate 0.0054 Epoch: 15 Global Step: 256240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:03,633-Speed 9437.21 samples/sec Loss 4.1994 LearningRate 0.0054 Epoch: 15 Global Step: 256250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:04,706-Speed 9547.75 samples/sec Loss 4.1697 LearningRate 0.0054 Epoch: 15 Global Step: 256260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:05,800-Speed 9364.81 samples/sec Loss 4.1300 LearningRate 0.0054 Epoch: 15 Global Step: 256270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:06,921-Speed 9136.86 samples/sec Loss 4.1725 LearningRate 0.0054 Epoch: 15 Global Step: 256280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:08,036-Speed 9193.61 samples/sec Loss 4.2407 LearningRate 0.0054 Epoch: 15 Global Step: 256290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:09,162-Speed 9096.53 samples/sec Loss 4.2334 LearningRate 0.0054 Epoch: 15 Global Step: 256300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:10,224-Speed 9650.27 samples/sec Loss 4.0222 LearningRate 0.0054 Epoch: 15 Global Step: 256310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:11,359-Speed 9025.30 samples/sec Loss 4.0377 LearningRate 0.0054 Epoch: 15 Global Step: 256320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:12,444-Speed 9454.07 samples/sec Loss 4.0331 LearningRate 0.0054 Epoch: 15 Global Step: 256330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:13,503-Speed 9672.74 samples/sec 
Loss 3.9791 LearningRate 0.0054 Epoch: 15 Global Step: 256340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:14,592-Speed 9404.79 samples/sec Loss 4.0756 LearningRate 0.0054 Epoch: 15 Global Step: 256350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:15,668-Speed 9530.06 samples/sec Loss 4.2101 LearningRate 0.0054 Epoch: 15 Global Step: 256360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:16,784-Speed 9179.26 samples/sec Loss 4.1247 LearningRate 0.0054 Epoch: 15 Global Step: 256370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:17,903-Speed 9154.71 samples/sec Loss 4.0776 LearningRate 0.0054 Epoch: 15 Global Step: 256380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:18,986-Speed 9462.14 samples/sec Loss 4.1167 LearningRate 0.0054 Epoch: 15 Global Step: 256390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:20,112-Speed 9098.53 samples/sec Loss 4.2075 LearningRate 0.0054 Epoch: 15 Global Step: 256400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:21,220-Speed 9249.00 samples/sec Loss 4.1605 LearningRate 0.0054 Epoch: 15 Global Step: 256410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:22,362-Speed 8968.62 samples/sec Loss 4.1115 LearningRate 0.0054 Epoch: 15 Global Step: 256420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:23,453-Speed 9391.92 samples/sec Loss 4.0420 LearningRate 0.0054 Epoch: 15 Global Step: 256430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:24,513-Speed 9668.87 samples/sec Loss 4.0826 LearningRate 0.0054 Epoch: 15 Global Step: 256440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:25,591-Speed 9509.28 samples/sec Loss 4.1168 LearningRate 0.0054 Epoch: 15 Global Step: 256450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:26,639-Speed 9783.08 samples/sec Loss 4.0476 LearningRate 0.0054 
Epoch: 15 Global Step: 256460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:27,723-Speed 9450.26 samples/sec Loss 4.0509 LearningRate 0.0054 Epoch: 15 Global Step: 256470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:28,823-Speed 9310.32 samples/sec Loss 4.1431 LearningRate 0.0054 Epoch: 15 Global Step: 256480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:29,874-Speed 9752.46 samples/sec Loss 4.0339 LearningRate 0.0054 Epoch: 15 Global Step: 256490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:30,938-Speed 9627.61 samples/sec Loss 4.0528 LearningRate 0.0054 Epoch: 15 Global Step: 256500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:31,997-Speed 9674.41 samples/sec Loss 4.0969 LearningRate 0.0054 Epoch: 15 Global Step: 256510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:33,089-Speed 9384.92 samples/sec Loss 4.0717 LearningRate 0.0054 Epoch: 15 Global Step: 256520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:34,177-Speed 9421.98 samples/sec Loss 4.0846 LearningRate 0.0054 Epoch: 15 Global Step: 256530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:35,252-Speed 9531.77 samples/sec Loss 4.0931 LearningRate 0.0054 Epoch: 15 Global Step: 256540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:36,362-Speed 9229.52 samples/sec Loss 4.0663 LearningRate 0.0054 Epoch: 15 Global Step: 256550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:37,469-Speed 9251.44 samples/sec Loss 4.1232 LearningRate 0.0054 Epoch: 15 Global Step: 256560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:38,564-Speed 9353.43 samples/sec Loss 4.1755 LearningRate 0.0054 Epoch: 15 Global Step: 256570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:39,671-Speed 9258.65 samples/sec Loss 4.1121 LearningRate 0.0054 Epoch: 15 Global Step: 256580 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:40,727-Speed 9702.03 samples/sec Loss 4.0894 LearningRate 0.0054 Epoch: 15 Global Step: 256590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:41,849-Speed 9131.20 samples/sec Loss 4.1225 LearningRate 0.0054 Epoch: 15 Global Step: 256600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:42,955-Speed 9265.46 samples/sec Loss 4.1192 LearningRate 0.0053 Epoch: 15 Global Step: 256610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:44,051-Speed 9354.45 samples/sec Loss 4.1665 LearningRate 0.0053 Epoch: 15 Global Step: 256620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:45,113-Speed 9647.17 samples/sec Loss 4.0829 LearningRate 0.0053 Epoch: 15 Global Step: 256630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:46,146-Speed 9917.99 samples/sec Loss 4.1196 LearningRate 0.0053 Epoch: 15 Global Step: 256640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:47,192-Speed 9794.62 samples/sec Loss 4.0702 LearningRate 0.0053 Epoch: 15 Global Step: 256650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:48,268-Speed 9522.98 samples/sec Loss 4.0762 LearningRate 0.0053 Epoch: 15 Global Step: 256660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:49,320-Speed 9741.21 samples/sec Loss 4.1317 LearningRate 0.0053 Epoch: 15 Global Step: 256670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:50,428-Speed 9243.41 samples/sec Loss 4.1173 LearningRate 0.0053 Epoch: 15 Global Step: 256680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:51,557-Speed 9074.73 samples/sec Loss 4.0613 LearningRate 0.0053 Epoch: 15 Global Step: 256690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:52,672-Speed 9191.64 samples/sec Loss 4.1041 LearningRate 0.0053 Epoch: 15 Global Step: 256700 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 21:54:53,766-Speed 9360.79 samples/sec Loss 4.0477 LearningRate 0.0053 Epoch: 15 Global Step: 256710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:54,951-Speed 8651.46 samples/sec Loss 4.0996 LearningRate 0.0053 Epoch: 15 Global Step: 256720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:56,048-Speed 9339.13 samples/sec Loss 4.0398 LearningRate 0.0053 Epoch: 15 Global Step: 256730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:54:57,154-Speed 9261.66 samples/sec Loss 4.1549 LearningRate 0.0053 Epoch: 15 Global Step: 256740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:58,216-Speed 9645.03 samples/sec Loss 4.1029 LearningRate 0.0053 Epoch: 15 Global Step: 256750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:54:59,311-Speed 9353.11 samples/sec Loss 4.1048 LearningRate 0.0053 Epoch: 15 Global Step: 256760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:00,374-Speed 9639.07 samples/sec Loss 4.0695 LearningRate 0.0053 Epoch: 15 Global Step: 256770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:01,452-Speed 9509.21 samples/sec Loss 4.0764 LearningRate 0.0053 Epoch: 15 Global Step: 256780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:02,562-Speed 9236.15 samples/sec Loss 4.1534 LearningRate 0.0053 Epoch: 15 Global Step: 256790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:03,653-Speed 9389.25 samples/sec Loss 4.0887 LearningRate 0.0053 Epoch: 15 Global Step: 256800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:04,771-Speed 9164.63 samples/sec Loss 4.0774 LearningRate 0.0053 Epoch: 15 Global Step: 256810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:05,829-Speed 9681.21 samples/sec Loss 4.1133 LearningRate 0.0053 Epoch: 15 Global Step: 256820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:06,915-Speed 
9434.46 samples/sec Loss 4.1547 LearningRate 0.0053 Epoch: 15 Global Step: 256830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:08,013-Speed 9333.96 samples/sec Loss 4.1810 LearningRate 0.0053 Epoch: 15 Global Step: 256840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:09,077-Speed 9626.19 samples/sec Loss 4.0343 LearningRate 0.0053 Epoch: 15 Global Step: 256850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:10,145-Speed 9594.68 samples/sec Loss 4.1715 LearningRate 0.0053 Epoch: 15 Global Step: 256860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:11,272-Speed 9091.90 samples/sec Loss 4.0286 LearningRate 0.0053 Epoch: 15 Global Step: 256870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:12,344-Speed 9565.86 samples/sec Loss 4.1413 LearningRate 0.0053 Epoch: 15 Global Step: 256880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:13,447-Speed 9294.08 samples/sec Loss 4.0566 LearningRate 0.0053 Epoch: 15 Global Step: 256890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:14,594-Speed 8933.21 samples/sec Loss 4.0759 LearningRate 0.0053 Epoch: 15 Global Step: 256900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:15,651-Speed 9703.78 samples/sec Loss 4.1617 LearningRate 0.0053 Epoch: 15 Global Step: 256910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:16,758-Speed 9249.46 samples/sec Loss 4.1599 LearningRate 0.0053 Epoch: 15 Global Step: 256920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:17,813-Speed 9713.20 samples/sec Loss 4.1094 LearningRate 0.0053 Epoch: 15 Global Step: 256930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:18,874-Speed 9663.85 samples/sec Loss 4.0656 LearningRate 0.0053 Epoch: 15 Global Step: 256940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:20,061-Speed 8633.99 samples/sec Loss 4.0948 
LearningRate 0.0053 Epoch: 15 Global Step: 256950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:21,147-Speed 9433.35 samples/sec Loss 4.1177 LearningRate 0.0053 Epoch: 15 Global Step: 256960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:22,246-Speed 9324.39 samples/sec Loss 4.1034 LearningRate 0.0053 Epoch: 15 Global Step: 256970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:23,336-Speed 9395.43 samples/sec Loss 4.1705 LearningRate 0.0053 Epoch: 15 Global Step: 256980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:24,460-Speed 9122.01 samples/sec Loss 4.0737 LearningRate 0.0053 Epoch: 15 Global Step: 256990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:25,541-Speed 9477.99 samples/sec Loss 4.0836 LearningRate 0.0053 Epoch: 15 Global Step: 257000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:26,676-Speed 9020.79 samples/sec Loss 4.1722 LearningRate 0.0053 Epoch: 15 Global Step: 257010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:27,804-Speed 9090.86 samples/sec Loss 4.0630 LearningRate 0.0053 Epoch: 15 Global Step: 257020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:28,946-Speed 8963.90 samples/sec Loss 4.1080 LearningRate 0.0053 Epoch: 15 Global Step: 257030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:30,049-Speed 9297.19 samples/sec Loss 4.0262 LearningRate 0.0053 Epoch: 15 Global Step: 257040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:31,135-Speed 9437.05 samples/sec Loss 4.0929 LearningRate 0.0053 Epoch: 15 Global Step: 257050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:32,196-Speed 9658.90 samples/sec Loss 4.1652 LearningRate 0.0053 Epoch: 15 Global Step: 257060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:33,294-Speed 9326.81 samples/sec Loss 4.1684 LearningRate 0.0053 Epoch: 15 
Global Step: 257070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:34,367-Speed 9552.13 samples/sec Loss 4.0497 LearningRate 0.0053 Epoch: 15 Global Step: 257080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:35,483-Speed 9179.13 samples/sec Loss 4.0744 LearningRate 0.0053 Epoch: 15 Global Step: 257090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:36,555-Speed 9559.27 samples/sec Loss 4.1150 LearningRate 0.0053 Epoch: 15 Global Step: 257100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:37,676-Speed 9139.55 samples/sec Loss 4.1753 LearningRate 0.0053 Epoch: 15 Global Step: 257110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:38,744-Speed 9594.96 samples/sec Loss 4.0269 LearningRate 0.0053 Epoch: 15 Global Step: 257120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:39,833-Speed 9408.57 samples/sec Loss 4.1622 LearningRate 0.0053 Epoch: 15 Global Step: 257130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:40,932-Speed 9326.41 samples/sec Loss 4.0699 LearningRate 0.0053 Epoch: 15 Global Step: 257140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:41,989-Speed 9691.92 samples/sec Loss 4.1604 LearningRate 0.0053 Epoch: 15 Global Step: 257150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:43,072-Speed 9461.97 samples/sec Loss 4.0456 LearningRate 0.0053 Epoch: 15 Global Step: 257160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:44,176-Speed 9280.70 samples/sec Loss 4.1103 LearningRate 0.0053 Epoch: 15 Global Step: 257170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:45,265-Speed 9403.43 samples/sec Loss 4.1332 LearningRate 0.0053 Epoch: 15 Global Step: 257180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:46,348-Speed 9464.42 samples/sec Loss 4.1138 LearningRate 0.0053 Epoch: 15 Global Step: 257190 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 21:55:47,513-Speed 8795.42 samples/sec Loss 4.0829 LearningRate 0.0053 Epoch: 15 Global Step: 257200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:48,625-Speed 9226.03 samples/sec Loss 4.0962 LearningRate 0.0053 Epoch: 15 Global Step: 257210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:55:49,671-Speed 9796.09 samples/sec Loss 4.1631 LearningRate 0.0053 Epoch: 15 Global Step: 257220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:50,730-Speed 9674.50 samples/sec Loss 4.0456 LearningRate 0.0053 Epoch: 15 Global Step: 257230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:51,858-Speed 9082.09 samples/sec Loss 4.1188 LearningRate 0.0053 Epoch: 15 Global Step: 257240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:52,957-Speed 9325.81 samples/sec Loss 4.1222 LearningRate 0.0053 Epoch: 15 Global Step: 257250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:54,013-Speed 9702.17 samples/sec Loss 4.1602 LearningRate 0.0053 Epoch: 15 Global Step: 257260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:55,103-Speed 9399.24 samples/sec Loss 4.1034 LearningRate 0.0053 Epoch: 15 Global Step: 257270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:56,190-Speed 9423.96 samples/sec Loss 4.0715 LearningRate 0.0053 Epoch: 15 Global Step: 257280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:57,272-Speed 9468.26 samples/sec Loss 4.1748 LearningRate 0.0053 Epoch: 15 Global Step: 257290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:58,351-Speed 9492.54 samples/sec Loss 4.0531 LearningRate 0.0053 Epoch: 15 Global Step: 257300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:55:59,458-Speed 9260.79 samples/sec Loss 4.1903 LearningRate 0.0053 Epoch: 15 Global Step: 257310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 21:56:00,599-Speed 8979.45 samples/sec Loss 4.1989 LearningRate 0.0053 Epoch: 15 Global Step: 257320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:01,657-Speed 9683.72 samples/sec Loss 4.1419 LearningRate 0.0053 Epoch: 15 Global Step: 257330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:02,724-Speed 9604.74 samples/sec Loss 4.1246 LearningRate 0.0052 Epoch: 15 Global Step: 257340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:03,831-Speed 9250.76 samples/sec Loss 4.0800 LearningRate 0.0052 Epoch: 15 Global Step: 257350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:04,924-Speed 9376.77 samples/sec Loss 4.1109 LearningRate 0.0052 Epoch: 15 Global Step: 257360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:05,960-Speed 9891.76 samples/sec Loss 4.0965 LearningRate 0.0052 Epoch: 15 Global Step: 257370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:07,014-Speed 9721.22 samples/sec Loss 4.0393 LearningRate 0.0052 Epoch: 15 Global Step: 257380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:08,062-Speed 9779.14 samples/sec Loss 4.0263 LearningRate 0.0052 Epoch: 15 Global Step: 257390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:09,154-Speed 9378.82 samples/sec Loss 4.2534 LearningRate 0.0052 Epoch: 15 Global Step: 257400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:10,251-Speed 9343.93 samples/sec Loss 4.1472 LearningRate 0.0052 Epoch: 15 Global Step: 257410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:11,311-Speed 9657.23 samples/sec Loss 4.0888 LearningRate 0.0052 Epoch: 15 Global Step: 257420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:12,402-Speed 9401.60 samples/sec Loss 4.1411 LearningRate 0.0052 Epoch: 15 Global Step: 257430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:13,474-Speed 9552.05 
samples/sec Loss 4.1140 LearningRate 0.0052 Epoch: 15 Global Step: 257440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:14,540-Speed 9612.28 samples/sec Loss 4.0754 LearningRate 0.0052 Epoch: 15 Global Step: 257450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:15,646-Speed 9262.70 samples/sec Loss 4.0863 LearningRate 0.0052 Epoch: 15 Global Step: 257460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:16,783-Speed 9016.96 samples/sec Loss 4.1782 LearningRate 0.0052 Epoch: 15 Global Step: 257470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:17,907-Speed 9113.20 samples/sec Loss 4.1456 LearningRate 0.0052 Epoch: 15 Global Step: 257480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:19,087-Speed 8686.20 samples/sec Loss 4.1586 LearningRate 0.0052 Epoch: 15 Global Step: 257490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:20,147-Speed 9661.68 samples/sec Loss 4.1619 LearningRate 0.0052 Epoch: 15 Global Step: 257500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:21,214-Speed 9600.56 samples/sec Loss 4.1880 LearningRate 0.0052 Epoch: 15 Global Step: 257510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:22,289-Speed 9531.34 samples/sec Loss 4.1041 LearningRate 0.0052 Epoch: 15 Global Step: 257520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:23,380-Speed 9392.61 samples/sec Loss 4.1490 LearningRate 0.0052 Epoch: 15 Global Step: 257530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:24,473-Speed 9381.69 samples/sec Loss 4.1003 LearningRate 0.0052 Epoch: 15 Global Step: 257540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:25,555-Speed 9465.51 samples/sec Loss 4.1251 LearningRate 0.0052 Epoch: 15 Global Step: 257550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:26,653-Speed 9328.66 samples/sec Loss 4.1285 LearningRate 
0.0052 Epoch: 15 Global Step: 257560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:27,789-Speed 9024.35 samples/sec Loss 4.1065 LearningRate 0.0052 Epoch: 15 Global Step: 257570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:28,857-Speed 9594.85 samples/sec Loss 4.1376 LearningRate 0.0052 Epoch: 15 Global Step: 257580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:29,955-Speed 9328.35 samples/sec Loss 4.1587 LearningRate 0.0052 Epoch: 15 Global Step: 257590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:31,081-Speed 9112.68 samples/sec Loss 4.1002 LearningRate 0.0052 Epoch: 15 Global Step: 257600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:32,246-Speed 8794.04 samples/sec Loss 4.0705 LearningRate 0.0052 Epoch: 15 Global Step: 257610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:33,338-Speed 9384.74 samples/sec Loss 4.1400 LearningRate 0.0052 Epoch: 15 Global Step: 257620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:34,439-Speed 9302.61 samples/sec Loss 4.1163 LearningRate 0.0052 Epoch: 15 Global Step: 257630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:35,527-Speed 9420.43 samples/sec Loss 4.1001 LearningRate 0.0052 Epoch: 15 Global Step: 257640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:36,644-Speed 9166.93 samples/sec Loss 4.0838 LearningRate 0.0052 Epoch: 15 Global Step: 257650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:37,790-Speed 8943.71 samples/sec Loss 4.1813 LearningRate 0.0052 Epoch: 15 Global Step: 257660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:38,885-Speed 9356.01 samples/sec Loss 4.0553 LearningRate 0.0052 Epoch: 15 Global Step: 257670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:39,975-Speed 9402.19 samples/sec Loss 4.0774 LearningRate 0.0052 Epoch: 15 Global Step: 257680 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:41,061-Speed 9434.07 samples/sec Loss 4.1484 LearningRate 0.0052 Epoch: 15 Global Step: 257690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:42,126-Speed 9622.59 samples/sec Loss 4.0291 LearningRate 0.0052 Epoch: 15 Global Step: 257700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:43,182-Speed 9704.31 samples/sec Loss 4.0950 LearningRate 0.0052 Epoch: 15 Global Step: 257710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:44,272-Speed 9395.18 samples/sec Loss 4.0685 LearningRate 0.0052 Epoch: 15 Global Step: 257720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:45,376-Speed 9285.53 samples/sec Loss 4.1876 LearningRate 0.0052 Epoch: 15 Global Step: 257730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:46,481-Speed 9267.27 samples/sec Loss 4.1846 LearningRate 0.0052 Epoch: 15 Global Step: 257740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:47,593-Speed 9213.13 samples/sec Loss 4.1437 LearningRate 0.0052 Epoch: 15 Global Step: 257750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:48,670-Speed 9520.05 samples/sec Loss 4.2767 LearningRate 0.0052 Epoch: 15 Global Step: 257760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:49,751-Speed 9479.46 samples/sec Loss 4.1169 LearningRate 0.0052 Epoch: 15 Global Step: 257770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:50,868-Speed 9173.23 samples/sec Loss 4.1588 LearningRate 0.0052 Epoch: 15 Global Step: 257780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:51,953-Speed 9439.47 samples/sec Loss 4.1373 LearningRate 0.0052 Epoch: 15 Global Step: 257790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:56:53,054-Speed 9306.52 samples/sec Loss 4.0423 LearningRate 0.0052 Epoch: 15 Global Step: 257800 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 21:56:54,133-Speed 9512.46 samples/sec Loss 4.1484 LearningRate 0.0052 Epoch: 15 Global Step: 257810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:55,248-Speed 9184.59 samples/sec Loss 4.1810 LearningRate 0.0052 Epoch: 15 Global Step: 257820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:56,357-Speed 9242.81 samples/sec Loss 4.1365 LearningRate 0.0052 Epoch: 15 Global Step: 257830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:57,487-Speed 9070.40 samples/sec Loss 4.1102 LearningRate 0.0052 Epoch: 15 Global Step: 257840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:58,603-Speed 9183.34 samples/sec Loss 4.2298 LearningRate 0.0052 Epoch: 15 Global Step: 257850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:56:59,697-Speed 9362.76 samples/sec Loss 4.1767 LearningRate 0.0052 Epoch: 15 Global Step: 257860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:00,839-Speed 8973.39 samples/sec Loss 4.0935 LearningRate 0.0052 Epoch: 15 Global Step: 257870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:01,897-Speed 9691.07 samples/sec Loss 4.0721 LearningRate 0.0052 Epoch: 15 Global Step: 257880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:02,951-Speed 9720.19 samples/sec Loss 4.1167 LearningRate 0.0052 Epoch: 15 Global Step: 257890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:04,054-Speed 9284.68 samples/sec Loss 4.2073 LearningRate 0.0052 Epoch: 15 Global Step: 257900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:57:05,164-Speed 9234.22 samples/sec Loss 4.1972 LearningRate 0.0052 Epoch: 15 Global Step: 257910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:57:06,275-Speed 9225.75 samples/sec Loss 4.1194 LearningRate 0.0052 Epoch: 15 Global Step: 257920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
21:57:07,357-Speed 9467.79 samples/sec Loss 4.1427 LearningRate 0.0052 Epoch: 15 Global Step: 257930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:08,419-Speed 9648.91 samples/sec Loss 4.0056 LearningRate 0.0052 Epoch: 15 Global Step: 257940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:09,516-Speed 9339.69 samples/sec Loss 4.1211 LearningRate 0.0052 Epoch: 15 Global Step: 257950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:10,585-Speed 9582.24 samples/sec Loss 4.0686 LearningRate 0.0052 Epoch: 15 Global Step: 257960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:11,713-Speed 9087.04 samples/sec Loss 4.1684 LearningRate 0.0052 Epoch: 15 Global Step: 257970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:12,797-Speed 9451.86 samples/sec Loss 4.1801 LearningRate 0.0052 Epoch: 15 Global Step: 257980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:13,919-Speed 9130.15 samples/sec Loss 4.1599 LearningRate 0.0052 Epoch: 15 Global Step: 257990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:15,029-Speed 9231.78 samples/sec Loss 4.1362 LearningRate 0.0052 Epoch: 15 Global Step: 258000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:57:37,352-[lfw][258000]XNorm: 7.148704 Training: 2022-04-11 21:57:37,352-[lfw][258000]Accuracy-Flip: 0.99717+-0.00299 Training: 2022-04-11 21:57:37,353-[lfw][258000]Accuracy-Highest: 0.99733 Training: 2022-04-11 21:58:02,795-[cfp_fp][258000]XNorm: 6.214708 Training: 2022-04-11 21:58:02,796-[cfp_fp][258000]Accuracy-Flip: 0.97086+-0.00853 Training: 2022-04-11 21:58:02,797-[cfp_fp][258000]Accuracy-Highest: 0.97143 Training: 2022-04-11 21:58:24,794-[agedb_30][258000]XNorm: 6.985307 Training: 2022-04-11 21:58:24,795-[agedb_30][258000]Accuracy-Flip: 0.97083+-0.01014 Training: 2022-04-11 21:58:24,795-[agedb_30][258000]Accuracy-Highest: 0.97350 Training: 2022-04-11 21:58:25,911-Speed 144.47 
samples/sec Loss 4.0846 LearningRate 0.0052 Epoch: 15 Global Step: 258010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:26,983-Speed 9556.50 samples/sec Loss 4.1335 LearningRate 0.0052 Epoch: 15 Global Step: 258020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:28,124-Speed 8982.82 samples/sec Loss 4.1306 LearningRate 0.0052 Epoch: 15 Global Step: 258030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:29,200-Speed 9513.23 samples/sec Loss 4.0809 LearningRate 0.0052 Epoch: 15 Global Step: 258040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:30,285-Speed 9443.21 samples/sec Loss 4.2575 LearningRate 0.0052 Epoch: 15 Global Step: 258050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:31,389-Speed 9284.38 samples/sec Loss 4.2614 LearningRate 0.0052 Epoch: 15 Global Step: 258060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:32,496-Speed 9257.67 samples/sec Loss 4.1540 LearningRate 0.0051 Epoch: 15 Global Step: 258070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:33,658-Speed 8816.22 samples/sec Loss 4.1871 LearningRate 0.0051 Epoch: 15 Global Step: 258080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:34,773-Speed 9195.68 samples/sec Loss 4.1268 LearningRate 0.0051 Epoch: 15 Global Step: 258090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:35,846-Speed 9542.96 samples/sec Loss 4.1432 LearningRate 0.0051 Epoch: 15 Global Step: 258100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:36,964-Speed 9163.54 samples/sec Loss 4.1290 LearningRate 0.0051 Epoch: 15 Global Step: 258110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:38,046-Speed 9471.48 samples/sec Loss 4.1246 LearningRate 0.0051 Epoch: 15 Global Step: 258120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:39,162-Speed 9181.37 samples/sec Loss 4.1206 LearningRate 
0.0051 Epoch: 15 Global Step: 258130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:40,229-Speed 9607.24 samples/sec Loss 4.1737 LearningRate 0.0051 Epoch: 15 Global Step: 258140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:41,372-Speed 8962.52 samples/sec Loss 4.2068 LearningRate 0.0051 Epoch: 15 Global Step: 258150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:42,503-Speed 9072.01 samples/sec Loss 4.0543 LearningRate 0.0051 Epoch: 15 Global Step: 258160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:43,562-Speed 9668.60 samples/sec Loss 4.1038 LearningRate 0.0051 Epoch: 15 Global Step: 258170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:44,634-Speed 9556.91 samples/sec Loss 4.1571 LearningRate 0.0051 Epoch: 15 Global Step: 258180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:45,672-Speed 9869.38 samples/sec Loss 4.0303 LearningRate 0.0051 Epoch: 15 Global Step: 258190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:46,841-Speed 8766.41 samples/sec Loss 4.0875 LearningRate 0.0051 Epoch: 15 Global Step: 258200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:58:47,956-Speed 9189.83 samples/sec Loss 4.1903 LearningRate 0.0051 Epoch: 15 Global Step: 258210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:49,072-Speed 9183.17 samples/sec Loss 4.0615 LearningRate 0.0051 Epoch: 15 Global Step: 258220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:50,172-Speed 9313.56 samples/sec Loss 4.1964 LearningRate 0.0051 Epoch: 15 Global Step: 258230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:51,249-Speed 9510.62 samples/sec Loss 4.1150 LearningRate 0.0051 Epoch: 15 Global Step: 258240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:52,373-Speed 9120.82 samples/sec Loss 4.1297 LearningRate 0.0051 Epoch: 15 Global Step: 258250 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:53,513-Speed 8984.67 samples/sec Loss 4.0790 LearningRate 0.0051 Epoch: 15 Global Step: 258260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:54,625-Speed 9216.23 samples/sec Loss 4.0925 LearningRate 0.0051 Epoch: 15 Global Step: 258270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:55,756-Speed 9055.70 samples/sec Loss 4.1316 LearningRate 0.0051 Epoch: 15 Global Step: 258280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:56,849-Speed 9369.79 samples/sec Loss 4.1137 LearningRate 0.0051 Epoch: 15 Global Step: 258290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:57,907-Speed 9689.29 samples/sec Loss 4.0229 LearningRate 0.0051 Epoch: 15 Global Step: 258300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:58:59,004-Speed 9341.07 samples/sec Loss 4.1300 LearningRate 0.0051 Epoch: 15 Global Step: 258310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:00,110-Speed 9256.74 samples/sec Loss 4.1924 LearningRate 0.0051 Epoch: 15 Global Step: 258320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:01,181-Speed 9577.74 samples/sec Loss 4.1395 LearningRate 0.0051 Epoch: 15 Global Step: 258330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:02,317-Speed 9018.53 samples/sec Loss 4.1449 LearningRate 0.0051 Epoch: 15 Global Step: 258340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:03,472-Speed 8873.05 samples/sec Loss 4.0988 LearningRate 0.0051 Epoch: 15 Global Step: 258350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:04,551-Speed 9493.04 samples/sec Loss 4.1283 LearningRate 0.0051 Epoch: 15 Global Step: 258360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:05,601-Speed 9753.74 samples/sec Loss 4.1402 LearningRate 0.0051 Epoch: 15 Global Step: 258370 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 21:59:06,696-Speed 9363.47 samples/sec Loss 4.1909 LearningRate 0.0051 Epoch: 15 Global Step: 258380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:07,795-Speed 9318.83 samples/sec Loss 4.0892 LearningRate 0.0051 Epoch: 15 Global Step: 258390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:08,889-Speed 9368.40 samples/sec Loss 4.2120 LearningRate 0.0051 Epoch: 15 Global Step: 258400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:09,943-Speed 9721.33 samples/sec Loss 4.1289 LearningRate 0.0051 Epoch: 15 Global Step: 258410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:11,070-Speed 9084.85 samples/sec Loss 4.1227 LearningRate 0.0051 Epoch: 15 Global Step: 258420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:12,156-Speed 9437.57 samples/sec Loss 4.1139 LearningRate 0.0051 Epoch: 15 Global Step: 258430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:13,289-Speed 9047.36 samples/sec Loss 4.1601 LearningRate 0.0051 Epoch: 15 Global Step: 258440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:14,383-Speed 9364.72 samples/sec Loss 4.0954 LearningRate 0.0051 Epoch: 15 Global Step: 258450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:15,492-Speed 9240.74 samples/sec Loss 4.0345 LearningRate 0.0051 Epoch: 15 Global Step: 258460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:16,557-Speed 9613.48 samples/sec Loss 4.1797 LearningRate 0.0051 Epoch: 15 Global Step: 258470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:17,662-Speed 9271.58 samples/sec Loss 4.1088 LearningRate 0.0051 Epoch: 15 Global Step: 258480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:18,768-Speed 9270.14 samples/sec Loss 4.1419 LearningRate 0.0051 Epoch: 15 Global Step: 258490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
21:59:19,859-Speed 9398.81 samples/sec Loss 4.1662 LearningRate 0.0051 Epoch: 15 Global Step: 258500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:20,946-Speed 9424.50 samples/sec Loss 4.2438 LearningRate 0.0051 Epoch: 15 Global Step: 258510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:22,022-Speed 9524.81 samples/sec Loss 4.1555 LearningRate 0.0051 Epoch: 15 Global Step: 258520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:23,119-Speed 9341.39 samples/sec Loss 4.0495 LearningRate 0.0051 Epoch: 15 Global Step: 258530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:24,190-Speed 9569.64 samples/sec Loss 4.1335 LearningRate 0.0051 Epoch: 15 Global Step: 258540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:25,304-Speed 9192.16 samples/sec Loss 4.0917 LearningRate 0.0051 Epoch: 15 Global Step: 258550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:26,447-Speed 8961.69 samples/sec Loss 4.1745 LearningRate 0.0051 Epoch: 15 Global Step: 258560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:27,510-Speed 9637.42 samples/sec Loss 4.1413 LearningRate 0.0051 Epoch: 15 Global Step: 258570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:28,575-Speed 9622.00 samples/sec Loss 4.1713 LearningRate 0.0051 Epoch: 15 Global Step: 258580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:29,685-Speed 9233.30 samples/sec Loss 4.2048 LearningRate 0.0051 Epoch: 15 Global Step: 258590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:30,790-Speed 9270.30 samples/sec Loss 4.0936 LearningRate 0.0051 Epoch: 15 Global Step: 258600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:31,891-Speed 9312.02 samples/sec Loss 4.1740 LearningRate 0.0051 Epoch: 15 Global Step: 258610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:32,994-Speed 9282.48 
samples/sec Loss 4.1696 LearningRate 0.0051 Epoch: 15 Global Step: 258620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:34,098-Speed 9281.89 samples/sec Loss 4.1057 LearningRate 0.0051 Epoch: 15 Global Step: 258630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:35,186-Speed 9418.57 samples/sec Loss 4.0734 LearningRate 0.0051 Epoch: 15 Global Step: 258640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:36,262-Speed 9525.70 samples/sec Loss 4.0862 LearningRate 0.0051 Epoch: 15 Global Step: 258650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:37,374-Speed 9212.96 samples/sec Loss 4.1073 LearningRate 0.0051 Epoch: 15 Global Step: 258660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:38,470-Speed 9352.18 samples/sec Loss 4.2419 LearningRate 0.0051 Epoch: 15 Global Step: 258670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:39,551-Speed 9477.30 samples/sec Loss 4.0855 LearningRate 0.0051 Epoch: 15 Global Step: 258680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:40,629-Speed 9496.17 samples/sec Loss 4.1428 LearningRate 0.0051 Epoch: 15 Global Step: 258690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:41,695-Speed 9619.41 samples/sec Loss 4.1857 LearningRate 0.0051 Epoch: 15 Global Step: 258700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:42,815-Speed 9151.78 samples/sec Loss 4.0894 LearningRate 0.0051 Epoch: 15 Global Step: 258710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:43,906-Speed 9387.29 samples/sec Loss 4.1727 LearningRate 0.0051 Epoch: 15 Global Step: 258720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:44,991-Speed 9447.99 samples/sec Loss 4.0932 LearningRate 0.0051 Epoch: 15 Global Step: 258730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:46,089-Speed 9325.77 samples/sec Loss 4.1889 LearningRate 
0.0051 Epoch: 15 Global Step: 258740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:47,152-Speed 9638.91 samples/sec Loss 4.2205 LearningRate 0.0051 Epoch: 15 Global Step: 258750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:48,206-Speed 9719.63 samples/sec Loss 4.1453 LearningRate 0.0051 Epoch: 15 Global Step: 258760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 21:59:49,308-Speed 9299.50 samples/sec Loss 4.0827 LearningRate 0.0051 Epoch: 15 Global Step: 258770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:50,382-Speed 9547.53 samples/sec Loss 4.2092 LearningRate 0.0051 Epoch: 15 Global Step: 258780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:51,492-Speed 9231.93 samples/sec Loss 4.0984 LearningRate 0.0051 Epoch: 15 Global Step: 258790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:52,568-Speed 9517.72 samples/sec Loss 4.1517 LearningRate 0.0051 Epoch: 15 Global Step: 258800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:53,690-Speed 9136.12 samples/sec Loss 4.1461 LearningRate 0.0050 Epoch: 15 Global Step: 258810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:54,810-Speed 9143.53 samples/sec Loss 4.1295 LearningRate 0.0050 Epoch: 15 Global Step: 258820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:55,926-Speed 9180.50 samples/sec Loss 4.0911 LearningRate 0.0050 Epoch: 15 Global Step: 258830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:57,080-Speed 8879.45 samples/sec Loss 4.1526 LearningRate 0.0050 Epoch: 15 Global Step: 258840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:58,199-Speed 9155.77 samples/sec Loss 4.1230 LearningRate 0.0050 Epoch: 15 Global Step: 258850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 21:59:59,337-Speed 9004.04 samples/sec Loss 4.2522 LearningRate 0.0050 Epoch: 15 Global Step: 258860 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:00,489-Speed 8893.12 samples/sec Loss 4.2240 LearningRate 0.0050 Epoch: 15 Global Step: 258870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:01,596-Speed 9258.26 samples/sec Loss 4.1608 LearningRate 0.0050 Epoch: 15 Global Step: 258880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:02,723-Speed 9091.43 samples/sec Loss 4.1455 LearningRate 0.0050 Epoch: 15 Global Step: 258890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:03,801-Speed 9505.57 samples/sec Loss 4.1826 LearningRate 0.0050 Epoch: 15 Global Step: 258900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:04,892-Speed 9392.89 samples/sec Loss 4.0750 LearningRate 0.0050 Epoch: 15 Global Step: 258910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:06,003-Speed 9216.15 samples/sec Loss 4.0898 LearningRate 0.0050 Epoch: 15 Global Step: 258920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:07,126-Speed 9122.01 samples/sec Loss 4.1467 LearningRate 0.0050 Epoch: 15 Global Step: 258930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:08,229-Speed 9290.99 samples/sec Loss 4.0582 LearningRate 0.0050 Epoch: 15 Global Step: 258940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:09,340-Speed 9226.40 samples/sec Loss 4.1595 LearningRate 0.0050 Epoch: 15 Global Step: 258950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:10,414-Speed 9535.89 samples/sec Loss 4.0704 LearningRate 0.0050 Epoch: 15 Global Step: 258960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:11,481-Speed 9602.66 samples/sec Loss 4.1678 LearningRate 0.0050 Epoch: 15 Global Step: 258970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:12,546-Speed 9620.61 samples/sec Loss 4.0953 LearningRate 0.0050 Epoch: 15 Global Step: 258980 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 22:00:13,701-Speed 8872.76 samples/sec Loss 4.0782 LearningRate 0.0050 Epoch: 15 Global Step: 258990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:14,793-Speed 9387.77 samples/sec Loss 4.1198 LearningRate 0.0050 Epoch: 15 Global Step: 259000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:15,922-Speed 9072.08 samples/sec Loss 4.1426 LearningRate 0.0050 Epoch: 15 Global Step: 259010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:16,975-Speed 9729.85 samples/sec Loss 4.1779 LearningRate 0.0050 Epoch: 15 Global Step: 259020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:18,080-Speed 9276.47 samples/sec Loss 4.0911 LearningRate 0.0050 Epoch: 15 Global Step: 259030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:19,207-Speed 9094.89 samples/sec Loss 4.1363 LearningRate 0.0050 Epoch: 15 Global Step: 259040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:20,282-Speed 9526.27 samples/sec Loss 4.0469 LearningRate 0.0050 Epoch: 15 Global Step: 259050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:21,367-Speed 9442.16 samples/sec Loss 4.0553 LearningRate 0.0050 Epoch: 15 Global Step: 259060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:22,514-Speed 8932.62 samples/sec Loss 4.2023 LearningRate 0.0050 Epoch: 15 Global Step: 259070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:23,597-Speed 9465.01 samples/sec Loss 4.0869 LearningRate 0.0050 Epoch: 15 Global Step: 259080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:24,664-Speed 9601.02 samples/sec Loss 4.0144 LearningRate 0.0050 Epoch: 15 Global Step: 259090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:25,770-Speed 9265.74 samples/sec Loss 4.2339 LearningRate 0.0050 Epoch: 15 Global Step: 259100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:00:26,860-Speed 9394.61 samples/sec Loss 4.1199 LearningRate 0.0050 Epoch: 15 Global Step: 259110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:27,962-Speed 9303.00 samples/sec Loss 4.0674 LearningRate 0.0050 Epoch: 15 Global Step: 259120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:29,077-Speed 9187.47 samples/sec Loss 4.2122 LearningRate 0.0050 Epoch: 15 Global Step: 259130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:30,192-Speed 9190.05 samples/sec Loss 4.1236 LearningRate 0.0050 Epoch: 15 Global Step: 259140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:31,257-Speed 9625.54 samples/sec Loss 4.1655 LearningRate 0.0050 Epoch: 15 Global Step: 259150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:32,363-Speed 9259.78 samples/sec Loss 4.1324 LearningRate 0.0050 Epoch: 15 Global Step: 259160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:33,504-Speed 8980.95 samples/sec Loss 4.1234 LearningRate 0.0050 Epoch: 15 Global Step: 259170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:34,647-Speed 8968.39 samples/sec Loss 4.0401 LearningRate 0.0050 Epoch: 15 Global Step: 259180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:35,715-Speed 9593.18 samples/sec Loss 4.2152 LearningRate 0.0050 Epoch: 15 Global Step: 259190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:36,807-Speed 9378.62 samples/sec Loss 4.0902 LearningRate 0.0050 Epoch: 15 Global Step: 259200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:37,913-Speed 9263.94 samples/sec Loss 4.0940 LearningRate 0.0050 Epoch: 15 Global Step: 259210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:39,010-Speed 9339.31 samples/sec Loss 4.1769 LearningRate 0.0050 Epoch: 15 Global Step: 259220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:40,084-Speed 9542.10 samples/sec 
Loss 4.0701 LearningRate 0.0050 Epoch: 15 Global Step: 259230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:41,130-Speed 9791.79 samples/sec Loss 4.0977 LearningRate 0.0050 Epoch: 15 Global Step: 259240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:42,210-Speed 9489.78 samples/sec Loss 4.1545 LearningRate 0.0050 Epoch: 15 Global Step: 259250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:43,346-Speed 9025.09 samples/sec Loss 4.1800 LearningRate 0.0050 Epoch: 15 Global Step: 259260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:44,395-Speed 9767.27 samples/sec Loss 4.1379 LearningRate 0.0050 Epoch: 15 Global Step: 259270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:45,462-Speed 9599.18 samples/sec Loss 4.1987 LearningRate 0.0050 Epoch: 15 Global Step: 259280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:46,518-Speed 9699.10 samples/sec Loss 4.1378 LearningRate 0.0050 Epoch: 15 Global Step: 259290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:47,637-Speed 9159.71 samples/sec Loss 4.0865 LearningRate 0.0050 Epoch: 15 Global Step: 259300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:48,764-Speed 9089.43 samples/sec Loss 4.1401 LearningRate 0.0050 Epoch: 15 Global Step: 259310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:49,829-Speed 9624.96 samples/sec Loss 4.0383 LearningRate 0.0050 Epoch: 15 Global Step: 259320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:50,918-Speed 9414.66 samples/sec Loss 4.0611 LearningRate 0.0050 Epoch: 15 Global Step: 259330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:00:52,026-Speed 9245.46 samples/sec Loss 4.0718 LearningRate 0.0050 Epoch: 15 Global Step: 259340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:53,135-Speed 9235.46 samples/sec Loss 4.2239 LearningRate 0.0050 Epoch: 15 
Global Step: 259350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:54,241-Speed 9270.08 samples/sec Loss 4.1061 LearningRate 0.0050 Epoch: 15 Global Step: 259360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:55,360-Speed 9153.28 samples/sec Loss 4.1770 LearningRate 0.0050 Epoch: 15 Global Step: 259370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:56,479-Speed 9154.53 samples/sec Loss 4.1539 LearningRate 0.0050 Epoch: 15 Global Step: 259380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:57,590-Speed 9220.19 samples/sec Loss 4.1842 LearningRate 0.0050 Epoch: 15 Global Step: 259390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:58,677-Speed 9430.14 samples/sec Loss 4.1214 LearningRate 0.0050 Epoch: 15 Global Step: 259400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:00:59,762-Speed 9438.53 samples/sec Loss 4.2026 LearningRate 0.0050 Epoch: 15 Global Step: 259410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:00,832-Speed 9575.03 samples/sec Loss 4.1310 LearningRate 0.0050 Epoch: 15 Global Step: 259420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:01,943-Speed 9230.62 samples/sec Loss 4.1627 LearningRate 0.0050 Epoch: 15 Global Step: 259430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:03,031-Speed 9415.01 samples/sec Loss 4.1057 LearningRate 0.0050 Epoch: 15 Global Step: 259440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:04,116-Speed 9436.57 samples/sec Loss 4.1030 LearningRate 0.0050 Epoch: 15 Global Step: 259450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:05,276-Speed 8834.69 samples/sec Loss 4.0621 LearningRate 0.0050 Epoch: 15 Global Step: 259460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:06,388-Speed 9213.60 samples/sec Loss 4.1491 LearningRate 0.0050 Epoch: 15 Global Step: 259470 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:07,464-Speed 9518.86 samples/sec Loss 4.1619 LearningRate 0.0050 Epoch: 15 Global Step: 259480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:08,559-Speed 9360.18 samples/sec Loss 4.1682 LearningRate 0.0050 Epoch: 15 Global Step: 259490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:09,668-Speed 9236.35 samples/sec Loss 4.0692 LearningRate 0.0050 Epoch: 15 Global Step: 259500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:10,748-Speed 9487.50 samples/sec Loss 4.1786 LearningRate 0.0050 Epoch: 15 Global Step: 259510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:11,811-Speed 9644.23 samples/sec Loss 4.0722 LearningRate 0.0050 Epoch: 15 Global Step: 259520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:12,881-Speed 9579.01 samples/sec Loss 4.1547 LearningRate 0.0050 Epoch: 15 Global Step: 259530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:13,959-Speed 9500.60 samples/sec Loss 4.1559 LearningRate 0.0050 Epoch: 15 Global Step: 259540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:15,051-Speed 9388.32 samples/sec Loss 4.1993 LearningRate 0.0049 Epoch: 15 Global Step: 259550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:16,149-Speed 9324.59 samples/sec Loss 4.1655 LearningRate 0.0049 Epoch: 15 Global Step: 259560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:17,221-Speed 9559.74 samples/sec Loss 4.1947 LearningRate 0.0049 Epoch: 15 Global Step: 259570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:18,285-Speed 9631.30 samples/sec Loss 4.1382 LearningRate 0.0049 Epoch: 15 Global Step: 259580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:19,400-Speed 9186.81 samples/sec Loss 4.1075 LearningRate 0.0049 Epoch: 15 Global Step: 259590 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 22:01:20,500-Speed 9318.79 samples/sec Loss 4.1315 LearningRate 0.0049 Epoch: 15 Global Step: 259600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:21,637-Speed 9008.42 samples/sec Loss 4.0746 LearningRate 0.0049 Epoch: 15 Global Step: 259610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:22,724-Speed 9425.16 samples/sec Loss 4.1474 LearningRate 0.0049 Epoch: 15 Global Step: 259620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:23,818-Speed 9365.52 samples/sec Loss 4.1573 LearningRate 0.0049 Epoch: 15 Global Step: 259630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:24,900-Speed 9472.56 samples/sec Loss 4.0878 LearningRate 0.0049 Epoch: 15 Global Step: 259640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:25,974-Speed 9533.88 samples/sec Loss 4.1913 LearningRate 0.0049 Epoch: 15 Global Step: 259650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:27,093-Speed 9157.77 samples/sec Loss 4.1926 LearningRate 0.0049 Epoch: 15 Global Step: 259660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:28,286-Speed 8589.64 samples/sec Loss 4.0526 LearningRate 0.0049 Epoch: 15 Global Step: 259670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:29,405-Speed 9155.07 samples/sec Loss 4.1160 LearningRate 0.0049 Epoch: 15 Global Step: 259680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:30,526-Speed 9145.16 samples/sec Loss 4.1414 LearningRate 0.0049 Epoch: 15 Global Step: 259690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:31,586-Speed 9670.67 samples/sec Loss 4.1216 LearningRate 0.0049 Epoch: 15 Global Step: 259700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:32,753-Speed 8778.03 samples/sec Loss 4.2567 LearningRate 0.0049 Epoch: 15 Global Step: 259710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:33,852-Speed 
9321.06 samples/sec Loss 4.1917 LearningRate 0.0049 Epoch: 15 Global Step: 259720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:34,961-Speed 9239.44 samples/sec Loss 4.1786 LearningRate 0.0049 Epoch: 15 Global Step: 259730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:36,041-Speed 9487.39 samples/sec Loss 4.1775 LearningRate 0.0049 Epoch: 15 Global Step: 259740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:37,091-Speed 9756.06 samples/sec Loss 4.0082 LearningRate 0.0049 Epoch: 15 Global Step: 259750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:38,201-Speed 9226.67 samples/sec Loss 4.1598 LearningRate 0.0049 Epoch: 15 Global Step: 259760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:39,296-Speed 9360.97 samples/sec Loss 4.1699 LearningRate 0.0049 Epoch: 15 Global Step: 259770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:40,436-Speed 8990.39 samples/sec Loss 4.0227 LearningRate 0.0049 Epoch: 15 Global Step: 259780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:41,512-Speed 9521.59 samples/sec Loss 4.1000 LearningRate 0.0049 Epoch: 15 Global Step: 259790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:42,582-Speed 9574.51 samples/sec Loss 4.1502 LearningRate 0.0049 Epoch: 15 Global Step: 259800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:43,671-Speed 9410.79 samples/sec Loss 4.1128 LearningRate 0.0049 Epoch: 15 Global Step: 259810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:44,779-Speed 9246.56 samples/sec Loss 4.1640 LearningRate 0.0049 Epoch: 15 Global Step: 259820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:45,852-Speed 9544.97 samples/sec Loss 4.1303 LearningRate 0.0049 Epoch: 15 Global Step: 259830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:46,965-Speed 9205.62 samples/sec Loss 4.2141 
LearningRate 0.0049 Epoch: 15 Global Step: 259840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:48,101-Speed 9021.61 samples/sec Loss 4.0687 LearningRate 0.0049 Epoch: 15 Global Step: 259850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:01:49,257-Speed 8866.03 samples/sec Loss 4.1282 LearningRate 0.0049 Epoch: 15 Global Step: 259860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:50,353-Speed 9350.45 samples/sec Loss 4.1747 LearningRate 0.0049 Epoch: 15 Global Step: 259870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:51,488-Speed 9027.93 samples/sec Loss 4.1089 LearningRate 0.0049 Epoch: 15 Global Step: 259880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:52,646-Speed 8847.88 samples/sec Loss 4.1266 LearningRate 0.0049 Epoch: 15 Global Step: 259890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:53,733-Speed 9417.63 samples/sec Loss 4.0871 LearningRate 0.0049 Epoch: 15 Global Step: 259900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:54,855-Speed 9134.66 samples/sec Loss 4.1508 LearningRate 0.0049 Epoch: 15 Global Step: 259910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:55,945-Speed 9401.48 samples/sec Loss 4.1309 LearningRate 0.0049 Epoch: 15 Global Step: 259920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:57,037-Speed 9377.67 samples/sec Loss 4.1545 LearningRate 0.0049 Epoch: 15 Global Step: 259930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:58,133-Speed 9349.18 samples/sec Loss 4.1894 LearningRate 0.0049 Epoch: 15 Global Step: 259940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:01:59,200-Speed 9606.00 samples/sec Loss 4.2098 LearningRate 0.0049 Epoch: 15 Global Step: 259950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:02:00,290-Speed 9396.31 samples/sec Loss 4.2073 LearningRate 0.0049 Epoch: 15 Global 
Step: 259960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:02:01,414-Speed 9117.40 samples/sec Loss 4.0808 LearningRate 0.0049 Epoch: 15 Global Step: 259970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:02:02,506-Speed 9386.34 samples/sec Loss 4.1646 LearningRate 0.0049 Epoch: 15 Global Step: 259980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:02:03,551-Speed 9801.94 samples/sec Loss 4.0652 LearningRate 0.0049 Epoch: 15 Global Step: 259990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:02:04,609-Speed 9685.15 samples/sec Loss 4.0816 LearningRate 0.0049 Epoch: 15 Global Step: 260000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:02:26,777-[lfw][260000]XNorm: 7.209153 Training: 2022-04-11 22:02:26,777-[lfw][260000]Accuracy-Flip: 0.99617+-0.00248 Training: 2022-04-11 22:02:26,778-[lfw][260000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:02:52,380-[cfp_fp][260000]XNorm: 6.235661 Training: 2022-04-11 22:02:52,381-[cfp_fp][260000]Accuracy-Flip: 0.96943+-0.00980 Training: 2022-04-11 22:02:52,381-[cfp_fp][260000]Accuracy-Highest: 0.97143 Training: 2022-04-11 22:03:14,502-[agedb_30][260000]XNorm: 7.008566 Training: 2022-04-11 22:03:14,503-[agedb_30][260000]Accuracy-Flip: 0.97083+-0.01023 Training: 2022-04-11 22:03:14,503-[agedb_30][260000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:03:15,581-Speed 144.28 samples/sec Loss 4.1186 LearningRate 0.0049 Epoch: 15 Global Step: 260010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:16,666-Speed 9441.91 samples/sec Loss 4.1005 LearningRate 0.0049 Epoch: 15 Global Step: 260020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:17,751-Speed 9438.35 samples/sec Loss 4.1735 LearningRate 0.0049 Epoch: 15 Global Step: 260030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:18,850-Speed 9322.75 samples/sec Loss 4.1442 LearningRate 0.0049 Epoch: 15 Global Step: 260040 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:19,958-Speed 9247.33 samples/sec Loss 4.2354 LearningRate 0.0049 Epoch: 15 Global Step: 260050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:21,028-Speed 9582.20 samples/sec Loss 4.1181 LearningRate 0.0049 Epoch: 15 Global Step: 260060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:22,119-Speed 9391.19 samples/sec Loss 4.1786 LearningRate 0.0049 Epoch: 15 Global Step: 260070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:23,228-Speed 9235.04 samples/sec Loss 4.1843 LearningRate 0.0049 Epoch: 15 Global Step: 260080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:24,325-Speed 9342.46 samples/sec Loss 4.1518 LearningRate 0.0049 Epoch: 15 Global Step: 260090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:25,444-Speed 9152.64 samples/sec Loss 4.2131 LearningRate 0.0049 Epoch: 15 Global Step: 260100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:26,630-Speed 8638.96 samples/sec Loss 4.0794 LearningRate 0.0049 Epoch: 15 Global Step: 260110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:27,706-Speed 9522.64 samples/sec Loss 4.1361 LearningRate 0.0049 Epoch: 15 Global Step: 260120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:28,767-Speed 9662.43 samples/sec Loss 4.1477 LearningRate 0.0049 Epoch: 15 Global Step: 260130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:29,869-Speed 9290.36 samples/sec Loss 4.1305 LearningRate 0.0049 Epoch: 15 Global Step: 260140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:30,966-Speed 9347.25 samples/sec Loss 4.1969 LearningRate 0.0049 Epoch: 15 Global Step: 260150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:32,055-Speed 9402.26 samples/sec Loss 4.1509 LearningRate 0.0049 Epoch: 15 Global Step: 260160 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 22:03:33,163-Speed 9253.06 samples/sec Loss 4.1870 LearningRate 0.0049 Epoch: 15 Global Step: 260170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:34,260-Speed 9344.02 samples/sec Loss 4.0752 LearningRate 0.0049 Epoch: 15 Global Step: 260180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:35,375-Speed 9185.92 samples/sec Loss 4.2288 LearningRate 0.0049 Epoch: 15 Global Step: 260190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:36,505-Speed 9071.01 samples/sec Loss 4.1419 LearningRate 0.0049 Epoch: 15 Global Step: 260200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:37,676-Speed 8745.83 samples/sec Loss 4.1296 LearningRate 0.0049 Epoch: 15 Global Step: 260210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:38,781-Speed 9275.69 samples/sec Loss 4.1418 LearningRate 0.0049 Epoch: 15 Global Step: 260220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:39,877-Speed 9347.24 samples/sec Loss 4.0466 LearningRate 0.0049 Epoch: 15 Global Step: 260230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:40,955-Speed 9509.85 samples/sec Loss 4.1170 LearningRate 0.0049 Epoch: 15 Global Step: 260240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:42,060-Speed 9272.24 samples/sec Loss 4.0410 LearningRate 0.0049 Epoch: 15 Global Step: 260250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:43,161-Speed 9305.46 samples/sec Loss 4.1563 LearningRate 0.0049 Epoch: 15 Global Step: 260260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:44,266-Speed 9269.18 samples/sec Loss 4.0703 LearningRate 0.0049 Epoch: 15 Global Step: 260270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:45,369-Speed 9289.47 samples/sec Loss 4.1945 LearningRate 0.0049 Epoch: 15 Global Step: 260280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
22:03:46,485-Speed 9184.53 samples/sec Loss 4.1435 LearningRate 0.0049 Epoch: 15 Global Step: 260290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:47,560-Speed 9529.94 samples/sec Loss 4.1834 LearningRate 0.0049 Epoch: 15 Global Step: 260300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:48,644-Speed 9451.62 samples/sec Loss 4.1452 LearningRate 0.0048 Epoch: 15 Global Step: 260310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:49,761-Speed 9176.30 samples/sec Loss 4.1680 LearningRate 0.0048 Epoch: 15 Global Step: 260320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:50,889-Speed 9083.33 samples/sec Loss 4.1032 LearningRate 0.0048 Epoch: 15 Global Step: 260330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:51,964-Speed 9527.32 samples/sec Loss 4.1488 LearningRate 0.0048 Epoch: 15 Global Step: 260340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:53,048-Speed 9459.03 samples/sec Loss 4.1524 LearningRate 0.0048 Epoch: 15 Global Step: 260350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:54,180-Speed 9048.20 samples/sec Loss 4.1766 LearningRate 0.0048 Epoch: 15 Global Step: 260360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:55,324-Speed 8957.11 samples/sec Loss 4.1180 LearningRate 0.0048 Epoch: 15 Global Step: 260370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:56,464-Speed 8985.84 samples/sec Loss 4.1656 LearningRate 0.0048 Epoch: 15 Global Step: 260380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:57,549-Speed 9450.97 samples/sec Loss 4.1508 LearningRate 0.0048 Epoch: 15 Global Step: 260390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:03:58,738-Speed 8611.84 samples/sec Loss 4.1851 LearningRate 0.0048 Epoch: 15 Global Step: 260400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:03:59,882-Speed 8957.62 samples/sec 
Loss 4.1746 LearningRate 0.0048 Epoch: 15 Global Step: 260410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:00,986-Speed 9282.52 samples/sec Loss 4.1947 LearningRate 0.0048 Epoch: 15 Global Step: 260420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:02,099-Speed 9200.81 samples/sec Loss 4.1389 LearningRate 0.0048 Epoch: 15 Global Step: 260430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:03,183-Speed 9455.26 samples/sec Loss 4.0908 LearningRate 0.0048 Epoch: 15 Global Step: 260440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:04,292-Speed 9246.86 samples/sec Loss 4.1064 LearningRate 0.0048 Epoch: 15 Global Step: 260450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:05,417-Speed 9102.75 samples/sec Loss 4.2259 LearningRate 0.0048 Epoch: 15 Global Step: 260460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:06,498-Speed 9481.62 samples/sec Loss 4.1802 LearningRate 0.0048 Epoch: 15 Global Step: 260470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:07,614-Speed 9175.80 samples/sec Loss 4.1966 LearningRate 0.0048 Epoch: 15 Global Step: 260480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:08,729-Speed 9190.42 samples/sec Loss 4.1240 LearningRate 0.0048 Epoch: 15 Global Step: 260490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:09,823-Speed 9368.79 samples/sec Loss 4.1844 LearningRate 0.0048 Epoch: 15 Global Step: 260500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:04:10,897-Speed 9538.88 samples/sec Loss 4.1250 LearningRate 0.0048 Epoch: 15 Global Step: 260510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:04:11,975-Speed 9505.40 samples/sec Loss 4.0996 LearningRate 0.0048 Epoch: 15 Global Step: 260520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:13,116-Speed 8977.37 samples/sec Loss 4.1784 LearningRate 0.0048 Epoch: 15 
Global Step: 260530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:14,198-Speed 9472.19 samples/sec Loss 4.1324 LearningRate 0.0048 Epoch: 15 Global Step: 260540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:15,326-Speed 9078.96 samples/sec Loss 4.1093 LearningRate 0.0048 Epoch: 15 Global Step: 260550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:16,435-Speed 9239.74 samples/sec Loss 4.1587 LearningRate 0.0048 Epoch: 15 Global Step: 260560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:17,533-Speed 9338.35 samples/sec Loss 4.1437 LearningRate 0.0048 Epoch: 15 Global Step: 260570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:18,582-Speed 9768.18 samples/sec Loss 4.1027 LearningRate 0.0048 Epoch: 15 Global Step: 260580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:19,638-Speed 9699.16 samples/sec Loss 4.2318 LearningRate 0.0048 Epoch: 15 Global Step: 260590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:20,700-Speed 9643.15 samples/sec Loss 4.1705 LearningRate 0.0048 Epoch: 15 Global Step: 260600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:21,783-Speed 9465.65 samples/sec Loss 4.1271 LearningRate 0.0048 Epoch: 15 Global Step: 260610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:22,869-Speed 9434.05 samples/sec Loss 4.1710 LearningRate 0.0048 Epoch: 15 Global Step: 260620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:04:23,999-Speed 9067.95 samples/sec Loss 4.1557 LearningRate 0.0048 Epoch: 15 Global Step: 260630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:04:25,081-Speed 9464.01 samples/sec Loss 4.0982 LearningRate 0.0048 Epoch: 15 Global Step: 260640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:26,152-Speed 9567.47 samples/sec Loss 4.1225 LearningRate 0.0048 Epoch: 15 Global Step: 260650 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 22:04:27,236-Speed 9452.99 samples/sec Loss 4.1610 LearningRate 0.0048 Epoch: 15 Global Step: 260660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:28,351-Speed 9191.44 samples/sec Loss 4.0703 LearningRate 0.0048 Epoch: 15 Global Step: 260670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:29,479-Speed 9084.69 samples/sec Loss 4.1942 LearningRate 0.0048 Epoch: 15 Global Step: 260680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:30,612-Speed 9047.34 samples/sec Loss 4.1220 LearningRate 0.0048 Epoch: 15 Global Step: 260690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:31,699-Speed 9423.91 samples/sec Loss 4.1752 LearningRate 0.0048 Epoch: 15 Global Step: 260700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:32,827-Speed 9084.15 samples/sec Loss 4.0804 LearningRate 0.0048 Epoch: 15 Global Step: 260710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:33,900-Speed 9555.56 samples/sec Loss 4.1290 LearningRate 0.0048 Epoch: 15 Global Step: 260720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:35,023-Speed 9124.25 samples/sec Loss 4.1042 LearningRate 0.0048 Epoch: 15 Global Step: 260730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:36,127-Speed 9275.03 samples/sec Loss 4.1582 LearningRate 0.0048 Epoch: 15 Global Step: 260740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:04:37,198-Speed 9566.33 samples/sec Loss 4.0841 LearningRate 0.0048 Epoch: 15 Global Step: 260750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:04:38,280-Speed 9472.04 samples/sec Loss 4.1123 LearningRate 0.0048 Epoch: 15 Global Step: 260760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:04:39,336-Speed 9701.35 samples/sec Loss 4.1333 LearningRate 0.0048 Epoch: 15 Global Step: 260770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 22:04:40,455-Speed 9152.53 samples/sec Loss 4.1689 LearningRate 0.0048 Epoch: 15 Global Step: 260780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:41,538-Speed 9464.55 samples/sec Loss 4.1397 LearningRate 0.0048 Epoch: 15 Global Step: 260790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:42,603-Speed 9616.40 samples/sec Loss 4.1513 LearningRate 0.0048 Epoch: 15 Global Step: 260800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:43,714-Speed 9225.54 samples/sec Loss 4.1631 LearningRate 0.0048 Epoch: 15 Global Step: 260810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:44,811-Speed 9335.11 samples/sec Loss 4.1732 LearningRate 0.0048 Epoch: 15 Global Step: 260820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:45,959-Speed 8925.81 samples/sec Loss 4.2053 LearningRate 0.0048 Epoch: 15 Global Step: 260830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:47,021-Speed 9656.01 samples/sec Loss 4.1292 LearningRate 0.0048 Epoch: 15 Global Step: 260840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:48,082-Speed 9657.61 samples/sec Loss 4.2235 LearningRate 0.0048 Epoch: 15 Global Step: 260850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:49,128-Speed 9800.32 samples/sec Loss 4.2094 LearningRate 0.0048 Epoch: 15 Global Step: 260860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:50,209-Speed 9478.36 samples/sec Loss 4.1667 LearningRate 0.0048 Epoch: 15 Global Step: 260870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:51,280-Speed 9568.61 samples/sec Loss 4.1461 LearningRate 0.0048 Epoch: 15 Global Step: 260880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:52,394-Speed 9198.67 samples/sec Loss 4.2054 LearningRate 0.0048 Epoch: 15 Global Step: 260890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:53,505-Speed 9218.78 
samples/sec Loss 4.2781 LearningRate 0.0048 Epoch: 15 Global Step: 260900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:54,610-Speed 9276.22 samples/sec Loss 4.2369 LearningRate 0.0048 Epoch: 15 Global Step: 260910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:04:55,683-Speed 9541.90 samples/sec Loss 4.1342 LearningRate 0.0048 Epoch: 15 Global Step: 260920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:56,755-Speed 9569.27 samples/sec Loss 4.1043 LearningRate 0.0048 Epoch: 15 Global Step: 260930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:57,887-Speed 9048.18 samples/sec Loss 4.0885 LearningRate 0.0048 Epoch: 15 Global Step: 260940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:04:58,975-Speed 9413.54 samples/sec Loss 4.1396 LearningRate 0.0048 Epoch: 15 Global Step: 260950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:00,062-Speed 9432.61 samples/sec Loss 4.1309 LearningRate 0.0048 Epoch: 15 Global Step: 260960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:01,152-Speed 9396.71 samples/sec Loss 4.1228 LearningRate 0.0048 Epoch: 15 Global Step: 260970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:02,261-Speed 9238.77 samples/sec Loss 4.0521 LearningRate 0.0048 Epoch: 15 Global Step: 260980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:03,368-Speed 9256.00 samples/sec Loss 4.1699 LearningRate 0.0048 Epoch: 15 Global Step: 260990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:04,472-Speed 9280.93 samples/sec Loss 4.0672 LearningRate 0.0048 Epoch: 15 Global Step: 261000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:05,600-Speed 9084.67 samples/sec Loss 4.1907 LearningRate 0.0048 Epoch: 15 Global Step: 261010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:06,715-Speed 9188.38 samples/sec Loss 4.1633 LearningRate 0.0048 
Epoch: 15 Global Step: 261020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:07,838-Speed 9127.15 samples/sec Loss 4.2448 LearningRate 0.0048 Epoch: 15 Global Step: 261030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:08,959-Speed 9137.93 samples/sec Loss 4.1464 LearningRate 0.0048 Epoch: 15 Global Step: 261040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:10,047-Speed 9415.50 samples/sec Loss 4.1141 LearningRate 0.0048 Epoch: 15 Global Step: 261050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:11,135-Speed 9421.73 samples/sec Loss 4.0719 LearningRate 0.0048 Epoch: 15 Global Step: 261060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:12,290-Speed 8869.82 samples/sec Loss 4.2580 LearningRate 0.0047 Epoch: 15 Global Step: 261070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:13,419-Speed 9074.84 samples/sec Loss 4.1905 LearningRate 0.0047 Epoch: 15 Global Step: 261080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:14,560-Speed 8981.56 samples/sec Loss 4.1861 LearningRate 0.0047 Epoch: 15 Global Step: 261090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:15,691-Speed 9055.52 samples/sec Loss 4.1187 LearningRate 0.0047 Epoch: 15 Global Step: 261100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:16,762-Speed 9572.52 samples/sec Loss 4.1618 LearningRate 0.0047 Epoch: 15 Global Step: 261110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:17,846-Speed 9448.53 samples/sec Loss 4.0536 LearningRate 0.0047 Epoch: 15 Global Step: 261120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:18,930-Speed 9453.28 samples/sec Loss 4.1021 LearningRate 0.0047 Epoch: 15 Global Step: 261130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:20,030-Speed 9313.29 samples/sec Loss 4.0349 LearningRate 0.0047 Epoch: 15 Global Step: 261140 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:21,118-Speed 9422.20 samples/sec Loss 4.0915 LearningRate 0.0047 Epoch: 15 Global Step: 261150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:22,227-Speed 9240.84 samples/sec Loss 4.1120 LearningRate 0.0047 Epoch: 15 Global Step: 261160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:23,347-Speed 9140.89 samples/sec Loss 4.1576 LearningRate 0.0047 Epoch: 15 Global Step: 261170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:24,422-Speed 9532.14 samples/sec Loss 4.1552 LearningRate 0.0047 Epoch: 15 Global Step: 261180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:25,530-Speed 9248.25 samples/sec Loss 4.0101 LearningRate 0.0047 Epoch: 15 Global Step: 261190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:26,605-Speed 9532.78 samples/sec Loss 4.1543 LearningRate 0.0047 Epoch: 15 Global Step: 261200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:27,726-Speed 9143.72 samples/sec Loss 4.2264 LearningRate 0.0047 Epoch: 15 Global Step: 261210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:28,844-Speed 9167.80 samples/sec Loss 4.1589 LearningRate 0.0047 Epoch: 15 Global Step: 261220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:29,939-Speed 9355.65 samples/sec Loss 4.1476 LearningRate 0.0047 Epoch: 15 Global Step: 261230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:31,016-Speed 9507.71 samples/sec Loss 4.1191 LearningRate 0.0047 Epoch: 15 Global Step: 261240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:32,124-Speed 9254.42 samples/sec Loss 4.1291 LearningRate 0.0047 Epoch: 15 Global Step: 261250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:33,189-Speed 9618.01 samples/sec Loss 4.0743 LearningRate 0.0047 Epoch: 15 Global Step: 261260 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 22:05:34,256-Speed 9605.90 samples/sec Loss 4.0997 LearningRate 0.0047 Epoch: 15 Global Step: 261270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:35,332-Speed 9524.28 samples/sec Loss 4.2596 LearningRate 0.0047 Epoch: 15 Global Step: 261280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:36,417-Speed 9436.30 samples/sec Loss 4.1589 LearningRate 0.0047 Epoch: 15 Global Step: 261290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:37,510-Speed 9380.50 samples/sec Loss 4.1861 LearningRate 0.0047 Epoch: 15 Global Step: 261300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:38,589-Speed 9496.10 samples/sec Loss 4.2021 LearningRate 0.0047 Epoch: 15 Global Step: 261310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:39,674-Speed 9435.83 samples/sec Loss 4.0891 LearningRate 0.0047 Epoch: 15 Global Step: 261320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:40,735-Speed 9660.60 samples/sec Loss 4.0638 LearningRate 0.0047 Epoch: 15 Global Step: 261330 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-04-11 22:05:41,845-Speed 9233.63 samples/sec Loss 4.0608 LearningRate 0.0047 Epoch: 15 Global Step: 261340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:42,961-Speed 9179.38 samples/sec Loss 4.1663 LearningRate 0.0047 Epoch: 15 Global Step: 261350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:44,104-Speed 8958.38 samples/sec Loss 4.1858 LearningRate 0.0047 Epoch: 15 Global Step: 261360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:45,189-Speed 9442.50 samples/sec Loss 4.1224 LearningRate 0.0047 Epoch: 15 Global Step: 261370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:46,315-Speed 9126.11 samples/sec Loss 4.0410 LearningRate 0.0047 Epoch: 15 Global Step: 261380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
22:05:47,371-Speed 9696.05 samples/sec Loss 4.1699 LearningRate 0.0047 Epoch: 15 Global Step: 261390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:48,491-Speed 9148.80 samples/sec Loss 4.1567 LearningRate 0.0047 Epoch: 15 Global Step: 261400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:49,595-Speed 9286.00 samples/sec Loss 4.0915 LearningRate 0.0047 Epoch: 15 Global Step: 261410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:50,680-Speed 9438.33 samples/sec Loss 4.1270 LearningRate 0.0047 Epoch: 15 Global Step: 261420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:51,831-Speed 8905.04 samples/sec Loss 4.1839 LearningRate 0.0047 Epoch: 15 Global Step: 261430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:52,908-Speed 9507.28 samples/sec Loss 4.1663 LearningRate 0.0047 Epoch: 15 Global Step: 261440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:54,008-Speed 9316.13 samples/sec Loss 4.1475 LearningRate 0.0047 Epoch: 15 Global Step: 261450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:05:55,134-Speed 9096.63 samples/sec Loss 4.1471 LearningRate 0.0047 Epoch: 15 Global Step: 261460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:56,215-Speed 9484.83 samples/sec Loss 4.2205 LearningRate 0.0047 Epoch: 15 Global Step: 261470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:57,335-Speed 9151.34 samples/sec Loss 4.1329 LearningRate 0.0047 Epoch: 15 Global Step: 261480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:58,442-Speed 9251.65 samples/sec Loss 4.1031 LearningRate 0.0047 Epoch: 15 Global Step: 261490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:05:59,548-Speed 9266.86 samples/sec Loss 4.1226 LearningRate 0.0047 Epoch: 15 Global Step: 261500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:00,628-Speed 9482.61 
samples/sec Loss 4.1139 LearningRate 0.0047 Epoch: 15 Global Step: 261510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:01,769-Speed 8978.89 samples/sec Loss 4.1541 LearningRate 0.0047 Epoch: 15 Global Step: 261520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:02,879-Speed 9232.47 samples/sec Loss 4.1432 LearningRate 0.0047 Epoch: 15 Global Step: 261530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:04,005-Speed 9099.57 samples/sec Loss 4.1361 LearningRate 0.0047 Epoch: 15 Global Step: 261540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:05,123-Speed 9165.97 samples/sec Loss 4.1827 LearningRate 0.0047 Epoch: 15 Global Step: 261550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:06,238-Speed 9188.91 samples/sec Loss 4.2542 LearningRate 0.0047 Epoch: 15 Global Step: 261560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:07,331-Speed 9378.61 samples/sec Loss 4.2063 LearningRate 0.0047 Epoch: 15 Global Step: 261570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:08,443-Speed 9211.28 samples/sec Loss 4.1899 LearningRate 0.0047 Epoch: 15 Global Step: 261580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:09,542-Speed 9320.46 samples/sec Loss 4.1890 LearningRate 0.0047 Epoch: 15 Global Step: 261590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:10,651-Speed 9244.70 samples/sec Loss 4.1484 LearningRate 0.0047 Epoch: 15 Global Step: 261600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:11,709-Speed 9676.48 samples/sec Loss 4.0048 LearningRate 0.0047 Epoch: 15 Global Step: 261610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:12,831-Speed 9136.43 samples/sec Loss 4.1137 LearningRate 0.0047 Epoch: 15 Global Step: 261620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:13,957-Speed 9100.82 samples/sec Loss 4.1174 LearningRate 
0.0047 Epoch: 15 Global Step: 261630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:15,040-Speed 9452.66 samples/sec Loss 4.2179 LearningRate 0.0047 Epoch: 15 Global Step: 261640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:16,138-Speed 9333.95 samples/sec Loss 4.1517 LearningRate 0.0047 Epoch: 15 Global Step: 261650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:17,207-Speed 9592.96 samples/sec Loss 4.3379 LearningRate 0.0047 Epoch: 15 Global Step: 261660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:18,301-Speed 9364.92 samples/sec Loss 4.0884 LearningRate 0.0047 Epoch: 15 Global Step: 261670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:19,365-Speed 9622.64 samples/sec Loss 4.2419 LearningRate 0.0047 Epoch: 15 Global Step: 261680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:20,435-Speed 9574.26 samples/sec Loss 4.1834 LearningRate 0.0047 Epoch: 15 Global Step: 261690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:21,528-Speed 9377.10 samples/sec Loss 4.1876 LearningRate 0.0047 Epoch: 15 Global Step: 261700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:22,646-Speed 9166.52 samples/sec Loss 4.1842 LearningRate 0.0047 Epoch: 15 Global Step: 261710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:23,773-Speed 9089.71 samples/sec Loss 4.1868 LearningRate 0.0047 Epoch: 15 Global Step: 261720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:24,893-Speed 9153.13 samples/sec Loss 4.1179 LearningRate 0.0047 Epoch: 15 Global Step: 261730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:25,946-Speed 9728.72 samples/sec Loss 4.1447 LearningRate 0.0047 Epoch: 15 Global Step: 261740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:27,062-Speed 9180.85 samples/sec Loss 4.1100 LearningRate 0.0047 Epoch: 15 Global Step: 261750 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:28,161-Speed 9321.27 samples/sec Loss 4.1718 LearningRate 0.0047 Epoch: 15 Global Step: 261760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:29,295-Speed 9035.57 samples/sec Loss 4.0368 LearningRate 0.0047 Epoch: 15 Global Step: 261770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:30,377-Speed 9469.38 samples/sec Loss 4.1712 LearningRate 0.0047 Epoch: 15 Global Step: 261780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:31,510-Speed 9040.61 samples/sec Loss 4.0375 LearningRate 0.0047 Epoch: 15 Global Step: 261790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:32,579-Speed 9591.79 samples/sec Loss 4.1126 LearningRate 0.0047 Epoch: 15 Global Step: 261800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:33,685-Speed 9260.27 samples/sec Loss 4.0773 LearningRate 0.0047 Epoch: 15 Global Step: 261810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:34,780-Speed 9358.91 samples/sec Loss 4.1898 LearningRate 0.0047 Epoch: 15 Global Step: 261820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:35,934-Speed 8879.14 samples/sec Loss 4.1491 LearningRate 0.0047 Epoch: 15 Global Step: 261830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:37,014-Speed 9488.56 samples/sec Loss 4.1268 LearningRate 0.0046 Epoch: 15 Global Step: 261840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:38,104-Speed 9405.44 samples/sec Loss 4.1122 LearningRate 0.0046 Epoch: 15 Global Step: 261850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:39,207-Speed 9287.37 samples/sec Loss 4.0910 LearningRate 0.0046 Epoch: 15 Global Step: 261860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:40,331-Speed 9114.68 samples/sec Loss 4.1577 LearningRate 0.0046 Epoch: 15 Global Step: 261870 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 22:06:41,450-Speed 9156.43 samples/sec Loss 4.1957 LearningRate 0.0046 Epoch: 15 Global Step: 261880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:42,552-Speed 9298.24 samples/sec Loss 4.1822 LearningRate 0.0046 Epoch: 15 Global Step: 261890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:06:43,665-Speed 9204.90 samples/sec Loss 4.1321 LearningRate 0.0046 Epoch: 15 Global Step: 261900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:44,745-Speed 9485.74 samples/sec Loss 4.0766 LearningRate 0.0046 Epoch: 15 Global Step: 261910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:45,829-Speed 9456.70 samples/sec Loss 4.0714 LearningRate 0.0046 Epoch: 15 Global Step: 261920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:46,905-Speed 9528.16 samples/sec Loss 4.1382 LearningRate 0.0046 Epoch: 15 Global Step: 261930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:47,990-Speed 9438.17 samples/sec Loss 4.2018 LearningRate 0.0046 Epoch: 15 Global Step: 261940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:49,097-Speed 9255.09 samples/sec Loss 4.1190 LearningRate 0.0046 Epoch: 15 Global Step: 261950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:50,208-Speed 9225.97 samples/sec Loss 4.1652 LearningRate 0.0046 Epoch: 15 Global Step: 261960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:51,326-Speed 9162.58 samples/sec Loss 4.1030 LearningRate 0.0046 Epoch: 15 Global Step: 261970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:52,389-Speed 9636.95 samples/sec Loss 4.0666 LearningRate 0.0046 Epoch: 15 Global Step: 261980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:53,459-Speed 9575.86 samples/sec Loss 4.1930 LearningRate 0.0046 Epoch: 15 Global Step: 261990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:06:54,559-Speed 
9312.76 samples/sec Loss 4.1160 LearningRate 0.0046 Epoch: 15 Global Step: 262000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:07:16,558-[lfw][262000]XNorm: 7.071909 Training: 2022-04-11 22:07:16,559-[lfw][262000]Accuracy-Flip: 0.99650+-0.00302 Training: 2022-04-11 22:07:16,560-[lfw][262000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:07:41,988-[cfp_fp][262000]XNorm: 6.119181 Training: 2022-04-11 22:07:41,988-[cfp_fp][262000]Accuracy-Flip: 0.97171+-0.00873 Training: 2022-04-11 22:07:41,989-[cfp_fp][262000]Accuracy-Highest: 0.97171 Training: 2022-04-11 22:08:03,956-[agedb_30][262000]XNorm: 6.837060 Training: 2022-04-11 22:08:03,957-[agedb_30][262000]Accuracy-Flip: 0.97067+-0.00775 Training: 2022-04-11 22:08:03,957-[agedb_30][262000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:08:05,060-Speed 145.25 samples/sec Loss 4.1320 LearningRate 0.0046 Epoch: 15 Global Step: 262010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:06,118-Speed 9689.59 samples/sec Loss 4.1761 LearningRate 0.0046 Epoch: 15 Global Step: 262020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:07,219-Speed 9306.56 samples/sec Loss 4.1756 LearningRate 0.0046 Epoch: 15 Global Step: 262030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:08,323-Speed 9278.72 samples/sec Loss 4.1373 LearningRate 0.0046 Epoch: 15 Global Step: 262040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:09,451-Speed 9087.45 samples/sec Loss 4.0788 LearningRate 0.0046 Epoch: 15 Global Step: 262050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:10,535-Speed 9449.92 samples/sec Loss 4.1164 LearningRate 0.0046 Epoch: 15 Global Step: 262060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:11,595-Speed 9665.27 samples/sec Loss 4.0991 LearningRate 0.0046 Epoch: 15 Global Step: 262070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:12,692-Speed 9339.05 samples/sec Loss 
4.1572 LearningRate 0.0046 Epoch: 15 Global Step: 262080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:13,777-Speed 9447.12 samples/sec Loss 4.2021 LearningRate 0.0046 Epoch: 15 Global Step: 262090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:14,874-Speed 9338.08 samples/sec Loss 4.1618 LearningRate 0.0046 Epoch: 15 Global Step: 262100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:15,989-Speed 9188.79 samples/sec Loss 4.1136 LearningRate 0.0046 Epoch: 15 Global Step: 262110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:17,106-Speed 9174.68 samples/sec Loss 4.0846 LearningRate 0.0046 Epoch: 15 Global Step: 262120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:18,201-Speed 9360.11 samples/sec Loss 4.1356 LearningRate 0.0046 Epoch: 15 Global Step: 262130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:19,279-Speed 9498.36 samples/sec Loss 4.1490 LearningRate 0.0046 Epoch: 15 Global Step: 262140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:20,391-Speed 9215.32 samples/sec Loss 4.0898 LearningRate 0.0046 Epoch: 15 Global Step: 262150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:21,554-Speed 8808.35 samples/sec Loss 4.1666 LearningRate 0.0046 Epoch: 15 Global Step: 262160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:22,629-Speed 9532.19 samples/sec Loss 4.1003 LearningRate 0.0046 Epoch: 15 Global Step: 262170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:23,694-Speed 9622.42 samples/sec Loss 4.1434 LearningRate 0.0046 Epoch: 15 Global Step: 262180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:24,810-Speed 9186.53 samples/sec Loss 4.2017 LearningRate 0.0046 Epoch: 15 Global Step: 262190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:25,908-Speed 9323.00 samples/sec Loss 4.1795 LearningRate 0.0046 Epoch: 15 
Global Step: 262200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:26,972-Speed 9634.40 samples/sec Loss 4.1408 LearningRate 0.0046 Epoch: 15 Global Step: 262210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:28,070-Speed 9329.70 samples/sec Loss 4.1521 LearningRate 0.0046 Epoch: 15 Global Step: 262220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:29,180-Speed 9230.51 samples/sec Loss 4.1252 LearningRate 0.0046 Epoch: 15 Global Step: 262230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:30,280-Speed 9318.23 samples/sec Loss 4.1685 LearningRate 0.0046 Epoch: 15 Global Step: 262240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:31,357-Speed 9513.24 samples/sec Loss 4.1272 LearningRate 0.0046 Epoch: 15 Global Step: 262250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:32,444-Speed 9427.28 samples/sec Loss 4.2359 LearningRate 0.0046 Epoch: 15 Global Step: 262260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:33,541-Speed 9340.53 samples/sec Loss 4.1537 LearningRate 0.0046 Epoch: 15 Global Step: 262270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:34,608-Speed 9608.95 samples/sec Loss 4.1609 LearningRate 0.0046 Epoch: 15 Global Step: 262280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:35,713-Speed 9273.80 samples/sec Loss 4.0448 LearningRate 0.0046 Epoch: 15 Global Step: 262290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:36,832-Speed 9152.07 samples/sec Loss 4.1341 LearningRate 0.0046 Epoch: 15 Global Step: 262300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:37,975-Speed 8962.88 samples/sec Loss 4.1736 LearningRate 0.0046 Epoch: 15 Global Step: 262310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:39,116-Speed 8980.44 samples/sec Loss 4.1434 LearningRate 0.0046 Epoch: 15 Global Step: 262320 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:40,236-Speed 9145.06 samples/sec Loss 4.0047 LearningRate 0.0046 Epoch: 15 Global Step: 262330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:41,361-Speed 9110.90 samples/sec Loss 4.2277 LearningRate 0.0046 Epoch: 15 Global Step: 262340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:42,453-Speed 9381.85 samples/sec Loss 4.1444 LearningRate 0.0046 Epoch: 15 Global Step: 262350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:43,539-Speed 9433.86 samples/sec Loss 4.1884 LearningRate 0.0046 Epoch: 15 Global Step: 262360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:44,651-Speed 9211.08 samples/sec Loss 4.1865 LearningRate 0.0046 Epoch: 15 Global Step: 262370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:45,739-Speed 9416.30 samples/sec Loss 4.1680 LearningRate 0.0046 Epoch: 15 Global Step: 262380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:46,842-Speed 9292.69 samples/sec Loss 4.0908 LearningRate 0.0046 Epoch: 15 Global Step: 262390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:48,011-Speed 8765.62 samples/sec Loss 4.1316 LearningRate 0.0046 Epoch: 15 Global Step: 262400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:49,103-Speed 9382.62 samples/sec Loss 4.1124 LearningRate 0.0046 Epoch: 15 Global Step: 262410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:50,179-Speed 9521.57 samples/sec Loss 4.1479 LearningRate 0.0046 Epoch: 15 Global Step: 262420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:51,280-Speed 9303.38 samples/sec Loss 4.2711 LearningRate 0.0046 Epoch: 15 Global Step: 262430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:52,403-Speed 9130.51 samples/sec Loss 4.1337 LearningRate 0.0046 Epoch: 15 Global Step: 262440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 22:08:53,506-Speed 9284.56 samples/sec Loss 4.1193 LearningRate 0.0046 Epoch: 15 Global Step: 262450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:54,635-Speed 9074.36 samples/sec Loss 4.1327 LearningRate 0.0046 Epoch: 15 Global Step: 262460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:55,766-Speed 9059.22 samples/sec Loss 4.1548 LearningRate 0.0046 Epoch: 15 Global Step: 262470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:56,866-Speed 9314.80 samples/sec Loss 4.1243 LearningRate 0.0046 Epoch: 15 Global Step: 262480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:08:57,947-Speed 9484.78 samples/sec Loss 4.1401 LearningRate 0.0046 Epoch: 15 Global Step: 262490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:08:59,046-Speed 9324.65 samples/sec Loss 4.1642 LearningRate 0.0046 Epoch: 15 Global Step: 262500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:00,173-Speed 9084.42 samples/sec Loss 4.1253 LearningRate 0.0046 Epoch: 15 Global Step: 262510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:01,271-Speed 9346.99 samples/sec Loss 4.1801 LearningRate 0.0046 Epoch: 15 Global Step: 262520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:02,400-Speed 9073.34 samples/sec Loss 4.1067 LearningRate 0.0046 Epoch: 15 Global Step: 262530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:03,541-Speed 8984.17 samples/sec Loss 4.1411 LearningRate 0.0046 Epoch: 15 Global Step: 262540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:04,632-Speed 9395.14 samples/sec Loss 4.2271 LearningRate 0.0046 Epoch: 15 Global Step: 262550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:05,738-Speed 9261.69 samples/sec Loss 4.1127 LearningRate 0.0046 Epoch: 15 Global Step: 262560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:06,847-Speed 9243.50 
samples/sec Loss 4.1760 LearningRate 0.0046 Epoch: 15 Global Step: 262570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:07,901-Speed 9721.00 samples/sec Loss 4.1045 LearningRate 0.0046 Epoch: 15 Global Step: 262580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:09,020-Speed 9152.28 samples/sec Loss 4.1773 LearningRate 0.0046 Epoch: 15 Global Step: 262590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:10,154-Speed 9036.87 samples/sec Loss 4.1355 LearningRate 0.0046 Epoch: 15 Global Step: 262600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:11,265-Speed 9226.24 samples/sec Loss 4.2116 LearningRate 0.0046 Epoch: 15 Global Step: 262610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:12,363-Speed 9330.44 samples/sec Loss 4.1493 LearningRate 0.0045 Epoch: 15 Global Step: 262620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:13,475-Speed 9213.53 samples/sec Loss 4.1067 LearningRate 0.0045 Epoch: 15 Global Step: 262630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:14,651-Speed 8715.63 samples/sec Loss 4.1315 LearningRate 0.0045 Epoch: 15 Global Step: 262640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:15,815-Speed 8800.60 samples/sec Loss 4.2152 LearningRate 0.0045 Epoch: 15 Global Step: 262650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:16,953-Speed 9007.54 samples/sec Loss 4.1045 LearningRate 0.0045 Epoch: 15 Global Step: 262660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:18,072-Speed 9150.53 samples/sec Loss 4.1482 LearningRate 0.0045 Epoch: 15 Global Step: 262670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:19,162-Speed 9404.47 samples/sec Loss 4.1351 LearningRate 0.0045 Epoch: 15 Global Step: 262680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:20,234-Speed 9556.36 samples/sec Loss 4.1798 
LearningRate 0.0045 Epoch: 15 Global Step: 262690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:21,345-Speed 9220.08 samples/sec Loss 4.2132 LearningRate 0.0045 Epoch: 15 Global Step: 262700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:22,437-Speed 9384.83 samples/sec Loss 4.1826 LearningRate 0.0045 Epoch: 15 Global Step: 262710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:23,544-Speed 9256.03 samples/sec Loss 4.1624 LearningRate 0.0045 Epoch: 15 Global Step: 262720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:24,627-Speed 9459.01 samples/sec Loss 4.2016 LearningRate 0.0045 Epoch: 15 Global Step: 262730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:25,806-Speed 8687.26 samples/sec Loss 4.3045 LearningRate 0.0045 Epoch: 15 Global Step: 262740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:26,931-Speed 9107.62 samples/sec Loss 4.0913 LearningRate 0.0045 Epoch: 15 Global Step: 262750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:28,094-Speed 8813.85 samples/sec Loss 4.1919 LearningRate 0.0045 Epoch: 15 Global Step: 262760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:29,220-Speed 9102.13 samples/sec Loss 4.1412 LearningRate 0.0045 Epoch: 15 Global Step: 262770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:30,344-Speed 9122.55 samples/sec Loss 4.2095 LearningRate 0.0045 Epoch: 15 Global Step: 262780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:31,421-Speed 9509.62 samples/sec Loss 4.1113 LearningRate 0.0045 Epoch: 15 Global Step: 262790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:32,581-Speed 8837.10 samples/sec Loss 4.1220 LearningRate 0.0045 Epoch: 15 Global Step: 262800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:33,716-Speed 9028.11 samples/sec Loss 4.0916 LearningRate 0.0045 Epoch: 15 Global 
Step: 262810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:34,841-Speed 9102.79 samples/sec Loss 4.2217 LearningRate 0.0045 Epoch: 15 Global Step: 262820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:35,968-Speed 9098.33 samples/sec Loss 4.1461 LearningRate 0.0045 Epoch: 15 Global Step: 262830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:37,068-Speed 9307.51 samples/sec Loss 4.1412 LearningRate 0.0045 Epoch: 15 Global Step: 262840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:09:38,177-Speed 9238.39 samples/sec Loss 4.0322 LearningRate 0.0045 Epoch: 15 Global Step: 262850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:39,241-Speed 9634.96 samples/sec Loss 4.1297 LearningRate 0.0045 Epoch: 15 Global Step: 262860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:40,354-Speed 9203.32 samples/sec Loss 4.2083 LearningRate 0.0045 Epoch: 15 Global Step: 262870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:41,433-Speed 9494.95 samples/sec Loss 4.1173 LearningRate 0.0045 Epoch: 15 Global Step: 262880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:42,535-Speed 9298.43 samples/sec Loss 4.1146 LearningRate 0.0045 Epoch: 15 Global Step: 262890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:43,661-Speed 9103.62 samples/sec Loss 4.2275 LearningRate 0.0045 Epoch: 15 Global Step: 262900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:44,791-Speed 9065.85 samples/sec Loss 4.1724 LearningRate 0.0045 Epoch: 15 Global Step: 262910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:45,837-Speed 9797.70 samples/sec Loss 4.2041 LearningRate 0.0045 Epoch: 15 Global Step: 262920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:46,930-Speed 9372.51 samples/sec Loss 4.1149 LearningRate 0.0045 Epoch: 15 Global Step: 262930 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-11 22:09:48,028-Speed 9327.00 samples/sec Loss 4.1294 LearningRate 0.0045 Epoch: 15 Global Step: 262940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:49,117-Speed 9411.90 samples/sec Loss 4.1779 LearningRate 0.0045 Epoch: 15 Global Step: 262950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:50,242-Speed 9108.70 samples/sec Loss 4.1517 LearningRate 0.0045 Epoch: 15 Global Step: 262960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:51,385-Speed 8968.06 samples/sec Loss 4.1165 LearningRate 0.0045 Epoch: 15 Global Step: 262970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:52,472-Speed 9425.28 samples/sec Loss 4.0925 LearningRate 0.0045 Epoch: 15 Global Step: 262980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:53,550-Speed 9498.87 samples/sec Loss 4.0862 LearningRate 0.0045 Epoch: 15 Global Step: 262990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:54,594-Speed 9814.04 samples/sec Loss 4.1289 LearningRate 0.0045 Epoch: 15 Global Step: 263000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:55,717-Speed 9126.76 samples/sec Loss 4.1949 LearningRate 0.0045 Epoch: 15 Global Step: 263010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:56,858-Speed 8977.00 samples/sec Loss 4.0995 LearningRate 0.0045 Epoch: 15 Global Step: 263020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:09:58,030-Speed 8742.86 samples/sec Loss 4.1975 LearningRate 0.0045 Epoch: 15 Global Step: 263030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:09:59,156-Speed 9098.87 samples/sec Loss 4.1968 LearningRate 0.0045 Epoch: 15 Global Step: 263040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:00,257-Speed 9307.57 samples/sec Loss 4.1256 LearningRate 0.0045 Epoch: 15 Global Step: 263050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:10:01,321-Speed 9631.95 samples/sec Loss 4.1103 LearningRate 0.0045 Epoch: 15 Global Step: 263060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:02,380-Speed 9676.23 samples/sec Loss 4.1847 LearningRate 0.0045 Epoch: 15 Global Step: 263070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:03,435-Speed 9707.69 samples/sec Loss 4.0852 LearningRate 0.0045 Epoch: 15 Global Step: 263080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:04,544-Speed 9237.44 samples/sec Loss 4.2321 LearningRate 0.0045 Epoch: 15 Global Step: 263090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:05,630-Speed 9435.68 samples/sec Loss 4.1713 LearningRate 0.0045 Epoch: 15 Global Step: 263100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:06,743-Speed 9210.82 samples/sec Loss 4.2104 LearningRate 0.0045 Epoch: 15 Global Step: 263110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:07,808-Speed 9618.54 samples/sec Loss 4.1473 LearningRate 0.0045 Epoch: 15 Global Step: 263120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:08,863-Speed 9715.33 samples/sec Loss 4.2111 LearningRate 0.0045 Epoch: 15 Global Step: 263130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:09,944-Speed 9485.64 samples/sec Loss 4.1495 LearningRate 0.0045 Epoch: 15 Global Step: 263140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:11,050-Speed 9262.40 samples/sec Loss 4.1834 LearningRate 0.0045 Epoch: 15 Global Step: 263150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:12,154-Speed 9278.15 samples/sec Loss 4.1420 LearningRate 0.0045 Epoch: 15 Global Step: 263160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:13,264-Speed 9228.28 samples/sec Loss 4.2126 LearningRate 0.0045 Epoch: 15 Global Step: 263170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:14,343-Speed 9499.39 samples/sec Loss 
4.1629 LearningRate 0.0045 Epoch: 15 Global Step: 263180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:15,411-Speed 9589.23 samples/sec Loss 4.1226 LearningRate 0.0045 Epoch: 15 Global Step: 263190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:16,495-Speed 9456.47 samples/sec Loss 4.1587 LearningRate 0.0045 Epoch: 15 Global Step: 263200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:17,569-Speed 9540.39 samples/sec Loss 4.1843 LearningRate 0.0045 Epoch: 15 Global Step: 263210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:18,690-Speed 9133.39 samples/sec Loss 4.1226 LearningRate 0.0045 Epoch: 15 Global Step: 263220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:19,754-Speed 9632.43 samples/sec Loss 4.1502 LearningRate 0.0045 Epoch: 15 Global Step: 263230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:20,863-Speed 9239.56 samples/sec Loss 4.1728 LearningRate 0.0045 Epoch: 15 Global Step: 263240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:22,001-Speed 9007.98 samples/sec Loss 4.1477 LearningRate 0.0045 Epoch: 15 Global Step: 263250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:23,107-Speed 9260.93 samples/sec Loss 4.1896 LearningRate 0.0045 Epoch: 15 Global Step: 263260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:24,233-Speed 9101.31 samples/sec Loss 4.1482 LearningRate 0.0045 Epoch: 15 Global Step: 263270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:25,334-Speed 9313.28 samples/sec Loss 4.1357 LearningRate 0.0045 Epoch: 15 Global Step: 263280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:26,450-Speed 9184.10 samples/sec Loss 4.1990 LearningRate 0.0045 Epoch: 15 Global Step: 263290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:27,527-Speed 9507.62 samples/sec Loss 4.2040 LearningRate 0.0045 Epoch: 15 
Global Step: 263300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:28,651-Speed 9115.98 samples/sec Loss 4.1542 LearningRate 0.0045 Epoch: 15 Global Step: 263310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:29,718-Speed 9605.34 samples/sec Loss 4.0206 LearningRate 0.0045 Epoch: 15 Global Step: 263320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:30,795-Speed 9509.11 samples/sec Loss 4.1424 LearningRate 0.0045 Epoch: 15 Global Step: 263330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:31,864-Speed 9585.68 samples/sec Loss 4.1493 LearningRate 0.0045 Epoch: 15 Global Step: 263340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:32,973-Speed 9242.52 samples/sec Loss 4.2214 LearningRate 0.0045 Epoch: 15 Global Step: 263350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:34,169-Speed 8576.44 samples/sec Loss 4.1642 LearningRate 0.0045 Epoch: 15 Global Step: 263360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:35,297-Speed 9081.13 samples/sec Loss 4.1426 LearningRate 0.0045 Epoch: 15 Global Step: 263370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:36,380-Speed 9460.58 samples/sec Loss 4.0442 LearningRate 0.0045 Epoch: 15 Global Step: 263380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:37,458-Speed 9499.90 samples/sec Loss 4.2009 LearningRate 0.0045 Epoch: 15 Global Step: 263390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:38,587-Speed 9075.05 samples/sec Loss 4.1243 LearningRate 0.0045 Epoch: 15 Global Step: 263400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:39,655-Speed 9600.28 samples/sec Loss 4.1531 LearningRate 0.0044 Epoch: 15 Global Step: 263410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:40,781-Speed 9099.76 samples/sec Loss 4.2412 LearningRate 0.0044 Epoch: 15 Global Step: 263420 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 22:10:41,830-Speed 9770.55 samples/sec Loss 4.1323 LearningRate 0.0044 Epoch: 15 Global Step: 263430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:42,999-Speed 8761.22 samples/sec Loss 4.0893 LearningRate 0.0044 Epoch: 15 Global Step: 263440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:44,130-Speed 9061.17 samples/sec Loss 4.0574 LearningRate 0.0044 Epoch: 15 Global Step: 263450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:45,225-Speed 9357.34 samples/sec Loss 4.1434 LearningRate 0.0044 Epoch: 15 Global Step: 263460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:46,305-Speed 9488.79 samples/sec Loss 4.2204 LearningRate 0.0044 Epoch: 15 Global Step: 263470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:47,359-Speed 9725.75 samples/sec Loss 4.0959 LearningRate 0.0044 Epoch: 15 Global Step: 263480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:10:48,435-Speed 9518.17 samples/sec Loss 4.1804 LearningRate 0.0044 Epoch: 15 Global Step: 263490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:49,570-Speed 9026.13 samples/sec Loss 4.1558 LearningRate 0.0044 Epoch: 15 Global Step: 263500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:50,692-Speed 9129.14 samples/sec Loss 4.1056 LearningRate 0.0044 Epoch: 15 Global Step: 263510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:51,751-Speed 9681.38 samples/sec Loss 4.0488 LearningRate 0.0044 Epoch: 15 Global Step: 263520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:52,823-Speed 9559.89 samples/sec Loss 4.2489 LearningRate 0.0044 Epoch: 15 Global Step: 263530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:53,938-Speed 9188.63 samples/sec Loss 4.2426 LearningRate 0.0044 Epoch: 15 Global Step: 263540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 22:10:55,098-Speed 8831.13 samples/sec Loss 4.2005 LearningRate 0.0044 Epoch: 15 Global Step: 263550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:56,199-Speed 9307.26 samples/sec Loss 4.2533 LearningRate 0.0044 Epoch: 15 Global Step: 263560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:57,311-Speed 9210.69 samples/sec Loss 4.0767 LearningRate 0.0044 Epoch: 15 Global Step: 263570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:58,382-Speed 9571.18 samples/sec Loss 4.0900 LearningRate 0.0044 Epoch: 15 Global Step: 263580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:10:59,472-Speed 9397.33 samples/sec Loss 4.1434 LearningRate 0.0044 Epoch: 15 Global Step: 263590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:00,518-Speed 9789.98 samples/sec Loss 4.1767 LearningRate 0.0044 Epoch: 15 Global Step: 263600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:01,672-Speed 8878.18 samples/sec Loss 4.2213 LearningRate 0.0044 Epoch: 15 Global Step: 263610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:02,820-Speed 8938.06 samples/sec Loss 4.1084 LearningRate 0.0044 Epoch: 15 Global Step: 263620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:03,916-Speed 9342.79 samples/sec Loss 4.2187 LearningRate 0.0044 Epoch: 15 Global Step: 263630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:05,007-Speed 9390.14 samples/sec Loss 4.1624 LearningRate 0.0044 Epoch: 15 Global Step: 263640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:06,122-Speed 9194.80 samples/sec Loss 4.1538 LearningRate 0.0044 Epoch: 15 Global Step: 263650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:07,239-Speed 9169.42 samples/sec Loss 4.1201 LearningRate 0.0044 Epoch: 15 Global Step: 263660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:08,336-Speed 9335.92 
samples/sec Loss 4.1552 LearningRate 0.0044 Epoch: 15 Global Step: 263670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:09,414-Speed 9514.91 samples/sec Loss 4.1303 LearningRate 0.0044 Epoch: 15 Global Step: 263680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:10,529-Speed 9188.45 samples/sec Loss 4.2340 LearningRate 0.0044 Epoch: 15 Global Step: 263690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:11,682-Speed 8879.62 samples/sec Loss 4.1364 LearningRate 0.0044 Epoch: 15 Global Step: 263700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:12,777-Speed 9361.22 samples/sec Loss 4.1376 LearningRate 0.0044 Epoch: 15 Global Step: 263710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:13,908-Speed 9055.07 samples/sec Loss 4.1314 LearningRate 0.0044 Epoch: 15 Global Step: 263720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:15,063-Speed 8874.76 samples/sec Loss 4.1094 LearningRate 0.0044 Epoch: 15 Global Step: 263730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:16,175-Speed 9215.44 samples/sec Loss 4.2350 LearningRate 0.0044 Epoch: 15 Global Step: 263740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:17,233-Speed 9681.83 samples/sec Loss 4.2030 LearningRate 0.0044 Epoch: 15 Global Step: 263750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:18,341-Speed 9246.33 samples/sec Loss 4.2319 LearningRate 0.0044 Epoch: 15 Global Step: 263760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:19,448-Speed 9255.08 samples/sec Loss 4.2007 LearningRate 0.0044 Epoch: 15 Global Step: 263770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:20,540-Speed 9384.54 samples/sec Loss 4.0525 LearningRate 0.0044 Epoch: 15 Global Step: 263780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:21,632-Speed 9389.90 samples/sec Loss 4.1495 LearningRate 
0.0044 Epoch: 15 Global Step: 263790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:22,699-Speed 9597.87 samples/sec Loss 4.1495 LearningRate 0.0044 Epoch: 15 Global Step: 263800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:23,770-Speed 9569.36 samples/sec Loss 4.1306 LearningRate 0.0044 Epoch: 15 Global Step: 263810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:24,913-Speed 8962.38 samples/sec Loss 4.0226 LearningRate 0.0044 Epoch: 15 Global Step: 263820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:25,995-Speed 9470.36 samples/sec Loss 4.2117 LearningRate 0.0044 Epoch: 15 Global Step: 263830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:27,089-Speed 9361.28 samples/sec Loss 4.1438 LearningRate 0.0044 Epoch: 15 Global Step: 263840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:28,180-Speed 9394.44 samples/sec Loss 4.1881 LearningRate 0.0044 Epoch: 15 Global Step: 263850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:29,304-Speed 9118.98 samples/sec Loss 4.0704 LearningRate 0.0044 Epoch: 15 Global Step: 263860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:30,366-Speed 9643.28 samples/sec Loss 4.2706 LearningRate 0.0044 Epoch: 15 Global Step: 263870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:11:31,427-Speed 9657.18 samples/sec Loss 4.1514 LearningRate 0.0044 Epoch: 15 Global Step: 263880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:32,501-Speed 9546.74 samples/sec Loss 4.1071 LearningRate 0.0044 Epoch: 15 Global Step: 263890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:33,608-Speed 9254.35 samples/sec Loss 4.1118 LearningRate 0.0044 Epoch: 15 Global Step: 263900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:34,727-Speed 9151.95 samples/sec Loss 4.1802 LearningRate 0.0044 Epoch: 15 Global Step: 263910 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:35,821-Speed 9368.79 samples/sec Loss 4.1492 LearningRate 0.0044 Epoch: 15 Global Step: 263920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:36,909-Speed 9419.07 samples/sec Loss 4.1103 LearningRate 0.0044 Epoch: 15 Global Step: 263930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:38,006-Speed 9334.19 samples/sec Loss 4.1680 LearningRate 0.0044 Epoch: 15 Global Step: 263940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:39,107-Speed 9311.23 samples/sec Loss 4.0785 LearningRate 0.0044 Epoch: 15 Global Step: 263950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:40,203-Speed 9357.58 samples/sec Loss 4.1434 LearningRate 0.0044 Epoch: 15 Global Step: 263960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:41,262-Speed 9672.38 samples/sec Loss 4.1796 LearningRate 0.0044 Epoch: 15 Global Step: 263970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:42,364-Speed 9294.93 samples/sec Loss 4.0715 LearningRate 0.0044 Epoch: 15 Global Step: 263980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:43,523-Speed 8841.69 samples/sec Loss 4.1900 LearningRate 0.0044 Epoch: 15 Global Step: 263990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:11:44,664-Speed 8978.71 samples/sec Loss 4.2114 LearningRate 0.0044 Epoch: 15 Global Step: 264000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:12:06,622-[lfw][264000]XNorm: 7.037474 Training: 2022-04-11 22:12:06,623-[lfw][264000]Accuracy-Flip: 0.99650+-0.00293 Training: 2022-04-11 22:12:06,623-[lfw][264000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:12:32,021-[cfp_fp][264000]XNorm: 6.096041 Training: 2022-04-11 22:12:32,022-[cfp_fp][264000]Accuracy-Flip: 0.97043+-0.00756 Training: 2022-04-11 22:12:32,022-[cfp_fp][264000]Accuracy-Highest: 0.97171 Training: 2022-04-11 
22:12:53,939-[agedb_30][264000]XNorm: 6.834484 Training: 2022-04-11 22:12:53,940-[agedb_30][264000]Accuracy-Flip: 0.97317+-0.00740 Training: 2022-04-11 22:12:53,940-[agedb_30][264000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:12:55,053-Speed 145.48 samples/sec Loss 4.1893 LearningRate 0.0044 Epoch: 15 Global Step: 264010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:12:56,171-Speed 9165.99 samples/sec Loss 4.0995 LearningRate 0.0044 Epoch: 15 Global Step: 264020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:12:57,292-Speed 9137.67 samples/sec Loss 4.1031 LearningRate 0.0044 Epoch: 15 Global Step: 264030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:12:58,396-Speed 9284.14 samples/sec Loss 4.1762 LearningRate 0.0044 Epoch: 15 Global Step: 264040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:12:59,494-Speed 9336.12 samples/sec Loss 4.0901 LearningRate 0.0044 Epoch: 15 Global Step: 264050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:00,584-Speed 9399.99 samples/sec Loss 4.1587 LearningRate 0.0044 Epoch: 15 Global Step: 264060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:01,678-Speed 9365.65 samples/sec Loss 4.1779 LearningRate 0.0044 Epoch: 15 Global Step: 264070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:02,775-Speed 9342.55 samples/sec Loss 4.1357 LearningRate 0.0044 Epoch: 15 Global Step: 264080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:03,834-Speed 9668.50 samples/sec Loss 4.1214 LearningRate 0.0044 Epoch: 15 Global Step: 264090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:04,901-Speed 9606.19 samples/sec Loss 4.1519 LearningRate 0.0044 Epoch: 15 Global Step: 264100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:06,009-Speed 9247.55 samples/sec Loss 4.1984 LearningRate 0.0044 Epoch: 15 Global Step: 264110 Fp16 Grad Scale: 131072 Required: 3 
hours Training: 2022-04-11 22:13:07,085-Speed 9516.84 samples/sec Loss 4.1433 LearningRate 0.0044 Epoch: 15 Global Step: 264120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:08,181-Speed 9353.29 samples/sec Loss 4.0975 LearningRate 0.0044 Epoch: 15 Global Step: 264130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:09,284-Speed 9291.56 samples/sec Loss 4.0994 LearningRate 0.0044 Epoch: 15 Global Step: 264140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:10,402-Speed 9163.22 samples/sec Loss 4.1388 LearningRate 0.0044 Epoch: 15 Global Step: 264150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:11,543-Speed 8979.67 samples/sec Loss 4.1441 LearningRate 0.0044 Epoch: 15 Global Step: 264160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:12,658-Speed 9189.57 samples/sec Loss 4.1465 LearningRate 0.0044 Epoch: 15 Global Step: 264170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:13,729-Speed 9565.58 samples/sec Loss 4.0461 LearningRate 0.0044 Epoch: 15 Global Step: 264180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:14,819-Speed 9398.98 samples/sec Loss 4.1490 LearningRate 0.0044 Epoch: 15 Global Step: 264190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:15,950-Speed 9059.89 samples/sec Loss 4.1821 LearningRate 0.0043 Epoch: 15 Global Step: 264200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:17,029-Speed 9490.21 samples/sec Loss 4.2312 LearningRate 0.0043 Epoch: 15 Global Step: 264210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:18,142-Speed 9204.43 samples/sec Loss 4.1194 LearningRate 0.0043 Epoch: 15 Global Step: 264220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:19,221-Speed 9494.53 samples/sec Loss 4.1545 LearningRate 0.0043 Epoch: 15 Global Step: 264230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
22:13:20,293-Speed 9558.32 samples/sec Loss 4.1980 LearningRate 0.0043 Epoch: 15 Global Step: 264240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:21,435-Speed 8980.43 samples/sec Loss 4.1654 LearningRate 0.0043 Epoch: 15 Global Step: 264250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:22,574-Speed 8989.26 samples/sec Loss 4.2329 LearningRate 0.0043 Epoch: 15 Global Step: 264260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:23,674-Speed 9321.43 samples/sec Loss 4.1389 LearningRate 0.0043 Epoch: 15 Global Step: 264270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:24,851-Speed 8705.96 samples/sec Loss 4.1661 LearningRate 0.0043 Epoch: 15 Global Step: 264280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:25,916-Speed 9622.32 samples/sec Loss 4.1956 LearningRate 0.0043 Epoch: 15 Global Step: 264290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:27,026-Speed 9227.64 samples/sec Loss 4.1748 LearningRate 0.0043 Epoch: 15 Global Step: 264300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:28,116-Speed 9402.16 samples/sec Loss 4.2046 LearningRate 0.0043 Epoch: 15 Global Step: 264310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:29,221-Speed 9270.70 samples/sec Loss 4.2647 LearningRate 0.0043 Epoch: 15 Global Step: 264320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:30,288-Speed 9610.42 samples/sec Loss 4.1519 LearningRate 0.0043 Epoch: 15 Global Step: 264330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:31,416-Speed 9078.59 samples/sec Loss 4.1558 LearningRate 0.0043 Epoch: 15 Global Step: 264340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:32,525-Speed 9242.37 samples/sec Loss 4.0789 LearningRate 0.0043 Epoch: 15 Global Step: 264350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:33,659-Speed 9033.55 
samples/sec Loss 4.0673 LearningRate 0.0043 Epoch: 15 Global Step: 264360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:34,748-Speed 9407.86 samples/sec Loss 4.0989 LearningRate 0.0043 Epoch: 15 Global Step: 264370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:13:35,815-Speed 9600.25 samples/sec Loss 4.1273 LearningRate 0.0043 Epoch: 15 Global Step: 264380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:36,969-Speed 8883.74 samples/sec Loss 4.1101 LearningRate 0.0043 Epoch: 15 Global Step: 264390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:38,056-Speed 9418.38 samples/sec Loss 4.1427 LearningRate 0.0043 Epoch: 15 Global Step: 264400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:39,174-Speed 9173.35 samples/sec Loss 4.1410 LearningRate 0.0043 Epoch: 15 Global Step: 264410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:40,231-Speed 9694.47 samples/sec Loss 4.1107 LearningRate 0.0043 Epoch: 15 Global Step: 264420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:41,336-Speed 9266.09 samples/sec Loss 4.1056 LearningRate 0.0043 Epoch: 15 Global Step: 264430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:42,512-Speed 8712.74 samples/sec Loss 4.1986 LearningRate 0.0043 Epoch: 15 Global Step: 264440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:43,630-Speed 9161.22 samples/sec Loss 4.1955 LearningRate 0.0043 Epoch: 15 Global Step: 264450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:44,745-Speed 9196.90 samples/sec Loss 4.2134 LearningRate 0.0043 Epoch: 15 Global Step: 264460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:45,878-Speed 9045.20 samples/sec Loss 4.2032 LearningRate 0.0043 Epoch: 15 Global Step: 264470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:47,030-Speed 8889.09 samples/sec Loss 4.1049 LearningRate 
0.0043 Epoch: 15 Global Step: 264480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:48,096-Speed 9611.17 samples/sec Loss 4.1747 LearningRate 0.0043 Epoch: 15 Global Step: 264490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:13:49,220-Speed 9116.20 samples/sec Loss 4.1579 LearningRate 0.0043 Epoch: 15 Global Step: 264500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:50,355-Speed 9025.57 samples/sec Loss 4.3133 LearningRate 0.0043 Epoch: 15 Global Step: 264510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:51,454-Speed 9330.94 samples/sec Loss 4.0733 LearningRate 0.0043 Epoch: 15 Global Step: 264520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:52,570-Speed 9176.80 samples/sec Loss 4.1082 LearningRate 0.0043 Epoch: 15 Global Step: 264530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:53,666-Speed 9355.33 samples/sec Loss 4.1189 LearningRate 0.0043 Epoch: 15 Global Step: 264540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:54,751-Speed 9443.52 samples/sec Loss 4.1683 LearningRate 0.0043 Epoch: 15 Global Step: 264550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:55,811-Speed 9661.39 samples/sec Loss 4.1123 LearningRate 0.0043 Epoch: 15 Global Step: 264560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:56,891-Speed 9489.82 samples/sec Loss 4.1218 LearningRate 0.0043 Epoch: 15 Global Step: 264570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:57,984-Speed 9370.80 samples/sec Loss 4.0550 LearningRate 0.0043 Epoch: 15 Global Step: 264580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:13:59,107-Speed 9121.73 samples/sec Loss 4.1767 LearningRate 0.0043 Epoch: 15 Global Step: 264590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:00,213-Speed 9263.22 samples/sec Loss 4.0955 LearningRate 0.0043 Epoch: 15 Global Step: 264600 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:01,337-Speed 9118.32 samples/sec Loss 4.1562 LearningRate 0.0043 Epoch: 15 Global Step: 264610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:02,429-Speed 9380.75 samples/sec Loss 4.0842 LearningRate 0.0043 Epoch: 15 Global Step: 264620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:03,514-Speed 9457.28 samples/sec Loss 4.1701 LearningRate 0.0043 Epoch: 15 Global Step: 264630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:04,650-Speed 9017.28 samples/sec Loss 4.1455 LearningRate 0.0043 Epoch: 15 Global Step: 264640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:05,736-Speed 9432.29 samples/sec Loss 4.1508 LearningRate 0.0043 Epoch: 15 Global Step: 264650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:06,835-Speed 9328.39 samples/sec Loss 4.1351 LearningRate 0.0043 Epoch: 15 Global Step: 264660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:07,936-Speed 9306.18 samples/sec Loss 4.1439 LearningRate 0.0043 Epoch: 15 Global Step: 264670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:09,040-Speed 9275.92 samples/sec Loss 4.2556 LearningRate 0.0043 Epoch: 15 Global Step: 264680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:10,152-Speed 9220.40 samples/sec Loss 4.0792 LearningRate 0.0043 Epoch: 15 Global Step: 264690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:11,231-Speed 9491.59 samples/sec Loss 4.1318 LearningRate 0.0043 Epoch: 15 Global Step: 264700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:12,367-Speed 9022.97 samples/sec Loss 4.2032 LearningRate 0.0043 Epoch: 15 Global Step: 264710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:13,469-Speed 9291.90 samples/sec Loss 4.2178 LearningRate 0.0043 Epoch: 15 Global Step: 264720 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 22:14:14,621-Speed 8897.99 samples/sec Loss 4.1806 LearningRate 0.0043 Epoch: 15 Global Step: 264730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:15,777-Speed 8857.89 samples/sec Loss 4.1139 LearningRate 0.0043 Epoch: 15 Global Step: 264740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:16,885-Speed 9249.94 samples/sec Loss 4.1591 LearningRate 0.0043 Epoch: 15 Global Step: 264750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:17,970-Speed 9446.13 samples/sec Loss 4.0945 LearningRate 0.0043 Epoch: 15 Global Step: 264760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:19,035-Speed 9618.35 samples/sec Loss 4.1288 LearningRate 0.0043 Epoch: 15 Global Step: 264770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:20,153-Speed 9161.78 samples/sec Loss 4.1063 LearningRate 0.0043 Epoch: 15 Global Step: 264780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:21,281-Speed 9089.64 samples/sec Loss 4.1603 LearningRate 0.0043 Epoch: 15 Global Step: 264790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:22,434-Speed 8888.64 samples/sec Loss 4.1206 LearningRate 0.0043 Epoch: 15 Global Step: 264800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:23,558-Speed 9119.96 samples/sec Loss 4.1976 LearningRate 0.0043 Epoch: 15 Global Step: 264810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:24,712-Speed 8876.01 samples/sec Loss 4.1927 LearningRate 0.0043 Epoch: 15 Global Step: 264820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:25,786-Speed 9544.23 samples/sec Loss 4.1281 LearningRate 0.0043 Epoch: 15 Global Step: 264830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:26,851-Speed 9619.09 samples/sec Loss 4.1480 LearningRate 0.0043 Epoch: 15 Global Step: 264840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:27,956-Speed 
9265.88 samples/sec Loss 4.0772 LearningRate 0.0043 Epoch: 15 Global Step: 264850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:29,046-Speed 9400.80 samples/sec Loss 4.2171 LearningRate 0.0043 Epoch: 15 Global Step: 264860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:30,134-Speed 9413.70 samples/sec Loss 4.1095 LearningRate 0.0043 Epoch: 15 Global Step: 264870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:31,202-Speed 9600.93 samples/sec Loss 4.1411 LearningRate 0.0043 Epoch: 15 Global Step: 264880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:32,302-Speed 9308.33 samples/sec Loss 4.2403 LearningRate 0.0043 Epoch: 15 Global Step: 264890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:33,432-Speed 9069.60 samples/sec Loss 4.1545 LearningRate 0.0043 Epoch: 15 Global Step: 264900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:34,543-Speed 9227.20 samples/sec Loss 4.0410 LearningRate 0.0043 Epoch: 15 Global Step: 264910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:35,674-Speed 9054.38 samples/sec Loss 4.2625 LearningRate 0.0043 Epoch: 15 Global Step: 264920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:36,759-Speed 9444.74 samples/sec Loss 4.1790 LearningRate 0.0043 Epoch: 15 Global Step: 264930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:37,896-Speed 9015.01 samples/sec Loss 4.1166 LearningRate 0.0043 Epoch: 15 Global Step: 264940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:39,026-Speed 9061.13 samples/sec Loss 4.0399 LearningRate 0.0043 Epoch: 15 Global Step: 264950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:40,110-Speed 9457.61 samples/sec Loss 4.1513 LearningRate 0.0043 Epoch: 15 Global Step: 264960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:41,181-Speed 9565.71 samples/sec Loss 4.1384 
LearningRate 0.0043 Epoch: 15 Global Step: 264970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:42,250-Speed 9590.11 samples/sec Loss 4.1343 LearningRate 0.0043 Epoch: 15 Global Step: 264980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:43,349-Speed 9323.12 samples/sec Loss 4.1163 LearningRate 0.0043 Epoch: 15 Global Step: 264990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:44,499-Speed 8905.30 samples/sec Loss 4.1455 LearningRate 0.0043 Epoch: 15 Global Step: 265000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:45,588-Speed 9412.34 samples/sec Loss 4.1924 LearningRate 0.0042 Epoch: 15 Global Step: 265010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:46,668-Speed 9485.53 samples/sec Loss 4.2538 LearningRate 0.0042 Epoch: 15 Global Step: 265020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:47,771-Speed 9287.07 samples/sec Loss 4.1850 LearningRate 0.0042 Epoch: 15 Global Step: 265030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:48,867-Speed 9346.05 samples/sec Loss 4.0131 LearningRate 0.0042 Epoch: 15 Global Step: 265040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:50,075-Speed 8484.18 samples/sec Loss 4.1164 LearningRate 0.0042 Epoch: 15 Global Step: 265050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:14:51,144-Speed 9587.19 samples/sec Loss 4.1260 LearningRate 0.0042 Epoch: 15 Global Step: 265060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:52,305-Speed 8830.93 samples/sec Loss 4.0551 LearningRate 0.0042 Epoch: 15 Global Step: 265070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:53,390-Speed 9444.65 samples/sec Loss 4.1804 LearningRate 0.0042 Epoch: 15 Global Step: 265080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:54,471-Speed 9474.92 samples/sec Loss 4.1197 LearningRate 0.0042 Epoch: 15 
Global Step: 265090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:55,647-Speed 8712.40 samples/sec Loss 4.2261 LearningRate 0.0042 Epoch: 15 Global Step: 265100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:56,738-Speed 9396.53 samples/sec Loss 4.1356 LearningRate 0.0042 Epoch: 15 Global Step: 265110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:57,845-Speed 9256.11 samples/sec Loss 4.0959 LearningRate 0.0042 Epoch: 15 Global Step: 265120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:14:58,941-Speed 9350.54 samples/sec Loss 4.1075 LearningRate 0.0042 Epoch: 15 Global Step: 265130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:00,064-Speed 9120.99 samples/sec Loss 4.0990 LearningRate 0.0042 Epoch: 15 Global Step: 265140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:01,181-Speed 9171.92 samples/sec Loss 4.1225 LearningRate 0.0042 Epoch: 15 Global Step: 265150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:02,259-Speed 9509.66 samples/sec Loss 4.2503 LearningRate 0.0042 Epoch: 15 Global Step: 265160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:03,396-Speed 9009.65 samples/sec Loss 4.1895 LearningRate 0.0042 Epoch: 15 Global Step: 265170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:04,520-Speed 9113.37 samples/sec Loss 4.1782 LearningRate 0.0042 Epoch: 15 Global Step: 265180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:05,611-Speed 9394.77 samples/sec Loss 4.1898 LearningRate 0.0042 Epoch: 15 Global Step: 265190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:06,684-Speed 9549.09 samples/sec Loss 4.0648 LearningRate 0.0042 Epoch: 15 Global Step: 265200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:07,771-Speed 9422.23 samples/sec Loss 4.0982 LearningRate 0.0042 Epoch: 15 Global Step: 265210 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 22:15:08,915-Speed 8955.77 samples/sec Loss 4.1232 LearningRate 0.0042 Epoch: 15 Global Step: 265220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:10,050-Speed 9032.11 samples/sec Loss 4.1436 LearningRate 0.0042 Epoch: 15 Global Step: 265230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:11,194-Speed 8956.78 samples/sec Loss 4.2092 LearningRate 0.0042 Epoch: 15 Global Step: 265240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:12,314-Speed 9148.10 samples/sec Loss 4.1116 LearningRate 0.0042 Epoch: 15 Global Step: 265250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:13,386-Speed 9559.82 samples/sec Loss 4.1859 LearningRate 0.0042 Epoch: 15 Global Step: 265260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:14,507-Speed 9136.58 samples/sec Loss 4.2011 LearningRate 0.0042 Epoch: 15 Global Step: 265270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:15,643-Speed 9018.26 samples/sec Loss 4.2311 LearningRate 0.0042 Epoch: 15 Global Step: 265280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:16,736-Speed 9377.77 samples/sec Loss 4.1946 LearningRate 0.0042 Epoch: 15 Global Step: 265290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:17,860-Speed 9115.82 samples/sec Loss 4.1295 LearningRate 0.0042 Epoch: 15 Global Step: 265300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:18,993-Speed 9047.22 samples/sec Loss 4.1457 LearningRate 0.0042 Epoch: 15 Global Step: 265310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:20,101-Speed 9245.72 samples/sec Loss 4.1985 LearningRate 0.0042 Epoch: 15 Global Step: 265320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:21,241-Speed 8983.88 samples/sec Loss 4.2724 LearningRate 0.0042 Epoch: 15 Global Step: 265330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 22:15:22,363-Speed 9135.70 samples/sec Loss 4.1498 LearningRate 0.0042 Epoch: 15 Global Step: 265340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:23,486-Speed 9128.43 samples/sec Loss 4.1079 LearningRate 0.0042 Epoch: 15 Global Step: 265350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:24,598-Speed 9209.49 samples/sec Loss 4.0634 LearningRate 0.0042 Epoch: 15 Global Step: 265360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:25,688-Speed 9406.17 samples/sec Loss 4.1635 LearningRate 0.0042 Epoch: 15 Global Step: 265370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:26,829-Speed 8976.74 samples/sec Loss 4.0955 LearningRate 0.0042 Epoch: 15 Global Step: 265380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:27,911-Speed 9471.34 samples/sec Loss 4.1836 LearningRate 0.0042 Epoch: 15 Global Step: 265390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:29,012-Speed 9308.49 samples/sec Loss 4.0530 LearningRate 0.0042 Epoch: 15 Global Step: 265400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:30,110-Speed 9330.21 samples/sec Loss 4.1177 LearningRate 0.0042 Epoch: 15 Global Step: 265410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:31,202-Speed 9380.19 samples/sec Loss 4.0719 LearningRate 0.0042 Epoch: 15 Global Step: 265420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:32,305-Speed 9283.76 samples/sec Loss 4.2075 LearningRate 0.0042 Epoch: 15 Global Step: 265430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:33,389-Speed 9456.08 samples/sec Loss 4.0588 LearningRate 0.0042 Epoch: 15 Global Step: 265440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:34,508-Speed 9159.18 samples/sec Loss 4.1341 LearningRate 0.0042 Epoch: 15 Global Step: 265450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:35,597-Speed 9415.58 
samples/sec Loss 4.1735 LearningRate 0.0042 Epoch: 15 Global Step: 265460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:36,681-Speed 9446.63 samples/sec Loss 4.1872 LearningRate 0.0042 Epoch: 15 Global Step: 265470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:37,777-Speed 9349.97 samples/sec Loss 4.0625 LearningRate 0.0042 Epoch: 15 Global Step: 265480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:38,838-Speed 9660.61 samples/sec Loss 4.1127 LearningRate 0.0042 Epoch: 15 Global Step: 265490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:39,943-Speed 9273.96 samples/sec Loss 4.0943 LearningRate 0.0042 Epoch: 15 Global Step: 265500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:41,039-Speed 9349.53 samples/sec Loss 4.1629 LearningRate 0.0042 Epoch: 15 Global Step: 265510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:42,143-Speed 9276.94 samples/sec Loss 4.1038 LearningRate 0.0042 Epoch: 15 Global Step: 265520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:43,264-Speed 9140.69 samples/sec Loss 4.0961 LearningRate 0.0042 Epoch: 15 Global Step: 265530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:44,372-Speed 9248.42 samples/sec Loss 4.2485 LearningRate 0.0042 Epoch: 15 Global Step: 265540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:45,441-Speed 9586.11 samples/sec Loss 4.1068 LearningRate 0.0042 Epoch: 15 Global Step: 265550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:46,577-Speed 9018.95 samples/sec Loss 4.1024 LearningRate 0.0042 Epoch: 15 Global Step: 265560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:47,681-Speed 9281.15 samples/sec Loss 4.1694 LearningRate 0.0042 Epoch: 15 Global Step: 265570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:15:48,806-Speed 9103.90 samples/sec Loss 4.2849 LearningRate 
0.0042 Epoch: 15 Global Step: 265580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:49,856-Speed 9758.40 samples/sec Loss 4.0875 LearningRate 0.0042 Epoch: 15 Global Step: 265590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:50,951-Speed 9358.26 samples/sec Loss 4.1173 LearningRate 0.0042 Epoch: 15 Global Step: 265600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:52,047-Speed 9352.27 samples/sec Loss 4.2499 LearningRate 0.0042 Epoch: 15 Global Step: 265610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:53,172-Speed 9103.84 samples/sec Loss 4.1262 LearningRate 0.0042 Epoch: 15 Global Step: 265620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:54,282-Speed 9238.32 samples/sec Loss 4.1612 LearningRate 0.0042 Epoch: 15 Global Step: 265630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:55,395-Speed 9201.74 samples/sec Loss 4.0946 LearningRate 0.0042 Epoch: 15 Global Step: 265640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:56,462-Speed 9606.53 samples/sec Loss 4.1007 LearningRate 0.0042 Epoch: 15 Global Step: 265650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:57,551-Speed 9404.09 samples/sec Loss 4.1833 LearningRate 0.0042 Epoch: 15 Global Step: 265660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:58,663-Speed 9214.15 samples/sec Loss 4.1527 LearningRate 0.0042 Epoch: 15 Global Step: 265670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:15:59,787-Speed 9117.88 samples/sec Loss 4.2777 LearningRate 0.0042 Epoch: 15 Global Step: 265680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:00,867-Speed 9483.10 samples/sec Loss 4.0660 LearningRate 0.0042 Epoch: 15 Global Step: 265690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:01,919-Speed 9743.40 samples/sec Loss 4.1456 LearningRate 0.0042 Epoch: 15 Global Step: 265700 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:03,017-Speed 9331.87 samples/sec Loss 4.1215 LearningRate 0.0042 Epoch: 15 Global Step: 265710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:04,131-Speed 9198.02 samples/sec Loss 4.1088 LearningRate 0.0042 Epoch: 15 Global Step: 265720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:05,227-Speed 9350.71 samples/sec Loss 4.2200 LearningRate 0.0042 Epoch: 15 Global Step: 265730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:06,382-Speed 8867.82 samples/sec Loss 4.1171 LearningRate 0.0042 Epoch: 15 Global Step: 265740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:07,514-Speed 9054.13 samples/sec Loss 4.0721 LearningRate 0.0042 Epoch: 15 Global Step: 265750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:08,708-Speed 8577.26 samples/sec Loss 4.1364 LearningRate 0.0042 Epoch: 15 Global Step: 265760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:09,816-Speed 9247.07 samples/sec Loss 4.1051 LearningRate 0.0042 Epoch: 15 Global Step: 265770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:10,947-Speed 9060.14 samples/sec Loss 4.0878 LearningRate 0.0042 Epoch: 15 Global Step: 265780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:12,095-Speed 8926.94 samples/sec Loss 4.1185 LearningRate 0.0042 Epoch: 15 Global Step: 265790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:13,198-Speed 9289.56 samples/sec Loss 4.1021 LearningRate 0.0042 Epoch: 15 Global Step: 265800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:14,248-Speed 9766.76 samples/sec Loss 4.1071 LearningRate 0.0042 Epoch: 15 Global Step: 265810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:15,365-Speed 9170.96 samples/sec Loss 4.0705 LearningRate 0.0041 Epoch: 15 Global Step: 265820 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 22:16:16,446-Speed 9473.40 samples/sec Loss 4.1637 LearningRate 0.0041 Epoch: 15 Global Step: 265830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:17,552-Speed 9268.55 samples/sec Loss 4.0943 LearningRate 0.0041 Epoch: 15 Global Step: 265840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:18,666-Speed 9194.79 samples/sec Loss 4.1410 LearningRate 0.0041 Epoch: 15 Global Step: 265850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:19,777-Speed 9221.24 samples/sec Loss 4.2074 LearningRate 0.0041 Epoch: 15 Global Step: 265860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:20,912-Speed 9025.59 samples/sec Loss 4.1886 LearningRate 0.0041 Epoch: 15 Global Step: 265870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:22,029-Speed 9175.86 samples/sec Loss 4.1583 LearningRate 0.0041 Epoch: 15 Global Step: 265880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:23,110-Speed 9477.58 samples/sec Loss 4.1177 LearningRate 0.0041 Epoch: 15 Global Step: 265890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:24,200-Speed 9401.48 samples/sec Loss 4.1755 LearningRate 0.0041 Epoch: 15 Global Step: 265900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:25,272-Speed 9555.32 samples/sec Loss 4.1077 LearningRate 0.0041 Epoch: 15 Global Step: 265910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:26,332-Speed 9666.82 samples/sec Loss 4.1401 LearningRate 0.0041 Epoch: 15 Global Step: 265920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:27,428-Speed 9346.47 samples/sec Loss 4.2049 LearningRate 0.0041 Epoch: 15 Global Step: 265930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:28,545-Speed 9173.17 samples/sec Loss 4.1658 LearningRate 0.0041 Epoch: 15 Global Step: 265940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:16:29,656-Speed 9223.88 samples/sec Loss 4.1282 LearningRate 0.0041 Epoch: 15 Global Step: 265950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:30,786-Speed 9060.34 samples/sec Loss 4.2069 LearningRate 0.0041 Epoch: 15 Global Step: 265960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:31,919-Speed 9045.22 samples/sec Loss 4.1727 LearningRate 0.0041 Epoch: 15 Global Step: 265970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:33,005-Speed 9441.85 samples/sec Loss 4.1132 LearningRate 0.0041 Epoch: 15 Global Step: 265980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:16:34,066-Speed 9656.45 samples/sec Loss 4.1071 LearningRate 0.0041 Epoch: 15 Global Step: 265990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:35,208-Speed 8976.20 samples/sec Loss 4.1506 LearningRate 0.0041 Epoch: 15 Global Step: 266000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:16:57,100-[lfw][266000]XNorm: 7.006951 Training: 2022-04-11 22:16:57,101-[lfw][266000]Accuracy-Flip: 0.99683+-0.00283 Training: 2022-04-11 22:16:57,102-[lfw][266000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:17:22,445-[cfp_fp][266000]XNorm: 6.055339 Training: 2022-04-11 22:17:22,446-[cfp_fp][266000]Accuracy-Flip: 0.97100+-0.00694 Training: 2022-04-11 22:17:22,446-[cfp_fp][266000]Accuracy-Highest: 0.97171 Training: 2022-04-11 22:17:44,293-[agedb_30][266000]XNorm: 6.813855 Training: 2022-04-11 22:17:44,294-[agedb_30][266000]Accuracy-Flip: 0.97167+-0.00872 Training: 2022-04-11 22:17:44,295-[agedb_30][266000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:17:45,388-Speed 145.91 samples/sec Loss 4.1075 LearningRate 0.0041 Epoch: 15 Global Step: 266010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:46,480-Speed 9383.09 samples/sec Loss 4.1934 LearningRate 0.0041 Epoch: 15 Global Step: 266020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:47,603-Speed 
9124.33 samples/sec Loss 4.0943 LearningRate 0.0041 Epoch: 15 Global Step: 266030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:48,717-Speed 9199.94 samples/sec Loss 4.1732 LearningRate 0.0041 Epoch: 15 Global Step: 266040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:49,786-Speed 9577.52 samples/sec Loss 4.1551 LearningRate 0.0041 Epoch: 15 Global Step: 266050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:50,886-Speed 9314.41 samples/sec Loss 4.1110 LearningRate 0.0041 Epoch: 15 Global Step: 266060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:52,003-Speed 9178.48 samples/sec Loss 4.1208 LearningRate 0.0041 Epoch: 15 Global Step: 266070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:53,123-Speed 9151.23 samples/sec Loss 4.1562 LearningRate 0.0041 Epoch: 15 Global Step: 266080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:17:54,230-Speed 9253.25 samples/sec Loss 4.1668 LearningRate 0.0041 Epoch: 15 Global Step: 266090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:17:55,287-Speed 9696.28 samples/sec Loss 4.1518 LearningRate 0.0041 Epoch: 15 Global Step: 266100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:17:56,405-Speed 9167.93 samples/sec Loss 4.1353 LearningRate 0.0041 Epoch: 15 Global Step: 266110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:17:57,556-Speed 8897.74 samples/sec Loss 4.0304 LearningRate 0.0041 Epoch: 15 Global Step: 266120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:17:58,652-Speed 9350.17 samples/sec Loss 4.1835 LearningRate 0.0041 Epoch: 15 Global Step: 266130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:17:59,731-Speed 9490.74 samples/sec Loss 4.0632 LearningRate 0.0041 Epoch: 15 Global Step: 266140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:00,800-Speed 9586.42 samples/sec Loss 4.0399 
LearningRate 0.0041 Epoch: 15 Global Step: 266150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:01,920-Speed 9146.52 samples/sec Loss 4.1012 LearningRate 0.0041 Epoch: 15 Global Step: 266160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:03,033-Speed 9211.76 samples/sec Loss 4.0853 LearningRate 0.0041 Epoch: 15 Global Step: 266170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:04,107-Speed 9541.28 samples/sec Loss 3.9800 LearningRate 0.0041 Epoch: 15 Global Step: 266180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:05,174-Speed 9606.73 samples/sec Loss 4.0734 LearningRate 0.0041 Epoch: 15 Global Step: 266190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:06,264-Speed 9391.16 samples/sec Loss 4.1504 LearningRate 0.0041 Epoch: 15 Global Step: 266200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:07,388-Speed 9115.73 samples/sec Loss 4.1372 LearningRate 0.0041 Epoch: 15 Global Step: 266210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:08,525-Speed 9015.98 samples/sec Loss 4.2733 LearningRate 0.0041 Epoch: 15 Global Step: 266220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:09,628-Speed 9293.61 samples/sec Loss 4.0681 LearningRate 0.0041 Epoch: 15 Global Step: 266230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:10,718-Speed 9394.79 samples/sec Loss 4.0755 LearningRate 0.0041 Epoch: 15 Global Step: 266240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:11,845-Speed 9091.22 samples/sec Loss 4.0489 LearningRate 0.0041 Epoch: 15 Global Step: 266250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:12,984-Speed 8996.64 samples/sec Loss 4.1521 LearningRate 0.0041 Epoch: 15 Global Step: 266260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:14,070-Speed 9437.67 samples/sec Loss 4.2069 LearningRate 0.0041 Epoch: 15 Global Step: 
266270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:15,197-Speed 9090.34 samples/sec Loss 4.1976 LearningRate 0.0041 Epoch: 15 Global Step: 266280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:16,351-Speed 8882.34 samples/sec Loss 4.0505 LearningRate 0.0041 Epoch: 15 Global Step: 266290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:17,440-Speed 9404.35 samples/sec Loss 4.1648 LearningRate 0.0041 Epoch: 15 Global Step: 266300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:18,534-Speed 9370.29 samples/sec Loss 4.1091 LearningRate 0.0041 Epoch: 15 Global Step: 266310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:19,636-Speed 9293.08 samples/sec Loss 4.0986 LearningRate 0.0041 Epoch: 15 Global Step: 266320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:20,764-Speed 9084.26 samples/sec Loss 4.1222 LearningRate 0.0041 Epoch: 15 Global Step: 266330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:21,897-Speed 9043.15 samples/sec Loss 4.1613 LearningRate 0.0041 Epoch: 15 Global Step: 266340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:22,988-Speed 9393.87 samples/sec Loss 4.1397 LearningRate 0.0041 Epoch: 15 Global Step: 266350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:24,081-Speed 9376.12 samples/sec Loss 4.0905 LearningRate 0.0041 Epoch: 15 Global Step: 266360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:25,207-Speed 9099.24 samples/sec Loss 4.0964 LearningRate 0.0041 Epoch: 15 Global Step: 266370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:26,340-Speed 9043.74 samples/sec Loss 4.0510 LearningRate 0.0041 Epoch: 15 Global Step: 266380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:27,447-Speed 9256.09 samples/sec Loss 4.1837 LearningRate 0.0041 Epoch: 15 Global Step: 266390 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 22:18:28,507-Speed 9661.56 samples/sec Loss 4.1417 LearningRate 0.0041 Epoch: 15 Global Step: 266400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:29,595-Speed 9418.18 samples/sec Loss 4.2412 LearningRate 0.0041 Epoch: 15 Global Step: 266410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:30,677-Speed 9469.04 samples/sec Loss 4.1622 LearningRate 0.0041 Epoch: 15 Global Step: 266420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:31,776-Speed 9327.21 samples/sec Loss 4.1399 LearningRate 0.0041 Epoch: 15 Global Step: 266430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:32,893-Speed 9171.71 samples/sec Loss 4.0882 LearningRate 0.0041 Epoch: 15 Global Step: 266440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:34,001-Speed 9240.16 samples/sec Loss 4.1376 LearningRate 0.0041 Epoch: 15 Global Step: 266450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:35,132-Speed 9063.72 samples/sec Loss 4.0486 LearningRate 0.0041 Epoch: 15 Global Step: 266460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:36,265-Speed 9038.66 samples/sec Loss 4.0978 LearningRate 0.0041 Epoch: 15 Global Step: 266470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:37,416-Speed 8908.99 samples/sec Loss 4.1428 LearningRate 0.0041 Epoch: 15 Global Step: 266480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:38,505-Speed 9412.64 samples/sec Loss 4.1427 LearningRate 0.0041 Epoch: 15 Global Step: 266490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:39,646-Speed 8973.88 samples/sec Loss 4.1696 LearningRate 0.0041 Epoch: 15 Global Step: 266500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:40,760-Speed 9200.98 samples/sec Loss 4.0982 LearningRate 0.0041 Epoch: 15 Global Step: 266510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:18:41,865-Speed 9270.93 samples/sec Loss 4.2387 LearningRate 0.0041 Epoch: 15 Global Step: 266520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:42,946-Speed 9478.46 samples/sec Loss 4.1320 LearningRate 0.0041 Epoch: 15 Global Step: 266530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:44,049-Speed 9290.57 samples/sec Loss 4.0544 LearningRate 0.0041 Epoch: 15 Global Step: 266540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:45,136-Speed 9430.60 samples/sec Loss 4.1210 LearningRate 0.0041 Epoch: 15 Global Step: 266550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:46,222-Speed 9430.48 samples/sec Loss 4.1263 LearningRate 0.0041 Epoch: 15 Global Step: 266560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:47,339-Speed 9175.51 samples/sec Loss 4.1886 LearningRate 0.0041 Epoch: 15 Global Step: 266570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:48,470-Speed 9061.20 samples/sec Loss 4.1411 LearningRate 0.0041 Epoch: 15 Global Step: 266580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:49,618-Speed 8924.28 samples/sec Loss 4.1744 LearningRate 0.0041 Epoch: 15 Global Step: 266590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:50,728-Speed 9228.46 samples/sec Loss 4.1087 LearningRate 0.0041 Epoch: 15 Global Step: 266600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:51,881-Speed 8886.66 samples/sec Loss 4.0789 LearningRate 0.0041 Epoch: 15 Global Step: 266610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:53,003-Speed 9128.28 samples/sec Loss 4.2258 LearningRate 0.0041 Epoch: 15 Global Step: 266620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:54,203-Speed 8533.86 samples/sec Loss 4.1610 LearningRate 0.0041 Epoch: 15 Global Step: 266630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:55,340-Speed 9015.92 samples/sec 
Loss 4.1930 LearningRate 0.0041 Epoch: 15 Global Step: 266640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:56,403-Speed 9639.04 samples/sec Loss 4.1807 LearningRate 0.0040 Epoch: 15 Global Step: 266650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:57,511-Speed 9253.62 samples/sec Loss 4.1656 LearningRate 0.0040 Epoch: 15 Global Step: 266660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:18:58,559-Speed 9771.84 samples/sec Loss 4.1813 LearningRate 0.0040 Epoch: 15 Global Step: 266670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:18:59,669-Speed 9234.10 samples/sec Loss 4.1714 LearningRate 0.0040 Epoch: 15 Global Step: 266680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:00,804-Speed 9027.01 samples/sec Loss 4.1796 LearningRate 0.0040 Epoch: 15 Global Step: 266690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:01,897-Speed 9372.98 samples/sec Loss 4.1464 LearningRate 0.0040 Epoch: 15 Global Step: 266700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:03,040-Speed 8968.22 samples/sec Loss 4.0767 LearningRate 0.0040 Epoch: 15 Global Step: 266710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:04,153-Speed 9202.52 samples/sec Loss 4.1898 LearningRate 0.0040 Epoch: 15 Global Step: 266720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:05,265-Speed 9216.53 samples/sec Loss 4.1904 LearningRate 0.0040 Epoch: 15 Global Step: 266730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:06,351-Speed 9432.95 samples/sec Loss 4.1095 LearningRate 0.0040 Epoch: 15 Global Step: 266740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:07,420-Speed 9582.20 samples/sec Loss 4.0918 LearningRate 0.0040 Epoch: 15 Global Step: 266750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:08,527-Speed 9254.80 samples/sec Loss 4.1245 LearningRate 0.0040 Epoch: 15 
Global Step: 266760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:09,617-Speed 9398.77 samples/sec Loss 4.1662 LearningRate 0.0040 Epoch: 15 Global Step: 266770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:10,721-Speed 9286.22 samples/sec Loss 4.2179 LearningRate 0.0040 Epoch: 15 Global Step: 266780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:19:11,784-Speed 9639.07 samples/sec Loss 4.2051 LearningRate 0.0040 Epoch: 15 Global Step: 266790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:12,900-Speed 9178.19 samples/sec Loss 4.1484 LearningRate 0.0040 Epoch: 15 Global Step: 266800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:14,034-Speed 9041.06 samples/sec Loss 4.1254 LearningRate 0.0040 Epoch: 15 Global Step: 266810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:15,134-Speed 9311.05 samples/sec Loss 4.1321 LearningRate 0.0040 Epoch: 15 Global Step: 266820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:16,303-Speed 8763.74 samples/sec Loss 4.1853 LearningRate 0.0040 Epoch: 15 Global Step: 266830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:17,449-Speed 8938.60 samples/sec Loss 4.1861 LearningRate 0.0040 Epoch: 15 Global Step: 266840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:18,584-Speed 9028.03 samples/sec Loss 4.1086 LearningRate 0.0040 Epoch: 15 Global Step: 266850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:19,651-Speed 9606.68 samples/sec Loss 4.1343 LearningRate 0.0040 Epoch: 15 Global Step: 266860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:20,783-Speed 9058.20 samples/sec Loss 4.1344 LearningRate 0.0040 Epoch: 15 Global Step: 266870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:21,877-Speed 9359.89 samples/sec Loss 4.1072 LearningRate 0.0040 Epoch: 15 Global Step: 266880 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 22:19:22,960-Speed 9469.98 samples/sec Loss 4.1193 LearningRate 0.0040 Epoch: 15 Global Step: 266890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:24,050-Speed 9398.67 samples/sec Loss 4.1791 LearningRate 0.0040 Epoch: 15 Global Step: 266900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:25,121-Speed 9567.22 samples/sec Loss 4.1474 LearningRate 0.0040 Epoch: 15 Global Step: 266910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:26,193-Speed 9556.71 samples/sec Loss 4.1073 LearningRate 0.0040 Epoch: 15 Global Step: 266920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:27,306-Speed 9198.44 samples/sec Loss 4.1398 LearningRate 0.0040 Epoch: 15 Global Step: 266930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:28,474-Speed 8778.25 samples/sec Loss 4.1389 LearningRate 0.0040 Epoch: 15 Global Step: 266940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:29,564-Speed 9398.91 samples/sec Loss 4.2253 LearningRate 0.0040 Epoch: 15 Global Step: 266950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:30,644-Speed 9484.75 samples/sec Loss 4.0878 LearningRate 0.0040 Epoch: 15 Global Step: 266960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:31,754-Speed 9229.90 samples/sec Loss 4.1675 LearningRate 0.0040 Epoch: 15 Global Step: 266970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:32,854-Speed 9312.76 samples/sec Loss 4.0989 LearningRate 0.0040 Epoch: 15 Global Step: 266980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:33,994-Speed 8989.46 samples/sec Loss 4.1621 LearningRate 0.0040 Epoch: 15 Global Step: 266990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:35,084-Speed 9401.00 samples/sec Loss 4.1039 LearningRate 0.0040 Epoch: 15 Global Step: 267000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:19:36,231-Speed 8931.70 samples/sec Loss 4.1194 LearningRate 0.0040 Epoch: 15 Global Step: 267010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:37,322-Speed 9398.22 samples/sec Loss 4.1317 LearningRate 0.0040 Epoch: 15 Global Step: 267020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:38,423-Speed 9303.55 samples/sec Loss 4.0983 LearningRate 0.0040 Epoch: 15 Global Step: 267030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:39,538-Speed 9187.87 samples/sec Loss 4.0749 LearningRate 0.0040 Epoch: 15 Global Step: 267040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:19:40,892-Speed 7566.51 samples/sec Loss 4.1784 LearningRate 0.0040 Epoch: 15 Global Step: 267050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:10,659-Speed 344.02 samples/sec Loss 3.9493 LearningRate 0.0040 Epoch: 16 Global Step: 267060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:12,294-Speed 6267.48 samples/sec Loss 3.6054 LearningRate 0.0040 Epoch: 16 Global Step: 267070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:13,823-Speed 6702.70 samples/sec Loss 3.5812 LearningRate 0.0040 Epoch: 16 Global Step: 267080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:14,907-Speed 9445.91 samples/sec Loss 3.6582 LearningRate 0.0040 Epoch: 16 Global Step: 267090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:20:15,999-Speed 9382.66 samples/sec Loss 3.6699 LearningRate 0.0040 Epoch: 16 Global Step: 267100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:20:17,325-Speed 7727.40 samples/sec Loss 3.6341 LearningRate 0.0040 Epoch: 16 Global Step: 267110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:20:18,840-Speed 6765.19 samples/sec Loss 3.6860 LearningRate 0.0040 Epoch: 16 Global Step: 267120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:20:19,975-Speed 9023.27 samples/sec 
Loss 3.6697 LearningRate 0.0040 Epoch: 16 Global Step: 267130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:21,081-Speed 9264.68 samples/sec Loss 3.5985 LearningRate 0.0040 Epoch: 16 Global Step: 267140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:22,225-Speed 8958.41 samples/sec Loss 3.6188 LearningRate 0.0040 Epoch: 16 Global Step: 267150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:23,374-Speed 8919.89 samples/sec Loss 3.5744 LearningRate 0.0040 Epoch: 16 Global Step: 267160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:24,524-Speed 8910.78 samples/sec Loss 3.6237 LearningRate 0.0040 Epoch: 16 Global Step: 267170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:25,657-Speed 9039.72 samples/sec Loss 3.6814 LearningRate 0.0040 Epoch: 16 Global Step: 267180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:26,788-Speed 9063.32 samples/sec Loss 3.6738 LearningRate 0.0040 Epoch: 16 Global Step: 267190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:27,848-Speed 9665.14 samples/sec Loss 3.6188 LearningRate 0.0040 Epoch: 16 Global Step: 267200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:28,948-Speed 9312.16 samples/sec Loss 3.5732 LearningRate 0.0040 Epoch: 16 Global Step: 267210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:30,056-Speed 9246.68 samples/sec Loss 3.5086 LearningRate 0.0040 Epoch: 16 Global Step: 267220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:31,134-Speed 9512.04 samples/sec Loss 3.6493 LearningRate 0.0040 Epoch: 16 Global Step: 267230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:20:32,217-Speed 9460.98 samples/sec Loss 3.5087 LearningRate 0.0040 Epoch: 16 Global Step: 267240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:20:33,334-Speed 9167.45 samples/sec Loss 3.6728 LearningRate 0.0040 Epoch: 16 
Global Step: 267250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:34,404-Speed 9575.39 samples/sec Loss 3.6031 LearningRate 0.0040 Epoch: 16 Global Step: 267260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:35,503-Speed 9326.74 samples/sec Loss 3.5924 LearningRate 0.0040 Epoch: 16 Global Step: 267270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:36,570-Speed 9602.34 samples/sec Loss 3.6689 LearningRate 0.0040 Epoch: 16 Global Step: 267280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:37,632-Speed 9649.59 samples/sec Loss 3.5509 LearningRate 0.0040 Epoch: 16 Global Step: 267290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:38,679-Speed 9780.07 samples/sec Loss 3.6419 LearningRate 0.0040 Epoch: 16 Global Step: 267300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:39,841-Speed 8817.90 samples/sec Loss 3.6128 LearningRate 0.0040 Epoch: 16 Global Step: 267310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:40,971-Speed 9070.01 samples/sec Loss 3.6122 LearningRate 0.0040 Epoch: 16 Global Step: 267320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:42,050-Speed 9494.45 samples/sec Loss 3.6645 LearningRate 0.0040 Epoch: 16 Global Step: 267330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:43,200-Speed 8910.72 samples/sec Loss 3.6698 LearningRate 0.0040 Epoch: 16 Global Step: 267340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:44,270-Speed 9577.56 samples/sec Loss 3.6970 LearningRate 0.0040 Epoch: 16 Global Step: 267350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:20:45,344-Speed 9534.87 samples/sec Loss 3.5769 LearningRate 0.0040 Epoch: 16 Global Step: 267360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:46,460-Speed 9185.79 samples/sec Loss 3.6390 LearningRate 0.0040 Epoch: 16 Global Step: 267370 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 22:20:47,508-Speed 9771.21 samples/sec Loss 3.5766 LearningRate 0.0040 Epoch: 16 Global Step: 267380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:48,597-Speed 9446.54 samples/sec Loss 3.5938 LearningRate 0.0040 Epoch: 16 Global Step: 267390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:49,882-Speed 7976.92 samples/sec Loss 3.6429 LearningRate 0.0040 Epoch: 16 Global Step: 267400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:50,953-Speed 9569.94 samples/sec Loss 3.7269 LearningRate 0.0040 Epoch: 16 Global Step: 267410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:52,200-Speed 8210.34 samples/sec Loss 3.7118 LearningRate 0.0040 Epoch: 16 Global Step: 267420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:53,465-Speed 8101.95 samples/sec Loss 3.7172 LearningRate 0.0040 Epoch: 16 Global Step: 267430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:54,703-Speed 8275.04 samples/sec Loss 3.6109 LearningRate 0.0040 Epoch: 16 Global Step: 267440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:55,964-Speed 8130.22 samples/sec Loss 3.5983 LearningRate 0.0040 Epoch: 16 Global Step: 267450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:57,092-Speed 9082.85 samples/sec Loss 3.5538 LearningRate 0.0040 Epoch: 16 Global Step: 267460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:58,342-Speed 8196.41 samples/sec Loss 3.5540 LearningRate 0.0040 Epoch: 16 Global Step: 267470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:20:59,442-Speed 9317.45 samples/sec Loss 3.5814 LearningRate 0.0039 Epoch: 16 Global Step: 267480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:00,694-Speed 8176.92 samples/sec Loss 3.6305 LearningRate 0.0039 Epoch: 16 Global Step: 267490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:21:01,778-Speed 9452.32 samples/sec Loss 3.6871 LearningRate 0.0039 Epoch: 16 Global Step: 267500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:02,843-Speed 9628.64 samples/sec Loss 3.7257 LearningRate 0.0039 Epoch: 16 Global Step: 267510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:04,003-Speed 8826.23 samples/sec Loss 3.6128 LearningRate 0.0039 Epoch: 16 Global Step: 267520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:05,087-Speed 9451.15 samples/sec Loss 3.6883 LearningRate 0.0039 Epoch: 16 Global Step: 267530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:06,225-Speed 9003.27 samples/sec Loss 3.7533 LearningRate 0.0039 Epoch: 16 Global Step: 267540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:07,312-Speed 9427.74 samples/sec Loss 3.6707 LearningRate 0.0039 Epoch: 16 Global Step: 267550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:08,364-Speed 9748.96 samples/sec Loss 3.6257 LearningRate 0.0039 Epoch: 16 Global Step: 267560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:09,486-Speed 9128.66 samples/sec Loss 3.5473 LearningRate 0.0039 Epoch: 16 Global Step: 267570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:10,586-Speed 9316.52 samples/sec Loss 3.5770 LearningRate 0.0039 Epoch: 16 Global Step: 267580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:11,745-Speed 8843.17 samples/sec Loss 3.6495 LearningRate 0.0039 Epoch: 16 Global Step: 267590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:12,859-Speed 9198.83 samples/sec Loss 3.6373 LearningRate 0.0039 Epoch: 16 Global Step: 267600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:13,959-Speed 9314.89 samples/sec Loss 3.6672 LearningRate 0.0039 Epoch: 16 Global Step: 267610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:15,031-Speed 9554.42 
samples/sec Loss 3.6610 LearningRate 0.0039 Epoch: 16 Global Step: 267620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:16,118-Speed 9424.58 samples/sec Loss 3.7166 LearningRate 0.0039 Epoch: 16 Global Step: 267630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:17,219-Speed 9305.26 samples/sec Loss 3.7498 LearningRate 0.0039 Epoch: 16 Global Step: 267640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:18,310-Speed 9393.60 samples/sec Loss 3.6745 LearningRate 0.0039 Epoch: 16 Global Step: 267650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:19,421-Speed 9223.23 samples/sec Loss 3.6788 LearningRate 0.0039 Epoch: 16 Global Step: 267660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:20,542-Speed 9136.06 samples/sec Loss 3.6986 LearningRate 0.0039 Epoch: 16 Global Step: 267670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:21,630-Speed 9417.51 samples/sec Loss 3.6936 LearningRate 0.0039 Epoch: 16 Global Step: 267680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:22,741-Speed 9227.28 samples/sec Loss 3.6708 LearningRate 0.0039 Epoch: 16 Global Step: 267690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:23,825-Speed 9451.89 samples/sec Loss 3.6503 LearningRate 0.0039 Epoch: 16 Global Step: 267700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:24,918-Speed 9372.63 samples/sec Loss 3.5940 LearningRate 0.0039 Epoch: 16 Global Step: 267710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:25,994-Speed 9526.72 samples/sec Loss 3.6352 LearningRate 0.0039 Epoch: 16 Global Step: 267720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:27,109-Speed 9182.33 samples/sec Loss 3.6705 LearningRate 0.0039 Epoch: 16 Global Step: 267730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:28,206-Speed 9343.51 samples/sec Loss 3.5971 LearningRate 
0.0039 Epoch: 16 Global Step: 267740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:21:29,318-Speed 9216.04 samples/sec Loss 3.6193 LearningRate 0.0039 Epoch: 16 Global Step: 267750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:30,426-Speed 9246.97 samples/sec Loss 3.7624 LearningRate 0.0039 Epoch: 16 Global Step: 267760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:31,490-Speed 9626.87 samples/sec Loss 3.5838 LearningRate 0.0039 Epoch: 16 Global Step: 267770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:32,583-Speed 9375.08 samples/sec Loss 3.6306 LearningRate 0.0039 Epoch: 16 Global Step: 267780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:33,685-Speed 9297.74 samples/sec Loss 3.6230 LearningRate 0.0039 Epoch: 16 Global Step: 267790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:34,834-Speed 8934.17 samples/sec Loss 3.7017 LearningRate 0.0039 Epoch: 16 Global Step: 267800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:35,887-Speed 9734.29 samples/sec Loss 3.7104 LearningRate 0.0039 Epoch: 16 Global Step: 267810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:37,002-Speed 9193.69 samples/sec Loss 3.6720 LearningRate 0.0039 Epoch: 16 Global Step: 267820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:38,117-Speed 9188.84 samples/sec Loss 3.7567 LearningRate 0.0039 Epoch: 16 Global Step: 267830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:39,279-Speed 8811.48 samples/sec Loss 3.7040 LearningRate 0.0039 Epoch: 16 Global Step: 267840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:40,384-Speed 9276.89 samples/sec Loss 3.6629 LearningRate 0.0039 Epoch: 16 Global Step: 267850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:41,460-Speed 9523.40 samples/sec Loss 3.7161 LearningRate 0.0039 Epoch: 16 Global Step: 267860 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:42,553-Speed 9374.18 samples/sec Loss 3.6981 LearningRate 0.0039 Epoch: 16 Global Step: 267870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:43,639-Speed 9426.57 samples/sec Loss 3.6026 LearningRate 0.0039 Epoch: 16 Global Step: 267880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:44,729-Speed 9398.93 samples/sec Loss 3.5815 LearningRate 0.0039 Epoch: 16 Global Step: 267890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:45,878-Speed 8919.77 samples/sec Loss 3.5763 LearningRate 0.0039 Epoch: 16 Global Step: 267900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:46,955-Speed 9517.95 samples/sec Loss 3.6471 LearningRate 0.0039 Epoch: 16 Global Step: 267910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:48,046-Speed 9390.06 samples/sec Loss 3.6993 LearningRate 0.0039 Epoch: 16 Global Step: 267920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:49,151-Speed 9273.47 samples/sec Loss 3.6521 LearningRate 0.0039 Epoch: 16 Global Step: 267930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:50,271-Speed 9149.45 samples/sec Loss 3.6199 LearningRate 0.0039 Epoch: 16 Global Step: 267940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:51,356-Speed 9445.83 samples/sec Loss 3.5820 LearningRate 0.0039 Epoch: 16 Global Step: 267950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:21:52,425-Speed 9583.67 samples/sec Loss 3.6767 LearningRate 0.0039 Epoch: 16 Global Step: 267960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:53,509-Speed 9452.07 samples/sec Loss 3.7047 LearningRate 0.0039 Epoch: 16 Global Step: 267970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:54,612-Speed 9289.68 samples/sec Loss 3.6309 LearningRate 0.0039 Epoch: 16 Global Step: 267980 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 22:21:55,762-Speed 8910.83 samples/sec Loss 3.6498 LearningRate 0.0039 Epoch: 16 Global Step: 267990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:21:56,851-Speed 9411.03 samples/sec Loss 3.6609 LearningRate 0.0039 Epoch: 16 Global Step: 268000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:22:18,703-[lfw][268000]XNorm: 6.989682 Training: 2022-04-11 22:22:18,704-[lfw][268000]Accuracy-Flip: 0.99617+-0.00289 Training: 2022-04-11 22:22:18,705-[lfw][268000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:22:43,936-[cfp_fp][268000]XNorm: 6.071566 Training: 2022-04-11 22:22:43,937-[cfp_fp][268000]Accuracy-Flip: 0.96886+-0.00755 Training: 2022-04-11 22:22:43,937-[cfp_fp][268000]Accuracy-Highest: 0.97171 Training: 2022-04-11 22:23:05,989-[agedb_30][268000]XNorm: 6.805370 Training: 2022-04-11 22:23:05,990-[agedb_30][268000]Accuracy-Flip: 0.97033+-0.00939 Training: 2022-04-11 22:23:05,990-[agedb_30][268000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:23:07,085-Speed 145.80 samples/sec Loss 3.7482 LearningRate 0.0039 Epoch: 16 Global Step: 268010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:08,149-Speed 9630.71 samples/sec Loss 3.6823 LearningRate 0.0039 Epoch: 16 Global Step: 268020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:09,273-Speed 9117.04 samples/sec Loss 3.7584 LearningRate 0.0039 Epoch: 16 Global Step: 268030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:10,371-Speed 9328.94 samples/sec Loss 3.6488 LearningRate 0.0039 Epoch: 16 Global Step: 268040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:11,438-Speed 9608.43 samples/sec Loss 3.7521 LearningRate 0.0039 Epoch: 16 Global Step: 268050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:12,498-Speed 9661.46 samples/sec Loss 3.6556 LearningRate 0.0039 Epoch: 16 Global Step: 268060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:23:13,620-Speed 9134.87 samples/sec Loss 3.7516 LearningRate 0.0039 Epoch: 16 Global Step: 268070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:14,716-Speed 9348.28 samples/sec Loss 3.6733 LearningRate 0.0039 Epoch: 16 Global Step: 268080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:15,804-Speed 9421.37 samples/sec Loss 3.6169 LearningRate 0.0039 Epoch: 16 Global Step: 268090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:16,936-Speed 9047.90 samples/sec Loss 3.7066 LearningRate 0.0039 Epoch: 16 Global Step: 268100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:18,046-Speed 9231.22 samples/sec Loss 3.6899 LearningRate 0.0039 Epoch: 16 Global Step: 268110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:19,237-Speed 8606.21 samples/sec Loss 3.6690 LearningRate 0.0039 Epoch: 16 Global Step: 268120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:20,379-Speed 8971.52 samples/sec Loss 3.6637 LearningRate 0.0039 Epoch: 16 Global Step: 268130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:21,504-Speed 9104.89 samples/sec Loss 3.6395 LearningRate 0.0039 Epoch: 16 Global Step: 268140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:22,623-Speed 9152.32 samples/sec Loss 3.5817 LearningRate 0.0039 Epoch: 16 Global Step: 268150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:23,723-Speed 9321.78 samples/sec Loss 3.6937 LearningRate 0.0039 Epoch: 16 Global Step: 268160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:24,834-Speed 9220.61 samples/sec Loss 3.7418 LearningRate 0.0039 Epoch: 16 Global Step: 268170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:25,946-Speed 9218.03 samples/sec Loss 3.6292 LearningRate 0.0039 Epoch: 16 Global Step: 268180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:27,024-Speed 9500.30 samples/sec 
Loss 3.6293 LearningRate 0.0039 Epoch: 16 Global Step: 268190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:28,134-Speed 9228.63 samples/sec Loss 3.6014 LearningRate 0.0039 Epoch: 16 Global Step: 268200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:29,220-Speed 9441.03 samples/sec Loss 3.6575 LearningRate 0.0039 Epoch: 16 Global Step: 268210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:30,307-Speed 9427.99 samples/sec Loss 3.7083 LearningRate 0.0039 Epoch: 16 Global Step: 268220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:31,407-Speed 9310.80 samples/sec Loss 3.6235 LearningRate 0.0039 Epoch: 16 Global Step: 268230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:32,519-Speed 9213.24 samples/sec Loss 3.6961 LearningRate 0.0039 Epoch: 16 Global Step: 268240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:33,662-Speed 8963.54 samples/sec Loss 3.6503 LearningRate 0.0039 Epoch: 16 Global Step: 268250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:34,797-Speed 9031.85 samples/sec Loss 3.7624 LearningRate 0.0039 Epoch: 16 Global Step: 268260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:35,884-Speed 9424.06 samples/sec Loss 3.6829 LearningRate 0.0039 Epoch: 16 Global Step: 268270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:36,939-Speed 9716.07 samples/sec Loss 3.6919 LearningRate 0.0039 Epoch: 16 Global Step: 268280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:38,071-Speed 9049.39 samples/sec Loss 3.6502 LearningRate 0.0039 Epoch: 16 Global Step: 268290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:39,223-Speed 8892.55 samples/sec Loss 3.7225 LearningRate 0.0039 Epoch: 16 Global Step: 268300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:40,329-Speed 9266.24 samples/sec Loss 3.7417 LearningRate 0.0039 Epoch: 
16 Global Step: 268310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:41,434-Speed 9282.90 samples/sec Loss 3.7295 LearningRate 0.0038 Epoch: 16 Global Step: 268320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:42,571-Speed 9010.29 samples/sec Loss 3.6087 LearningRate 0.0038 Epoch: 16 Global Step: 268330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:43,705-Speed 9039.20 samples/sec Loss 3.6676 LearningRate 0.0038 Epoch: 16 Global Step: 268340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:44,752-Speed 9777.99 samples/sec Loss 3.7352 LearningRate 0.0038 Epoch: 16 Global Step: 268350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:45,882-Speed 9073.17 samples/sec Loss 3.7195 LearningRate 0.0038 Epoch: 16 Global Step: 268360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:46,987-Speed 9268.26 samples/sec Loss 3.7029 LearningRate 0.0038 Epoch: 16 Global Step: 268370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:48,125-Speed 9003.02 samples/sec Loss 3.6104 LearningRate 0.0038 Epoch: 16 Global Step: 268380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:49,291-Speed 8789.61 samples/sec Loss 3.6707 LearningRate 0.0038 Epoch: 16 Global Step: 268390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:50,396-Speed 9274.35 samples/sec Loss 3.6793 LearningRate 0.0038 Epoch: 16 Global Step: 268400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:51,472-Speed 9520.66 samples/sec Loss 3.6299 LearningRate 0.0038 Epoch: 16 Global Step: 268410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:52,536-Speed 9633.03 samples/sec Loss 3.6080 LearningRate 0.0038 Epoch: 16 Global Step: 268420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:53,632-Speed 9353.76 samples/sec Loss 3.6706 LearningRate 0.0038 Epoch: 16 Global Step: 268430 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 22:23:54,673-Speed 9844.96 samples/sec Loss 3.6445 LearningRate 0.0038 Epoch: 16 Global Step: 268440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:55,786-Speed 9202.57 samples/sec Loss 3.6506 LearningRate 0.0038 Epoch: 16 Global Step: 268450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:56,943-Speed 8852.50 samples/sec Loss 3.7181 LearningRate 0.0038 Epoch: 16 Global Step: 268460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:23:58,018-Speed 9536.59 samples/sec Loss 3.6886 LearningRate 0.0038 Epoch: 16 Global Step: 268470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:23:59,110-Speed 9382.29 samples/sec Loss 3.6856 LearningRate 0.0038 Epoch: 16 Global Step: 268480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:00,218-Speed 9248.56 samples/sec Loss 3.7411 LearningRate 0.0038 Epoch: 16 Global Step: 268490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:01,352-Speed 9031.40 samples/sec Loss 3.6548 LearningRate 0.0038 Epoch: 16 Global Step: 268500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:02,459-Speed 9255.97 samples/sec Loss 3.7071 LearningRate 0.0038 Epoch: 16 Global Step: 268510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:03,583-Speed 9113.89 samples/sec Loss 3.6546 LearningRate 0.0038 Epoch: 16 Global Step: 268520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:04,739-Speed 8864.71 samples/sec Loss 3.7191 LearningRate 0.0038 Epoch: 16 Global Step: 268530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:05,848-Speed 9239.67 samples/sec Loss 3.6957 LearningRate 0.0038 Epoch: 16 Global Step: 268540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:06,957-Speed 9237.05 samples/sec Loss 3.6626 LearningRate 0.0038 Epoch: 16 Global Step: 268550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 22:24:08,041-Speed 9454.98 samples/sec Loss 3.6201 LearningRate 0.0038 Epoch: 16 Global Step: 268560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:09,151-Speed 9225.90 samples/sec Loss 3.6823 LearningRate 0.0038 Epoch: 16 Global Step: 268570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:24:10,252-Speed 9302.64 samples/sec Loss 3.6582 LearningRate 0.0038 Epoch: 16 Global Step: 268580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:24:11,370-Speed 9170.06 samples/sec Loss 3.6976 LearningRate 0.0038 Epoch: 16 Global Step: 268590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:12,439-Speed 9591.58 samples/sec Loss 3.7354 LearningRate 0.0038 Epoch: 16 Global Step: 268600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:13,563-Speed 9116.30 samples/sec Loss 3.6796 LearningRate 0.0038 Epoch: 16 Global Step: 268610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:14,660-Speed 9338.28 samples/sec Loss 3.6723 LearningRate 0.0038 Epoch: 16 Global Step: 268620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:15,747-Speed 9428.12 samples/sec Loss 3.6752 LearningRate 0.0038 Epoch: 16 Global Step: 268630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:16,869-Speed 9126.11 samples/sec Loss 3.6714 LearningRate 0.0038 Epoch: 16 Global Step: 268640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:18,016-Speed 8931.48 samples/sec Loss 3.7143 LearningRate 0.0038 Epoch: 16 Global Step: 268650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:19,157-Speed 8983.18 samples/sec Loss 3.7587 LearningRate 0.0038 Epoch: 16 Global Step: 268660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:20,264-Speed 9250.93 samples/sec Loss 3.6930 LearningRate 0.0038 Epoch: 16 Global Step: 268670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:21,327-Speed 9640.23 
samples/sec Loss 3.6962 LearningRate 0.0038 Epoch: 16 Global Step: 268680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:22,431-Speed 9284.86 samples/sec Loss 3.7732 LearningRate 0.0038 Epoch: 16 Global Step: 268690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:24:23,557-Speed 9098.50 samples/sec Loss 3.6586 LearningRate 0.0038 Epoch: 16 Global Step: 268700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:24,658-Speed 9312.89 samples/sec Loss 3.7049 LearningRate 0.0038 Epoch: 16 Global Step: 268710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:25,753-Speed 9359.29 samples/sec Loss 3.6657 LearningRate 0.0038 Epoch: 16 Global Step: 268720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:26,831-Speed 9499.69 samples/sec Loss 3.6659 LearningRate 0.0038 Epoch: 16 Global Step: 268730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:27,897-Speed 9609.58 samples/sec Loss 3.6584 LearningRate 0.0038 Epoch: 16 Global Step: 268740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:28,964-Speed 9613.24 samples/sec Loss 3.6667 LearningRate 0.0038 Epoch: 16 Global Step: 268750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:30,087-Speed 9123.38 samples/sec Loss 3.6711 LearningRate 0.0038 Epoch: 16 Global Step: 268760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:31,162-Speed 9532.63 samples/sec Loss 3.7220 LearningRate 0.0038 Epoch: 16 Global Step: 268770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:32,296-Speed 9037.07 samples/sec Loss 3.7496 LearningRate 0.0038 Epoch: 16 Global Step: 268780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:33,444-Speed 8922.84 samples/sec Loss 3.7102 LearningRate 0.0038 Epoch: 16 Global Step: 268790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:34,555-Speed 9225.06 samples/sec Loss 3.7154 LearningRate 
0.0038 Epoch: 16 Global Step: 268800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:24:35,635-Speed 9481.15 samples/sec Loss 3.6738 LearningRate 0.0038 Epoch: 16 Global Step: 268810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:36,719-Speed 9455.22 samples/sec Loss 3.6787 LearningRate 0.0038 Epoch: 16 Global Step: 268820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:37,827-Speed 9240.31 samples/sec Loss 3.6304 LearningRate 0.0038 Epoch: 16 Global Step: 268830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:38,904-Speed 9517.99 samples/sec Loss 3.7307 LearningRate 0.0038 Epoch: 16 Global Step: 268840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:39,982-Speed 9501.43 samples/sec Loss 3.7222 LearningRate 0.0038 Epoch: 16 Global Step: 268850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:41,132-Speed 8912.47 samples/sec Loss 3.7487 LearningRate 0.0038 Epoch: 16 Global Step: 268860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:42,240-Speed 9247.58 samples/sec Loss 3.7127 LearningRate 0.0038 Epoch: 16 Global Step: 268870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:43,361-Speed 9143.75 samples/sec Loss 3.8183 LearningRate 0.0038 Epoch: 16 Global Step: 268880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:44,489-Speed 9082.50 samples/sec Loss 3.7010 LearningRate 0.0038 Epoch: 16 Global Step: 268890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:45,554-Speed 9618.39 samples/sec Loss 3.6774 LearningRate 0.0038 Epoch: 16 Global Step: 268900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:46,631-Speed 9520.35 samples/sec Loss 3.7305 LearningRate 0.0038 Epoch: 16 Global Step: 268910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:24:47,738-Speed 9248.58 samples/sec Loss 3.6867 LearningRate 0.0038 Epoch: 16 Global Step: 268920 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:24:48,817-Speed 9501.49 samples/sec Loss 3.7083 LearningRate 0.0038 Epoch: 16 Global Step: 268930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:24:49,887-Speed 9569.21 samples/sec Loss 3.6518 LearningRate 0.0038 Epoch: 16 Global Step: 268940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:50,999-Speed 9218.82 samples/sec Loss 3.7461 LearningRate 0.0038 Epoch: 16 Global Step: 268950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:52,106-Speed 9250.69 samples/sec Loss 3.6462 LearningRate 0.0038 Epoch: 16 Global Step: 268960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:53,169-Speed 9641.08 samples/sec Loss 3.6521 LearningRate 0.0038 Epoch: 16 Global Step: 268970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:54,235-Speed 9619.43 samples/sec Loss 3.6941 LearningRate 0.0038 Epoch: 16 Global Step: 268980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:55,303-Speed 9584.64 samples/sec Loss 3.7565 LearningRate 0.0038 Epoch: 16 Global Step: 268990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:56,405-Speed 9301.81 samples/sec Loss 3.7141 LearningRate 0.0038 Epoch: 16 Global Step: 269000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:57,513-Speed 9246.59 samples/sec Loss 3.7609 LearningRate 0.0038 Epoch: 16 Global Step: 269010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:58,641-Speed 9091.04 samples/sec Loss 3.7159 LearningRate 0.0038 Epoch: 16 Global Step: 269020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:24:59,808-Speed 8777.80 samples/sec Loss 3.6781 LearningRate 0.0038 Epoch: 16 Global Step: 269030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:00,898-Speed 9393.57 samples/sec Loss 3.6971 LearningRate 0.0038 Epoch: 16 Global Step: 269040 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 22:25:01,998-Speed 9318.92 samples/sec Loss 3.6452 LearningRate 0.0038 Epoch: 16 Global Step: 269050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:03,166-Speed 8773.73 samples/sec Loss 3.8042 LearningRate 0.0038 Epoch: 16 Global Step: 269060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:04,258-Speed 9375.69 samples/sec Loss 3.6688 LearningRate 0.0038 Epoch: 16 Global Step: 269070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:05,350-Speed 9388.65 samples/sec Loss 3.7647 LearningRate 0.0038 Epoch: 16 Global Step: 269080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:06,456-Speed 9270.18 samples/sec Loss 3.7349 LearningRate 0.0038 Epoch: 16 Global Step: 269090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:07,563-Speed 9257.88 samples/sec Loss 3.7298 LearningRate 0.0038 Epoch: 16 Global Step: 269100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:08,659-Speed 9345.65 samples/sec Loss 3.6841 LearningRate 0.0038 Epoch: 16 Global Step: 269110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:09,790-Speed 9055.86 samples/sec Loss 3.7431 LearningRate 0.0038 Epoch: 16 Global Step: 269120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:10,864-Speed 9540.90 samples/sec Loss 3.6871 LearningRate 0.0038 Epoch: 16 Global Step: 269130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:11,951-Speed 9423.11 samples/sec Loss 3.7416 LearningRate 0.0038 Epoch: 16 Global Step: 269140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:13,009-Speed 9689.01 samples/sec Loss 3.6720 LearningRate 0.0038 Epoch: 16 Global Step: 269150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:25:14,078-Speed 9582.48 samples/sec Loss 3.7294 LearningRate 0.0038 Epoch: 16 Global Step: 269160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:25:15,138-Speed 
9664.54 samples/sec Loss 3.7633 LearningRate 0.0038 Epoch: 16 Global Step: 269170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:16,222-Speed 9457.56 samples/sec Loss 3.7472 LearningRate 0.0037 Epoch: 16 Global Step: 269180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:17,323-Speed 9305.15 samples/sec Loss 3.7140 LearningRate 0.0037 Epoch: 16 Global Step: 269190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:18,416-Speed 9378.35 samples/sec Loss 3.8256 LearningRate 0.0037 Epoch: 16 Global Step: 269200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:19,528-Speed 9212.31 samples/sec Loss 3.6983 LearningRate 0.0037 Epoch: 16 Global Step: 269210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:20,674-Speed 8938.40 samples/sec Loss 3.6381 LearningRate 0.0037 Epoch: 16 Global Step: 269220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:21,769-Speed 9357.48 samples/sec Loss 3.7596 LearningRate 0.0037 Epoch: 16 Global Step: 269230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:22,944-Speed 8720.67 samples/sec Loss 3.6838 LearningRate 0.0037 Epoch: 16 Global Step: 269240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:24,095-Speed 8902.97 samples/sec Loss 3.8393 LearningRate 0.0037 Epoch: 16 Global Step: 269250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:25,186-Speed 9395.04 samples/sec Loss 3.7833 LearningRate 0.0037 Epoch: 16 Global Step: 269260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:26,299-Speed 9199.50 samples/sec Loss 3.7298 LearningRate 0.0037 Epoch: 16 Global Step: 269270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:25:27,387-Speed 9416.40 samples/sec Loss 3.6678 LearningRate 0.0037 Epoch: 16 Global Step: 269280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:28,450-Speed 9639.24 samples/sec Loss 3.7084 
LearningRate 0.0037 Epoch: 16 Global Step: 269290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:29,567-Speed 9183.23 samples/sec Loss 3.6787 LearningRate 0.0037 Epoch: 16 Global Step: 269300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:30,641-Speed 9537.57 samples/sec Loss 3.7162 LearningRate 0.0037 Epoch: 16 Global Step: 269310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:31,729-Speed 9416.63 samples/sec Loss 3.7572 LearningRate 0.0037 Epoch: 16 Global Step: 269320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:32,869-Speed 8984.31 samples/sec Loss 3.7627 LearningRate 0.0037 Epoch: 16 Global Step: 269330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:33,982-Speed 9207.03 samples/sec Loss 3.8166 LearningRate 0.0037 Epoch: 16 Global Step: 269340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:35,027-Speed 9803.65 samples/sec Loss 3.7090 LearningRate 0.0037 Epoch: 16 Global Step: 269350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:36,105-Speed 9503.62 samples/sec Loss 3.6927 LearningRate 0.0037 Epoch: 16 Global Step: 269360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:37,240-Speed 9030.48 samples/sec Loss 3.7358 LearningRate 0.0037 Epoch: 16 Global Step: 269370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:38,386-Speed 8935.93 samples/sec Loss 3.7449 LearningRate 0.0037 Epoch: 16 Global Step: 269380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:39,494-Speed 9254.06 samples/sec Loss 3.6354 LearningRate 0.0037 Epoch: 16 Global Step: 269390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:40,640-Speed 8936.53 samples/sec Loss 3.7713 LearningRate 0.0037 Epoch: 16 Global Step: 269400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:41,780-Speed 8996.90 samples/sec Loss 3.7715 LearningRate 0.0037 Epoch: 16 Global Step: 
269410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:42,886-Speed 9261.90 samples/sec Loss 3.6895 LearningRate 0.0037 Epoch: 16 Global Step: 269420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:44,029-Speed 8966.16 samples/sec Loss 3.7661 LearningRate 0.0037 Epoch: 16 Global Step: 269430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:45,136-Speed 9248.51 samples/sec Loss 3.7692 LearningRate 0.0037 Epoch: 16 Global Step: 269440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:46,264-Speed 9089.50 samples/sec Loss 3.8183 LearningRate 0.0037 Epoch: 16 Global Step: 269450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:25:47,405-Speed 8972.95 samples/sec Loss 3.7446 LearningRate 0.0037 Epoch: 16 Global Step: 269460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:48,485-Speed 9497.78 samples/sec Loss 3.7376 LearningRate 0.0037 Epoch: 16 Global Step: 269470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:49,639-Speed 8876.32 samples/sec Loss 3.6996 LearningRate 0.0037 Epoch: 16 Global Step: 269480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:50,728-Speed 9409.83 samples/sec Loss 3.8155 LearningRate 0.0037 Epoch: 16 Global Step: 269490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:51,809-Speed 9472.33 samples/sec Loss 3.7249 LearningRate 0.0037 Epoch: 16 Global Step: 269500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:52,904-Speed 9355.75 samples/sec Loss 3.7278 LearningRate 0.0037 Epoch: 16 Global Step: 269510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:54,014-Speed 9236.69 samples/sec Loss 3.8200 LearningRate 0.0037 Epoch: 16 Global Step: 269520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:55,132-Speed 9157.85 samples/sec Loss 3.7629 LearningRate 0.0037 Epoch: 16 Global Step: 269530 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 22:25:56,215-Speed 9463.10 samples/sec Loss 3.7222 LearningRate 0.0037 Epoch: 16 Global Step: 269540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:57,324-Speed 9239.27 samples/sec Loss 3.7268 LearningRate 0.0037 Epoch: 16 Global Step: 269550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:25:58,442-Speed 9167.36 samples/sec Loss 3.8195 LearningRate 0.0037 Epoch: 16 Global Step: 269560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:25:59,592-Speed 8913.86 samples/sec Loss 3.7529 LearningRate 0.0037 Epoch: 16 Global Step: 269570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:00,703-Speed 9224.23 samples/sec Loss 3.7918 LearningRate 0.0037 Epoch: 16 Global Step: 269580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:01,825-Speed 9130.68 samples/sec Loss 3.7403 LearningRate 0.0037 Epoch: 16 Global Step: 269590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:02,923-Speed 9326.32 samples/sec Loss 3.7191 LearningRate 0.0037 Epoch: 16 Global Step: 269600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:04,014-Speed 9392.49 samples/sec Loss 3.6929 LearningRate 0.0037 Epoch: 16 Global Step: 269610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:05,131-Speed 9172.19 samples/sec Loss 3.7927 LearningRate 0.0037 Epoch: 16 Global Step: 269620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:06,300-Speed 8766.79 samples/sec Loss 3.7343 LearningRate 0.0037 Epoch: 16 Global Step: 269630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:07,408-Speed 9248.21 samples/sec Loss 3.7466 LearningRate 0.0037 Epoch: 16 Global Step: 269640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:08,507-Speed 9323.01 samples/sec Loss 3.8456 LearningRate 0.0037 Epoch: 16 Global Step: 269650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:26:09,620-Speed 9207.20 samples/sec Loss 3.7755 LearningRate 0.0037 Epoch: 16 Global Step: 269660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:10,763-Speed 8962.39 samples/sec Loss 3.7878 LearningRate 0.0037 Epoch: 16 Global Step: 269670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:11,860-Speed 9337.53 samples/sec Loss 3.7529 LearningRate 0.0037 Epoch: 16 Global Step: 269680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:12,953-Speed 9373.27 samples/sec Loss 3.7848 LearningRate 0.0037 Epoch: 16 Global Step: 269690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:14,012-Speed 9672.50 samples/sec Loss 3.7791 LearningRate 0.0037 Epoch: 16 Global Step: 269700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:15,093-Speed 9477.93 samples/sec Loss 3.7714 LearningRate 0.0037 Epoch: 16 Global Step: 269710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:16,181-Speed 9424.65 samples/sec Loss 3.7460 LearningRate 0.0037 Epoch: 16 Global Step: 269720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:17,279-Speed 9330.04 samples/sec Loss 3.8549 LearningRate 0.0037 Epoch: 16 Global Step: 269730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:18,441-Speed 8818.94 samples/sec Loss 3.8213 LearningRate 0.0037 Epoch: 16 Global Step: 269740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:19,565-Speed 9117.64 samples/sec Loss 3.6917 LearningRate 0.0037 Epoch: 16 Global Step: 269750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:20,639-Speed 9543.12 samples/sec Loss 3.7425 LearningRate 0.0037 Epoch: 16 Global Step: 269760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:21,812-Speed 8733.62 samples/sec Loss 3.7692 LearningRate 0.0037 Epoch: 16 Global Step: 269770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:22,907-Speed 9354.64 samples/sec 
Loss 3.8524 LearningRate 0.0037 Epoch: 16 Global Step: 269780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:23,983-Speed 9523.05 samples/sec Loss 3.7325 LearningRate 0.0037 Epoch: 16 Global Step: 269790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:25,129-Speed 8940.76 samples/sec Loss 3.8416 LearningRate 0.0037 Epoch: 16 Global Step: 269800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:26,263-Speed 9030.54 samples/sec Loss 3.7436 LearningRate 0.0037 Epoch: 16 Global Step: 269810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:27,372-Speed 9237.35 samples/sec Loss 3.7804 LearningRate 0.0037 Epoch: 16 Global Step: 269820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:28,476-Speed 9283.45 samples/sec Loss 3.7936 LearningRate 0.0037 Epoch: 16 Global Step: 269830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:29,534-Speed 9682.07 samples/sec Loss 3.7604 LearningRate 0.0037 Epoch: 16 Global Step: 269840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:30,617-Speed 9461.72 samples/sec Loss 3.7596 LearningRate 0.0037 Epoch: 16 Global Step: 269850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:31,706-Speed 9409.15 samples/sec Loss 3.7071 LearningRate 0.0037 Epoch: 16 Global Step: 269860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:32,815-Speed 9235.46 samples/sec Loss 3.7340 LearningRate 0.0037 Epoch: 16 Global Step: 269870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:33,962-Speed 8933.74 samples/sec Loss 3.7533 LearningRate 0.0037 Epoch: 16 Global Step: 269880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:35,067-Speed 9270.00 samples/sec Loss 3.7495 LearningRate 0.0037 Epoch: 16 Global Step: 269890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:36,199-Speed 9059.93 samples/sec Loss 3.6887 LearningRate 0.0037 Epoch: 
16 Global Step: 269900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:37,326-Speed 9096.22 samples/sec Loss 3.7212 LearningRate 0.0037 Epoch: 16 Global Step: 269910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:38,547-Speed 8390.07 samples/sec Loss 3.8235 LearningRate 0.0037 Epoch: 16 Global Step: 269920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:26:39,662-Speed 9189.29 samples/sec Loss 3.7562 LearningRate 0.0037 Epoch: 16 Global Step: 269930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:40,733-Speed 9567.34 samples/sec Loss 3.7803 LearningRate 0.0037 Epoch: 16 Global Step: 269940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:41,844-Speed 9221.74 samples/sec Loss 3.6939 LearningRate 0.0037 Epoch: 16 Global Step: 269950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:42,896-Speed 9735.53 samples/sec Loss 3.8077 LearningRate 0.0037 Epoch: 16 Global Step: 269960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:44,029-Speed 9043.19 samples/sec Loss 3.7732 LearningRate 0.0037 Epoch: 16 Global Step: 269970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:45,119-Speed 9394.88 samples/sec Loss 3.7726 LearningRate 0.0037 Epoch: 16 Global Step: 269980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:46,229-Speed 9237.21 samples/sec Loss 3.8637 LearningRate 0.0037 Epoch: 16 Global Step: 269990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:26:47,299-Speed 9573.91 samples/sec Loss 3.7602 LearningRate 0.0037 Epoch: 16 Global Step: 270000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:27:09,298-[lfw][270000]XNorm: 6.974490 Training: 2022-04-11 22:27:09,298-[lfw][270000]Accuracy-Flip: 0.99733+-0.00309 Training: 2022-04-11 22:27:09,299-[lfw][270000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:27:34,696-[cfp_fp][270000]XNorm: 6.065904 Training: 
2022-04-11 22:27:34,697-[cfp_fp][270000]Accuracy-Flip: 0.97071+-0.00662 Training: 2022-04-11 22:27:34,698-[cfp_fp][270000]Accuracy-Highest: 0.97171 Training: 2022-04-11 22:27:56,661-[agedb_30][270000]XNorm: 6.788168 Training: 2022-04-11 22:27:56,661-[agedb_30][270000]Accuracy-Flip: 0.97033+-0.00927 Training: 2022-04-11 22:27:56,662-[agedb_30][270000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:27:57,747-Speed 145.36 samples/sec Loss 3.8149 LearningRate 0.0037 Epoch: 16 Global Step: 270010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:27:58,833-Speed 9440.52 samples/sec Loss 3.7148 LearningRate 0.0037 Epoch: 16 Global Step: 270020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:27:59,962-Speed 9077.05 samples/sec Loss 3.8257 LearningRate 0.0037 Epoch: 16 Global Step: 270030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:01,075-Speed 9199.18 samples/sec Loss 3.7350 LearningRate 0.0037 Epoch: 16 Global Step: 270040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:02,200-Speed 9108.56 samples/sec Loss 3.7002 LearningRate 0.0036 Epoch: 16 Global Step: 270050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:03,271-Speed 9572.63 samples/sec Loss 3.7090 LearningRate 0.0036 Epoch: 16 Global Step: 270060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:04,334-Speed 9636.56 samples/sec Loss 3.8054 LearningRate 0.0036 Epoch: 16 Global Step: 270070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:05,437-Speed 9282.53 samples/sec Loss 3.7716 LearningRate 0.0036 Epoch: 16 Global Step: 270080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:06,551-Speed 9202.08 samples/sec Loss 3.7054 LearningRate 0.0036 Epoch: 16 Global Step: 270090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:07,683-Speed 9047.41 samples/sec Loss 3.7252 LearningRate 0.0036 Epoch: 16 Global Step: 270100 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 22:28:08,810-Speed 9093.83 samples/sec Loss 3.7246 LearningRate 0.0036 Epoch: 16 Global Step: 270110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:09,981-Speed 8749.25 samples/sec Loss 3.7079 LearningRate 0.0036 Epoch: 16 Global Step: 270120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:11,036-Speed 9718.73 samples/sec Loss 3.7224 LearningRate 0.0036 Epoch: 16 Global Step: 270130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:12,160-Speed 9109.01 samples/sec Loss 3.7191 LearningRate 0.0036 Epoch: 16 Global Step: 270140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:13,293-Speed 9043.05 samples/sec Loss 3.7228 LearningRate 0.0036 Epoch: 16 Global Step: 270150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:14,376-Speed 9458.95 samples/sec Loss 3.7662 LearningRate 0.0036 Epoch: 16 Global Step: 270160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:15,497-Speed 9146.42 samples/sec Loss 3.7924 LearningRate 0.0036 Epoch: 16 Global Step: 270170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:16,595-Speed 9327.49 samples/sec Loss 3.7294 LearningRate 0.0036 Epoch: 16 Global Step: 270180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:17,712-Speed 9174.98 samples/sec Loss 3.7588 LearningRate 0.0036 Epoch: 16 Global Step: 270190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:18,834-Speed 9132.91 samples/sec Loss 3.7433 LearningRate 0.0036 Epoch: 16 Global Step: 270200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:19,976-Speed 8969.93 samples/sec Loss 3.7599 LearningRate 0.0036 Epoch: 16 Global Step: 270210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:21,083-Speed 9255.63 samples/sec Loss 3.8189 LearningRate 0.0036 Epoch: 16 Global Step: 270220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:28:22,202-Speed 9156.06 samples/sec Loss 3.8621 LearningRate 0.0036 Epoch: 16 Global Step: 270230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:23,321-Speed 9158.54 samples/sec Loss 3.7541 LearningRate 0.0036 Epoch: 16 Global Step: 270240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:24,390-Speed 9581.88 samples/sec Loss 3.7814 LearningRate 0.0036 Epoch: 16 Global Step: 270250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:25,534-Speed 8955.07 samples/sec Loss 3.7190 LearningRate 0.0036 Epoch: 16 Global Step: 270260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:26,649-Speed 9196.34 samples/sec Loss 3.7589 LearningRate 0.0036 Epoch: 16 Global Step: 270270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:27,749-Speed 9315.14 samples/sec Loss 3.7053 LearningRate 0.0036 Epoch: 16 Global Step: 270280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:28,854-Speed 9279.34 samples/sec Loss 3.7319 LearningRate 0.0036 Epoch: 16 Global Step: 270290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:29,923-Speed 9580.82 samples/sec Loss 3.7336 LearningRate 0.0036 Epoch: 16 Global Step: 270300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:30,986-Speed 9633.93 samples/sec Loss 3.6802 LearningRate 0.0036 Epoch: 16 Global Step: 270310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:32,065-Speed 9510.26 samples/sec Loss 3.8150 LearningRate 0.0036 Epoch: 16 Global Step: 270320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:33,188-Speed 9124.09 samples/sec Loss 3.6975 LearningRate 0.0036 Epoch: 16 Global Step: 270330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:34,291-Speed 9290.19 samples/sec Loss 3.6208 LearningRate 0.0036 Epoch: 16 Global Step: 270340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:35,372-Speed 9473.23 samples/sec 
Loss 3.7273 LearningRate 0.0036 Epoch: 16 Global Step: 270350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:36,439-Speed 9604.94 samples/sec Loss 3.7332 LearningRate 0.0036 Epoch: 16 Global Step: 270360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:37,535-Speed 9349.06 samples/sec Loss 3.7708 LearningRate 0.0036 Epoch: 16 Global Step: 270370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:38,631-Speed 9345.35 samples/sec Loss 3.7418 LearningRate 0.0036 Epoch: 16 Global Step: 270380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:39,741-Speed 9232.09 samples/sec Loss 3.8493 LearningRate 0.0036 Epoch: 16 Global Step: 270390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:40,845-Speed 9280.06 samples/sec Loss 3.7536 LearningRate 0.0036 Epoch: 16 Global Step: 270400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:41,934-Speed 9413.58 samples/sec Loss 3.8024 LearningRate 0.0036 Epoch: 16 Global Step: 270410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:43,002-Speed 9587.33 samples/sec Loss 3.7284 LearningRate 0.0036 Epoch: 16 Global Step: 270420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:44,111-Speed 9243.12 samples/sec Loss 3.7558 LearningRate 0.0036 Epoch: 16 Global Step: 270430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:45,182-Speed 9563.92 samples/sec Loss 3.7909 LearningRate 0.0036 Epoch: 16 Global Step: 270440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:46,305-Speed 9134.51 samples/sec Loss 3.7300 LearningRate 0.0036 Epoch: 16 Global Step: 270450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:47,402-Speed 9335.79 samples/sec Loss 3.7618 LearningRate 0.0036 Epoch: 16 Global Step: 270460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:48,513-Speed 9219.35 samples/sec Loss 3.7921 LearningRate 0.0036 Epoch: 16 
Global Step: 270470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:49,595-Speed 9472.15 samples/sec Loss 3.7663 LearningRate 0.0036 Epoch: 16 Global Step: 270480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:28:50,716-Speed 9136.81 samples/sec Loss 3.8227 LearningRate 0.0036 Epoch: 16 Global Step: 270490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:51,828-Speed 9214.83 samples/sec Loss 3.7820 LearningRate 0.0036 Epoch: 16 Global Step: 270500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:52,972-Speed 8952.76 samples/sec Loss 3.7390 LearningRate 0.0036 Epoch: 16 Global Step: 270510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:54,080-Speed 9248.40 samples/sec Loss 3.8210 LearningRate 0.0036 Epoch: 16 Global Step: 270520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:55,198-Speed 9162.98 samples/sec Loss 3.8352 LearningRate 0.0036 Epoch: 16 Global Step: 270530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:56,328-Speed 9067.54 samples/sec Loss 3.7252 LearningRate 0.0036 Epoch: 16 Global Step: 270540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:57,455-Speed 9093.84 samples/sec Loss 3.7246 LearningRate 0.0036 Epoch: 16 Global Step: 270550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:58,583-Speed 9086.16 samples/sec Loss 3.7668 LearningRate 0.0036 Epoch: 16 Global Step: 270560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:28:59,671-Speed 9414.43 samples/sec Loss 3.7715 LearningRate 0.0036 Epoch: 16 Global Step: 270570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:00,808-Speed 9006.37 samples/sec Loss 3.7754 LearningRate 0.0036 Epoch: 16 Global Step: 270580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:01,906-Speed 9335.19 samples/sec Loss 3.7381 LearningRate 0.0036 Epoch: 16 Global Step: 270590 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 22:29:03,010-Speed 9285.81 samples/sec Loss 3.7176 LearningRate 0.0036 Epoch: 16 Global Step: 270600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:04,098-Speed 9414.01 samples/sec Loss 3.7662 LearningRate 0.0036 Epoch: 16 Global Step: 270610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:05,211-Speed 9209.28 samples/sec Loss 3.7417 LearningRate 0.0036 Epoch: 16 Global Step: 270620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:06,263-Speed 9733.96 samples/sec Loss 3.8239 LearningRate 0.0036 Epoch: 16 Global Step: 270630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:07,353-Speed 9398.59 samples/sec Loss 3.7439 LearningRate 0.0036 Epoch: 16 Global Step: 270640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:08,445-Speed 9382.14 samples/sec Loss 3.7114 LearningRate 0.0036 Epoch: 16 Global Step: 270650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:09,616-Speed 8758.08 samples/sec Loss 3.7403 LearningRate 0.0036 Epoch: 16 Global Step: 270660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:10,728-Speed 9206.26 samples/sec Loss 3.7934 LearningRate 0.0036 Epoch: 16 Global Step: 270670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:11,838-Speed 9234.08 samples/sec Loss 3.7286 LearningRate 0.0036 Epoch: 16 Global Step: 270680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:12,928-Speed 9403.61 samples/sec Loss 3.7785 LearningRate 0.0036 Epoch: 16 Global Step: 270690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:14,048-Speed 9146.62 samples/sec Loss 3.7435 LearningRate 0.0036 Epoch: 16 Global Step: 270700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:15,172-Speed 9109.62 samples/sec Loss 3.7938 LearningRate 0.0036 Epoch: 16 Global Step: 270710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 22:29:16,275-Speed 9295.80 samples/sec Loss 3.7711 LearningRate 0.0036 Epoch: 16 Global Step: 270720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:17,402-Speed 9096.41 samples/sec Loss 3.7115 LearningRate 0.0036 Epoch: 16 Global Step: 270730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:18,531-Speed 9071.99 samples/sec Loss 3.7779 LearningRate 0.0036 Epoch: 16 Global Step: 270740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:19,643-Speed 9213.95 samples/sec Loss 3.8081 LearningRate 0.0036 Epoch: 16 Global Step: 270750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:20,783-Speed 8990.69 samples/sec Loss 3.7675 LearningRate 0.0036 Epoch: 16 Global Step: 270760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:21,870-Speed 9425.05 samples/sec Loss 3.7769 LearningRate 0.0036 Epoch: 16 Global Step: 270770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:22,963-Speed 9372.53 samples/sec Loss 3.8610 LearningRate 0.0036 Epoch: 16 Global Step: 270780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:24,069-Speed 9263.86 samples/sec Loss 3.7633 LearningRate 0.0036 Epoch: 16 Global Step: 270790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:25,146-Speed 9513.56 samples/sec Loss 3.7298 LearningRate 0.0036 Epoch: 16 Global Step: 270800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:26,232-Speed 9436.54 samples/sec Loss 3.8113 LearningRate 0.0036 Epoch: 16 Global Step: 270810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:27,376-Speed 8955.40 samples/sec Loss 3.7931 LearningRate 0.0036 Epoch: 16 Global Step: 270820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:28,496-Speed 9155.32 samples/sec Loss 3.8192 LearningRate 0.0036 Epoch: 16 Global Step: 270830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:29,605-Speed 
9237.83 samples/sec Loss 3.8020 LearningRate 0.0036 Epoch: 16 Global Step: 270840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:30,771-Speed 8787.25 samples/sec Loss 3.7166 LearningRate 0.0036 Epoch: 16 Global Step: 270850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:31,910-Speed 8993.75 samples/sec Loss 3.8414 LearningRate 0.0036 Epoch: 16 Global Step: 270860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:33,054-Speed 8952.21 samples/sec Loss 3.7300 LearningRate 0.0036 Epoch: 16 Global Step: 270870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:34,162-Speed 9253.47 samples/sec Loss 3.6696 LearningRate 0.0036 Epoch: 16 Global Step: 270880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:35,234-Speed 9550.95 samples/sec Loss 3.7411 LearningRate 0.0036 Epoch: 16 Global Step: 270890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:36,321-Speed 9425.98 samples/sec Loss 3.7932 LearningRate 0.0036 Epoch: 16 Global Step: 270900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 22:29:37,403-Speed 9471.90 samples/sec Loss 3.7742 LearningRate 0.0036 Epoch: 16 Global Step: 270910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:38,512-Speed 9237.80 samples/sec Loss 3.7879 LearningRate 0.0036 Epoch: 16 Global Step: 270920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:39,624-Speed 9220.13 samples/sec Loss 3.7376 LearningRate 0.0035 Epoch: 16 Global Step: 270930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:40,753-Speed 9071.59 samples/sec Loss 3.7996 LearningRate 0.0035 Epoch: 16 Global Step: 270940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:41,861-Speed 9254.05 samples/sec Loss 3.8484 LearningRate 0.0035 Epoch: 16 Global Step: 270950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:42,968-Speed 9251.82 samples/sec Loss 3.6939 
LearningRate 0.0035 Epoch: 16 Global Step: 270960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:44,063-Speed 9357.00 samples/sec Loss 3.7680 LearningRate 0.0035 Epoch: 16 Global Step: 270970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:45,180-Speed 9172.29 samples/sec Loss 3.7643 LearningRate 0.0035 Epoch: 16 Global Step: 270980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:46,326-Speed 8943.50 samples/sec Loss 3.7551 LearningRate 0.0035 Epoch: 16 Global Step: 270990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:47,434-Speed 9249.05 samples/sec Loss 3.6823 LearningRate 0.0035 Epoch: 16 Global Step: 271000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:48,571-Speed 9009.57 samples/sec Loss 3.8059 LearningRate 0.0035 Epoch: 16 Global Step: 271010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:49,689-Speed 9164.94 samples/sec Loss 3.7992 LearningRate 0.0035 Epoch: 16 Global Step: 271020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:29:50,839-Speed 8909.13 samples/sec Loss 3.7645 LearningRate 0.0035 Epoch: 16 Global Step: 271030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:51,994-Speed 8871.97 samples/sec Loss 3.8547 LearningRate 0.0035 Epoch: 16 Global Step: 271040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:53,108-Speed 9191.79 samples/sec Loss 3.8010 LearningRate 0.0035 Epoch: 16 Global Step: 271050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:54,288-Speed 8686.41 samples/sec Loss 3.7973 LearningRate 0.0035 Epoch: 16 Global Step: 271060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:55,408-Speed 9147.66 samples/sec Loss 3.8074 LearningRate 0.0035 Epoch: 16 Global Step: 271070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:56,538-Speed 9067.36 samples/sec Loss 3.8058 LearningRate 0.0035 Epoch: 16 Global 
Step: 271080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:57,696-Speed 8855.29 samples/sec Loss 3.7831 LearningRate 0.0035 Epoch: 16 Global Step: 271090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:58,791-Speed 9361.29 samples/sec Loss 3.7226 LearningRate 0.0035 Epoch: 16 Global Step: 271100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:29:59,873-Speed 9475.54 samples/sec Loss 3.7685 LearningRate 0.0035 Epoch: 16 Global Step: 271110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:01,029-Speed 8858.89 samples/sec Loss 3.7917 LearningRate 0.0035 Epoch: 16 Global Step: 271120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:02,157-Speed 9084.94 samples/sec Loss 3.7999 LearningRate 0.0035 Epoch: 16 Global Step: 271130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:30:03,286-Speed 9077.13 samples/sec Loss 3.8475 LearningRate 0.0035 Epoch: 16 Global Step: 271140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:30:04,388-Speed 9295.20 samples/sec Loss 3.8303 LearningRate 0.0035 Epoch: 16 Global Step: 271150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:05,442-Speed 9713.75 samples/sec Loss 3.8208 LearningRate 0.0035 Epoch: 16 Global Step: 271160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:06,517-Speed 9539.29 samples/sec Loss 3.7575 LearningRate 0.0035 Epoch: 16 Global Step: 271170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:07,627-Speed 9228.92 samples/sec Loss 3.7496 LearningRate 0.0035 Epoch: 16 Global Step: 271180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:08,698-Speed 9565.35 samples/sec Loss 3.7967 LearningRate 0.0035 Epoch: 16 Global Step: 271190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:09,803-Speed 9276.55 samples/sec Loss 3.7505 LearningRate 0.0035 Epoch: 16 Global Step: 271200 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 22:30:10,885-Speed 9469.28 samples/sec Loss 3.9895 LearningRate 0.0035 Epoch: 16 Global Step: 271210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:12,034-Speed 8913.62 samples/sec Loss 3.8036 LearningRate 0.0035 Epoch: 16 Global Step: 271220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:13,151-Speed 9176.09 samples/sec Loss 3.8413 LearningRate 0.0035 Epoch: 16 Global Step: 271230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:14,300-Speed 8918.78 samples/sec Loss 3.8127 LearningRate 0.0035 Epoch: 16 Global Step: 271240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:15,395-Speed 9352.27 samples/sec Loss 3.8088 LearningRate 0.0035 Epoch: 16 Global Step: 271250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:30:16,536-Speed 8987.10 samples/sec Loss 3.7100 LearningRate 0.0035 Epoch: 16 Global Step: 271260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:30:17,635-Speed 9329.14 samples/sec Loss 3.7880 LearningRate 0.0035 Epoch: 16 Global Step: 271270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:18,704-Speed 9579.88 samples/sec Loss 3.7829 LearningRate 0.0035 Epoch: 16 Global Step: 271280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:19,785-Speed 9479.73 samples/sec Loss 3.7633 LearningRate 0.0035 Epoch: 16 Global Step: 271290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:20,896-Speed 9222.33 samples/sec Loss 3.8060 LearningRate 0.0035 Epoch: 16 Global Step: 271300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:22,043-Speed 8933.75 samples/sec Loss 3.7459 LearningRate 0.0035 Epoch: 16 Global Step: 271310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:23,126-Speed 9461.33 samples/sec Loss 3.7987 LearningRate 0.0035 Epoch: 16 Global Step: 271320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
22:30:24,253-Speed 9093.37 samples/sec Loss 3.8032 LearningRate 0.0035 Epoch: 16 Global Step: 271330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:25,303-Speed 9753.07 samples/sec Loss 3.8103 LearningRate 0.0035 Epoch: 16 Global Step: 271340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:26,466-Speed 8813.36 samples/sec Loss 3.8236 LearningRate 0.0035 Epoch: 16 Global Step: 271350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:27,534-Speed 9587.57 samples/sec Loss 3.7172 LearningRate 0.0035 Epoch: 16 Global Step: 271360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:28,687-Speed 8892.80 samples/sec Loss 3.7532 LearningRate 0.0035 Epoch: 16 Global Step: 271370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:30:29,784-Speed 9335.28 samples/sec Loss 3.7912 LearningRate 0.0035 Epoch: 16 Global Step: 271380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:30,864-Speed 9490.69 samples/sec Loss 3.8101 LearningRate 0.0035 Epoch: 16 Global Step: 271390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:31,965-Speed 9304.09 samples/sec Loss 3.7651 LearningRate 0.0035 Epoch: 16 Global Step: 271400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:33,087-Speed 9130.01 samples/sec Loss 3.7411 LearningRate 0.0035 Epoch: 16 Global Step: 271410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:34,228-Speed 8979.93 samples/sec Loss 3.8093 LearningRate 0.0035 Epoch: 16 Global Step: 271420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:35,366-Speed 9003.99 samples/sec Loss 3.8686 LearningRate 0.0035 Epoch: 16 Global Step: 271430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:36,492-Speed 9100.74 samples/sec Loss 3.8195 LearningRate 0.0035 Epoch: 16 Global Step: 271440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:37,636-Speed 8956.09 samples/sec 
Loss 3.7419 LearningRate 0.0035 Epoch: 16 Global Step: 271450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:38,713-Speed 9510.98 samples/sec Loss 3.7689 LearningRate 0.0035 Epoch: 16 Global Step: 271460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:39,799-Speed 9439.70 samples/sec Loss 3.7754 LearningRate 0.0035 Epoch: 16 Global Step: 271470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:40,870-Speed 9571.67 samples/sec Loss 3.7186 LearningRate 0.0035 Epoch: 16 Global Step: 271480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:41,944-Speed 9539.21 samples/sec Loss 3.7338 LearningRate 0.0035 Epoch: 16 Global Step: 271490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:43,025-Speed 9473.06 samples/sec Loss 3.8450 LearningRate 0.0035 Epoch: 16 Global Step: 271500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:44,077-Speed 9745.49 samples/sec Loss 3.7613 LearningRate 0.0035 Epoch: 16 Global Step: 271510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:45,160-Speed 9457.61 samples/sec Loss 3.7928 LearningRate 0.0035 Epoch: 16 Global Step: 271520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:46,258-Speed 9333.44 samples/sec Loss 3.8363 LearningRate 0.0035 Epoch: 16 Global Step: 271530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:47,352-Speed 9374.06 samples/sec Loss 3.7445 LearningRate 0.0035 Epoch: 16 Global Step: 271540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:48,462-Speed 9226.70 samples/sec Loss 3.7110 LearningRate 0.0035 Epoch: 16 Global Step: 271550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:49,564-Speed 9293.22 samples/sec Loss 3.8174 LearningRate 0.0035 Epoch: 16 Global Step: 271560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:50,646-Speed 9470.45 samples/sec Loss 3.8109 LearningRate 0.0035 Epoch: 16 
Global Step: 271570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:51,759-Speed 9211.08 samples/sec Loss 3.8207 LearningRate 0.0035 Epoch: 16 Global Step: 271580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:30:52,875-Speed 9178.71 samples/sec Loss 3.8252 LearningRate 0.0035 Epoch: 16 Global Step: 271590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:30:53,943-Speed 9593.52 samples/sec Loss 3.7589 LearningRate 0.0035 Epoch: 16 Global Step: 271600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:55,140-Speed 8558.27 samples/sec Loss 3.8290 LearningRate 0.0035 Epoch: 16 Global Step: 271610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:56,260-Speed 9147.84 samples/sec Loss 3.8261 LearningRate 0.0035 Epoch: 16 Global Step: 271620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:57,418-Speed 8846.29 samples/sec Loss 3.7414 LearningRate 0.0035 Epoch: 16 Global Step: 271630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:58,534-Speed 9190.08 samples/sec Loss 3.8151 LearningRate 0.0035 Epoch: 16 Global Step: 271640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:30:59,670-Speed 9016.70 samples/sec Loss 3.8092 LearningRate 0.0035 Epoch: 16 Global Step: 271650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:00,729-Speed 9675.91 samples/sec Loss 3.7729 LearningRate 0.0035 Epoch: 16 Global Step: 271660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:01,784-Speed 9713.51 samples/sec Loss 3.8089 LearningRate 0.0035 Epoch: 16 Global Step: 271670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:02,884-Speed 9307.54 samples/sec Loss 3.8180 LearningRate 0.0035 Epoch: 16 Global Step: 271680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:03,975-Speed 9393.38 samples/sec Loss 3.7942 LearningRate 0.0035 Epoch: 16 Global Step: 271690 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 22:31:05,111-Speed 9016.55 samples/sec Loss 3.7153 LearningRate 0.0035 Epoch: 16 Global Step: 271700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:06,179-Speed 9593.96 samples/sec Loss 3.7697 LearningRate 0.0035 Epoch: 16 Global Step: 271710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:07,313-Speed 9036.71 samples/sec Loss 3.8141 LearningRate 0.0035 Epoch: 16 Global Step: 271720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:08,467-Speed 8883.21 samples/sec Loss 3.9448 LearningRate 0.0035 Epoch: 16 Global Step: 271730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:09,532-Speed 9624.63 samples/sec Loss 3.8077 LearningRate 0.0035 Epoch: 16 Global Step: 271740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:10,615-Speed 9456.66 samples/sec Loss 3.8211 LearningRate 0.0035 Epoch: 16 Global Step: 271750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:11,723-Speed 9253.76 samples/sec Loss 3.8186 LearningRate 0.0035 Epoch: 16 Global Step: 271760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:12,831-Speed 9242.14 samples/sec Loss 3.7501 LearningRate 0.0035 Epoch: 16 Global Step: 271770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:13,895-Speed 9633.13 samples/sec Loss 3.8717 LearningRate 0.0035 Epoch: 16 Global Step: 271780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:14,988-Speed 9372.49 samples/sec Loss 3.8704 LearningRate 0.0035 Epoch: 16 Global Step: 271790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:16,083-Speed 9358.17 samples/sec Loss 3.8036 LearningRate 0.0035 Epoch: 16 Global Step: 271800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:17,193-Speed 9232.15 samples/sec Loss 3.7588 LearningRate 0.0035 Epoch: 16 Global Step: 271810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 22:31:18,308-Speed 9187.09 samples/sec Loss 3.7904 LearningRate 0.0034 Epoch: 16 Global Step: 271820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:19,406-Speed 9331.33 samples/sec Loss 3.9174 LearningRate 0.0034 Epoch: 16 Global Step: 271830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:20,525-Speed 9156.81 samples/sec Loss 3.8256 LearningRate 0.0034 Epoch: 16 Global Step: 271840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:21,614-Speed 9408.86 samples/sec Loss 3.8507 LearningRate 0.0034 Epoch: 16 Global Step: 271850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:22,684-Speed 9575.21 samples/sec Loss 3.8252 LearningRate 0.0034 Epoch: 16 Global Step: 271860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:23,795-Speed 9217.78 samples/sec Loss 3.8345 LearningRate 0.0034 Epoch: 16 Global Step: 271870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 22:31:24,913-Speed 9166.59 samples/sec Loss 3.7948 LearningRate 0.0034 Epoch: 16 Global Step: 271880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:25,975-Speed 9644.30 samples/sec Loss 3.7619 LearningRate 0.0034 Epoch: 16 Global Step: 271890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 22:31:27,057-Speed 9470.26 samples/sec Loss 3.7708 LearningRate 0.0034 Epoch: 16 Global Step: 271900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:28,225-Speed 8774.77 samples/sec Loss 3.7733 LearningRate 0.0034 Epoch: 16 Global Step: 271910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:29,352-Speed 9090.13 samples/sec Loss 3.7868 LearningRate 0.0034 Epoch: 16 Global Step: 271920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:30,437-Speed 9448.42 samples/sec Loss 3.8361 LearningRate 0.0034 Epoch: 16 Global Step: 271930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:31,514-Speed 9512.36 
samples/sec Loss 3.8092 LearningRate 0.0034 Epoch: 16 Global Step: 271940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:32,624-Speed 9231.43 samples/sec Loss 3.7939 LearningRate 0.0034 Epoch: 16 Global Step: 271950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:33,712-Speed 9420.86 samples/sec Loss 3.9309 LearningRate 0.0034 Epoch: 16 Global Step: 271960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:34,803-Speed 9390.84 samples/sec Loss 3.6996 LearningRate 0.0034 Epoch: 16 Global Step: 271970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:31:35,906-Speed 9288.91 samples/sec Loss 3.7556 LearningRate 0.0034 Epoch: 16 Global Step: 271980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:31:37,011-Speed 9269.31 samples/sec Loss 3.7674 LearningRate 0.0034 Epoch: 16 Global Step: 271990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:31:38,085-Speed 9535.03 samples/sec Loss 3.7546 LearningRate 0.0034 Epoch: 16 Global Step: 272000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:00,167-[lfw][272000]XNorm: 6.936082 Training: 2022-04-11 22:32:00,168-[lfw][272000]Accuracy-Flip: 0.99683+-0.00311 Training: 2022-04-11 22:32:00,168-[lfw][272000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:32:25,642-[cfp_fp][272000]XNorm: 6.039087 Training: 2022-04-11 22:32:25,643-[cfp_fp][272000]Accuracy-Flip: 0.97029+-0.00887 Training: 2022-04-11 22:32:25,643-[cfp_fp][272000]Accuracy-Highest: 0.97171 Training: 2022-04-11 22:32:47,668-[agedb_30][272000]XNorm: 6.750198 Training: 2022-04-11 22:32:47,669-[agedb_30][272000]Accuracy-Flip: 0.97150+-0.00880 Training: 2022-04-11 22:32:47,669-[agedb_30][272000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:32:48,794-Speed 144.82 samples/sec Loss 3.7791 LearningRate 0.0034 Epoch: 16 Global Step: 272010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:49,884-Speed 9397.35 samples/sec Loss 3.7800 
LearningRate 0.0034 Epoch: 16 Global Step: 272020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:51,013-Speed 9080.67 samples/sec Loss 3.7325 LearningRate 0.0034 Epoch: 16 Global Step: 272030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:52,145-Speed 9044.80 samples/sec Loss 3.8428 LearningRate 0.0034 Epoch: 16 Global Step: 272040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:53,229-Speed 9453.48 samples/sec Loss 3.8092 LearningRate 0.0034 Epoch: 16 Global Step: 272050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:54,341-Speed 9210.05 samples/sec Loss 3.8569 LearningRate 0.0034 Epoch: 16 Global Step: 272060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:55,427-Speed 9439.86 samples/sec Loss 3.7896 LearningRate 0.0034 Epoch: 16 Global Step: 272070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:56,548-Speed 9143.55 samples/sec Loss 3.7917 LearningRate 0.0034 Epoch: 16 Global Step: 272080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:57,726-Speed 8695.97 samples/sec Loss 3.7697 LearningRate 0.0034 Epoch: 16 Global Step: 272090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:32:58,860-Speed 9032.99 samples/sec Loss 3.8176 LearningRate 0.0034 Epoch: 16 Global Step: 272100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:32:59,979-Speed 9160.52 samples/sec Loss 3.8448 LearningRate 0.0034 Epoch: 16 Global Step: 272110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:01,085-Speed 9262.05 samples/sec Loss 3.8098 LearningRate 0.0034 Epoch: 16 Global Step: 272120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:02,167-Speed 9467.37 samples/sec Loss 3.8119 LearningRate 0.0034 Epoch: 16 Global Step: 272130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:03,292-Speed 9105.38 samples/sec Loss 3.8286 LearningRate 0.0034 Epoch: 16 Global 
Step: 272140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:04,443-Speed 8905.55 samples/sec Loss 3.7243 LearningRate 0.0034 Epoch: 16 Global Step: 272150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:05,577-Speed 9032.36 samples/sec Loss 3.8244 LearningRate 0.0034 Epoch: 16 Global Step: 272160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:06,686-Speed 9239.36 samples/sec Loss 3.8111 LearningRate 0.0034 Epoch: 16 Global Step: 272170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:07,801-Speed 9189.19 samples/sec Loss 3.7669 LearningRate 0.0034 Epoch: 16 Global Step: 272180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:08,910-Speed 9237.09 samples/sec Loss 3.8070 LearningRate 0.0034 Epoch: 16 Global Step: 272190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:09,990-Speed 9489.82 samples/sec Loss 3.8460 LearningRate 0.0034 Epoch: 16 Global Step: 272200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:11,082-Speed 9387.10 samples/sec Loss 3.8978 LearningRate 0.0034 Epoch: 16 Global Step: 272210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:12,207-Speed 9106.21 samples/sec Loss 3.8799 LearningRate 0.0034 Epoch: 16 Global Step: 272220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:13,350-Speed 8963.29 samples/sec Loss 3.8452 LearningRate 0.0034 Epoch: 16 Global Step: 272230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:14,492-Speed 8972.69 samples/sec Loss 3.7783 LearningRate 0.0034 Epoch: 16 Global Step: 272240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:15,602-Speed 9231.75 samples/sec Loss 3.8779 LearningRate 0.0034 Epoch: 16 Global Step: 272250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:16,723-Speed 9141.81 samples/sec Loss 3.8734 LearningRate 0.0034 Epoch: 16 Global Step: 272260 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 22:33:17,804-Speed 9483.80 samples/sec Loss 3.7947 LearningRate 0.0034 Epoch: 16 Global Step: 272270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:18,965-Speed 8823.96 samples/sec Loss 3.8151 LearningRate 0.0034 Epoch: 16 Global Step: 272280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:20,083-Speed 9159.68 samples/sec Loss 3.7444 LearningRate 0.0034 Epoch: 16 Global Step: 272290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:21,201-Speed 9164.42 samples/sec Loss 3.7155 LearningRate 0.0034 Epoch: 16 Global Step: 272300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:22,321-Speed 9154.27 samples/sec Loss 3.8495 LearningRate 0.0034 Epoch: 16 Global Step: 272310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:23,434-Speed 9204.85 samples/sec Loss 3.8571 LearningRate 0.0034 Epoch: 16 Global Step: 272320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:24,510-Speed 9524.71 samples/sec Loss 3.7675 LearningRate 0.0034 Epoch: 16 Global Step: 272330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:25,635-Speed 9106.85 samples/sec Loss 3.8138 LearningRate 0.0034 Epoch: 16 Global Step: 272340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:26,721-Speed 9433.14 samples/sec Loss 3.7456 LearningRate 0.0034 Epoch: 16 Global Step: 272350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:27,850-Speed 9075.96 samples/sec Loss 3.6946 LearningRate 0.0034 Epoch: 16 Global Step: 272360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:28,959-Speed 9241.86 samples/sec Loss 3.8668 LearningRate 0.0034 Epoch: 16 Global Step: 272370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:30,022-Speed 9638.19 samples/sec Loss 3.8044 LearningRate 0.0034 Epoch: 16 Global Step: 272380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:33:31,111-Speed 9408.14 samples/sec Loss 3.7887 LearningRate 0.0034 Epoch: 16 Global Step: 272390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:32,222-Speed 9227.83 samples/sec Loss 3.9154 LearningRate 0.0034 Epoch: 16 Global Step: 272400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:33,381-Speed 8833.67 samples/sec Loss 3.8039 LearningRate 0.0034 Epoch: 16 Global Step: 272410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:34,450-Speed 9588.88 samples/sec Loss 3.8658 LearningRate 0.0034 Epoch: 16 Global Step: 272420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:35,546-Speed 9350.57 samples/sec Loss 3.8460 LearningRate 0.0034 Epoch: 16 Global Step: 272430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:36,663-Speed 9169.99 samples/sec Loss 3.7799 LearningRate 0.0034 Epoch: 16 Global Step: 272440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:37,803-Speed 8987.99 samples/sec Loss 3.7885 LearningRate 0.0034 Epoch: 16 Global Step: 272450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:38,924-Speed 9141.24 samples/sec Loss 3.7894 LearningRate 0.0034 Epoch: 16 Global Step: 272460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:40,032-Speed 9249.69 samples/sec Loss 3.7441 LearningRate 0.0034 Epoch: 16 Global Step: 272470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:41,160-Speed 9084.90 samples/sec Loss 3.6951 LearningRate 0.0034 Epoch: 16 Global Step: 272480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:42,226-Speed 9610.33 samples/sec Loss 3.7101 LearningRate 0.0034 Epoch: 16 Global Step: 272490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:43,319-Speed 9370.45 samples/sec Loss 3.7658 LearningRate 0.0034 Epoch: 16 Global Step: 272500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:44,415-Speed 9358.21 samples/sec 
Loss 3.7748 LearningRate 0.0034 Epoch: 16 Global Step: 272510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:45,499-Speed 9443.87 samples/sec Loss 3.7408 LearningRate 0.0034 Epoch: 16 Global Step: 272520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:46,615-Speed 9184.05 samples/sec Loss 3.8157 LearningRate 0.0034 Epoch: 16 Global Step: 272530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:47,740-Speed 9102.25 samples/sec Loss 3.8210 LearningRate 0.0034 Epoch: 16 Global Step: 272540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:48,870-Speed 9071.90 samples/sec Loss 3.8433 LearningRate 0.0034 Epoch: 16 Global Step: 272550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:49,993-Speed 9122.06 samples/sec Loss 3.8965 LearningRate 0.0034 Epoch: 16 Global Step: 272560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:33:51,109-Speed 9185.63 samples/sec Loss 3.8782 LearningRate 0.0034 Epoch: 16 Global Step: 272570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:52,258-Speed 8919.44 samples/sec Loss 3.8077 LearningRate 0.0034 Epoch: 16 Global Step: 272580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:53,405-Speed 8932.54 samples/sec Loss 3.7452 LearningRate 0.0034 Epoch: 16 Global Step: 272590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:54,519-Speed 9197.97 samples/sec Loss 3.9018 LearningRate 0.0034 Epoch: 16 Global Step: 272600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:55,611-Speed 9385.03 samples/sec Loss 3.8516 LearningRate 0.0034 Epoch: 16 Global Step: 272610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:56,781-Speed 8753.58 samples/sec Loss 3.8418 LearningRate 0.0034 Epoch: 16 Global Step: 272620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:57,906-Speed 9105.22 samples/sec Loss 3.7747 LearningRate 0.0034 Epoch: 16 
Global Step: 272630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:33:59,044-Speed 9002.43 samples/sec Loss 3.8492 LearningRate 0.0034 Epoch: 16 Global Step: 272640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:00,158-Speed 9197.02 samples/sec Loss 3.8197 LearningRate 0.0034 Epoch: 16 Global Step: 272650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:01,275-Speed 9173.23 samples/sec Loss 3.8283 LearningRate 0.0034 Epoch: 16 Global Step: 272660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:02,362-Speed 9428.49 samples/sec Loss 3.8503 LearningRate 0.0034 Epoch: 16 Global Step: 272670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:03,492-Speed 9070.67 samples/sec Loss 3.7931 LearningRate 0.0034 Epoch: 16 Global Step: 272680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:04,607-Speed 9184.54 samples/sec Loss 3.8314 LearningRate 0.0034 Epoch: 16 Global Step: 272690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:05,747-Speed 8993.15 samples/sec Loss 3.7808 LearningRate 0.0034 Epoch: 16 Global Step: 272700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:06,843-Speed 9345.92 samples/sec Loss 3.7610 LearningRate 0.0034 Epoch: 16 Global Step: 272710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:07,918-Speed 9531.62 samples/sec Loss 3.8051 LearningRate 0.0034 Epoch: 16 Global Step: 272720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:09,018-Speed 9311.58 samples/sec Loss 3.7978 LearningRate 0.0033 Epoch: 16 Global Step: 272730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:10,149-Speed 9061.08 samples/sec Loss 3.7142 LearningRate 0.0033 Epoch: 16 Global Step: 272740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:11,235-Speed 9431.97 samples/sec Loss 3.7978 LearningRate 0.0033 Epoch: 16 Global Step: 272750 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 22:34:12,439-Speed 8510.08 samples/sec Loss 3.8352 LearningRate 0.0033 Epoch: 16 Global Step: 272760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:13,546-Speed 9259.31 samples/sec Loss 3.8039 LearningRate 0.0033 Epoch: 16 Global Step: 272770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:14,677-Speed 9061.59 samples/sec Loss 3.8212 LearningRate 0.0033 Epoch: 16 Global Step: 272780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:15,781-Speed 9278.45 samples/sec Loss 3.8104 LearningRate 0.0033 Epoch: 16 Global Step: 272790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:16,946-Speed 8790.96 samples/sec Loss 3.8230 LearningRate 0.0033 Epoch: 16 Global Step: 272800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:18,059-Speed 9209.85 samples/sec Loss 3.8519 LearningRate 0.0033 Epoch: 16 Global Step: 272810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:19,185-Speed 9101.45 samples/sec Loss 3.8131 LearningRate 0.0033 Epoch: 16 Global Step: 272820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:20,261-Speed 9516.82 samples/sec Loss 3.7817 LearningRate 0.0033 Epoch: 16 Global Step: 272830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:21,426-Speed 8797.24 samples/sec Loss 3.8443 LearningRate 0.0033 Epoch: 16 Global Step: 272840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:22,522-Speed 9355.89 samples/sec Loss 3.9326 LearningRate 0.0033 Epoch: 16 Global Step: 272850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:23,706-Speed 8651.04 samples/sec Loss 3.8368 LearningRate 0.0033 Epoch: 16 Global Step: 272860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:24,816-Speed 9231.05 samples/sec Loss 3.7488 LearningRate 0.0033 Epoch: 16 Global Step: 272870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 22:34:25,913-Speed 9334.93 samples/sec Loss 3.8189 LearningRate 0.0033 Epoch: 16 Global Step: 272880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:27,031-Speed 9167.82 samples/sec Loss 3.7768 LearningRate 0.0033 Epoch: 16 Global Step: 272890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:28,164-Speed 9042.93 samples/sec Loss 3.8289 LearningRate 0.0033 Epoch: 16 Global Step: 272900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:29,273-Speed 9237.07 samples/sec Loss 3.8223 LearningRate 0.0033 Epoch: 16 Global Step: 272910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:30,377-Speed 9284.62 samples/sec Loss 3.7604 LearningRate 0.0033 Epoch: 16 Global Step: 272920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:31,531-Speed 8885.05 samples/sec Loss 3.8744 LearningRate 0.0033 Epoch: 16 Global Step: 272930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:32,653-Speed 9132.45 samples/sec Loss 3.7476 LearningRate 0.0033 Epoch: 16 Global Step: 272940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:33,769-Speed 9194.97 samples/sec Loss 3.8737 LearningRate 0.0033 Epoch: 16 Global Step: 272950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:34,889-Speed 9150.06 samples/sec Loss 3.8233 LearningRate 0.0033 Epoch: 16 Global Step: 272960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:35,983-Speed 9365.99 samples/sec Loss 3.9168 LearningRate 0.0033 Epoch: 16 Global Step: 272970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:37,065-Speed 9471.50 samples/sec Loss 3.7787 LearningRate 0.0033 Epoch: 16 Global Step: 272980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:38,140-Speed 9530.26 samples/sec Loss 3.8313 LearningRate 0.0033 Epoch: 16 Global Step: 272990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:39,334-Speed 8579.42 
samples/sec Loss 3.8905 LearningRate 0.0033 Epoch: 16 Global Step: 273000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:40,393-Speed 9672.51 samples/sec Loss 3.7193 LearningRate 0.0033 Epoch: 16 Global Step: 273010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:41,540-Speed 8931.47 samples/sec Loss 3.8071 LearningRate 0.0033 Epoch: 16 Global Step: 273020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:42,666-Speed 9100.93 samples/sec Loss 3.7703 LearningRate 0.0033 Epoch: 16 Global Step: 273030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:43,788-Speed 9131.36 samples/sec Loss 3.8740 LearningRate 0.0033 Epoch: 16 Global Step: 273040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:44,949-Speed 8826.77 samples/sec Loss 3.8666 LearningRate 0.0033 Epoch: 16 Global Step: 273050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:46,099-Speed 8910.02 samples/sec Loss 3.8305 LearningRate 0.0033 Epoch: 16 Global Step: 273060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:47,216-Speed 9172.82 samples/sec Loss 3.7998 LearningRate 0.0033 Epoch: 16 Global Step: 273070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:48,321-Speed 9273.42 samples/sec Loss 3.8185 LearningRate 0.0033 Epoch: 16 Global Step: 273080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:49,433-Speed 9213.14 samples/sec Loss 3.8107 LearningRate 0.0033 Epoch: 16 Global Step: 273090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:34:50,517-Speed 9455.42 samples/sec Loss 3.8036 LearningRate 0.0033 Epoch: 16 Global Step: 273100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:51,611-Speed 9363.28 samples/sec Loss 3.7348 LearningRate 0.0033 Epoch: 16 Global Step: 273110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:52,712-Speed 9316.46 samples/sec Loss 3.8412 LearningRate 
0.0033 Epoch: 16 Global Step: 273120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:53,795-Speed 9456.24 samples/sec Loss 3.8156 LearningRate 0.0033 Epoch: 16 Global Step: 273130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:54,870-Speed 9533.54 samples/sec Loss 3.7223 LearningRate 0.0033 Epoch: 16 Global Step: 273140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:56,053-Speed 8658.16 samples/sec Loss 3.7075 LearningRate 0.0033 Epoch: 16 Global Step: 273150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:57,213-Speed 8834.34 samples/sec Loss 3.8389 LearningRate 0.0033 Epoch: 16 Global Step: 273160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:58,306-Speed 9375.84 samples/sec Loss 3.8895 LearningRate 0.0033 Epoch: 16 Global Step: 273170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:34:59,423-Speed 9175.64 samples/sec Loss 3.7560 LearningRate 0.0033 Epoch: 16 Global Step: 273180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:00,519-Speed 9348.27 samples/sec Loss 3.8329 LearningRate 0.0033 Epoch: 16 Global Step: 273190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:01,674-Speed 8866.17 samples/sec Loss 3.7767 LearningRate 0.0033 Epoch: 16 Global Step: 273200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:02,802-Speed 9086.70 samples/sec Loss 3.8474 LearningRate 0.0033 Epoch: 16 Global Step: 273210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:03,895-Speed 9372.88 samples/sec Loss 3.7995 LearningRate 0.0033 Epoch: 16 Global Step: 273220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:05,018-Speed 9124.51 samples/sec Loss 3.7858 LearningRate 0.0033 Epoch: 16 Global Step: 273230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:06,180-Speed 8816.83 samples/sec Loss 3.8758 LearningRate 0.0033 Epoch: 16 Global Step: 273240 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:07,296-Speed 9183.36 samples/sec Loss 3.7206 LearningRate 0.0033 Epoch: 16 Global Step: 273250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:08,443-Speed 8933.58 samples/sec Loss 3.8128 LearningRate 0.0033 Epoch: 16 Global Step: 273260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:09,604-Speed 8820.76 samples/sec Loss 3.8735 LearningRate 0.0033 Epoch: 16 Global Step: 273270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:10,696-Speed 9380.48 samples/sec Loss 3.8377 LearningRate 0.0033 Epoch: 16 Global Step: 273280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:11,834-Speed 9005.02 samples/sec Loss 3.7971 LearningRate 0.0033 Epoch: 16 Global Step: 273290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:12,968-Speed 9033.11 samples/sec Loss 3.8359 LearningRate 0.0033 Epoch: 16 Global Step: 273300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:14,101-Speed 9042.69 samples/sec Loss 3.8081 LearningRate 0.0033 Epoch: 16 Global Step: 273310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:15,214-Speed 9214.85 samples/sec Loss 3.7573 LearningRate 0.0033 Epoch: 16 Global Step: 273320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:16,317-Speed 9289.00 samples/sec Loss 3.8139 LearningRate 0.0033 Epoch: 16 Global Step: 273330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:17,474-Speed 8855.88 samples/sec Loss 3.8613 LearningRate 0.0033 Epoch: 16 Global Step: 273340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:18,623-Speed 8909.91 samples/sec Loss 3.7964 LearningRate 0.0033 Epoch: 16 Global Step: 273350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:19,713-Speed 9406.47 samples/sec Loss 3.9223 LearningRate 0.0033 Epoch: 16 Global Step: 273360 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 22:35:20,793-Speed 9480.32 samples/sec Loss 3.8334 LearningRate 0.0033 Epoch: 16 Global Step: 273370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:21,936-Speed 8966.43 samples/sec Loss 3.8694 LearningRate 0.0033 Epoch: 16 Global Step: 273380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:23,054-Speed 9164.22 samples/sec Loss 3.7634 LearningRate 0.0033 Epoch: 16 Global Step: 273390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:24,171-Speed 9173.25 samples/sec Loss 3.8142 LearningRate 0.0033 Epoch: 16 Global Step: 273400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:25,307-Speed 9017.89 samples/sec Loss 3.8976 LearningRate 0.0033 Epoch: 16 Global Step: 273410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:26,401-Speed 9369.87 samples/sec Loss 3.8463 LearningRate 0.0033 Epoch: 16 Global Step: 273420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:27,513-Speed 9218.94 samples/sec Loss 3.8586 LearningRate 0.0033 Epoch: 16 Global Step: 273430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:28,581-Speed 9587.36 samples/sec Loss 3.7175 LearningRate 0.0033 Epoch: 16 Global Step: 273440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:29,662-Speed 9482.61 samples/sec Loss 3.7942 LearningRate 0.0033 Epoch: 16 Global Step: 273450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:30,747-Speed 9437.38 samples/sec Loss 3.8568 LearningRate 0.0033 Epoch: 16 Global Step: 273460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:31,849-Speed 9317.78 samples/sec Loss 3.8397 LearningRate 0.0033 Epoch: 16 Global Step: 273470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:32,949-Speed 9316.23 samples/sec Loss 3.8012 LearningRate 0.0033 Epoch: 16 Global Step: 273480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:34,017-Speed 
9592.15 samples/sec Loss 3.8636 LearningRate 0.0033 Epoch: 16 Global Step: 273490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:35,139-Speed 9132.03 samples/sec Loss 3.8707 LearningRate 0.0033 Epoch: 16 Global Step: 273500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:36,233-Speed 9360.66 samples/sec Loss 3.7988 LearningRate 0.0033 Epoch: 16 Global Step: 273510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:37,337-Speed 9286.47 samples/sec Loss 3.7757 LearningRate 0.0033 Epoch: 16 Global Step: 273520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:38,518-Speed 8676.92 samples/sec Loss 3.8050 LearningRate 0.0033 Epoch: 16 Global Step: 273530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:39,591-Speed 9548.48 samples/sec Loss 3.7909 LearningRate 0.0033 Epoch: 16 Global Step: 273540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:40,637-Speed 9792.50 samples/sec Loss 3.8548 LearningRate 0.0033 Epoch: 16 Global Step: 273550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:41,753-Speed 9181.53 samples/sec Loss 3.8213 LearningRate 0.0033 Epoch: 16 Global Step: 273560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:42,864-Speed 9218.93 samples/sec Loss 3.8094 LearningRate 0.0033 Epoch: 16 Global Step: 273570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:43,975-Speed 9229.35 samples/sec Loss 3.8649 LearningRate 0.0033 Epoch: 16 Global Step: 273580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:45,049-Speed 9544.32 samples/sec Loss 3.8024 LearningRate 0.0033 Epoch: 16 Global Step: 273590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:46,120-Speed 9568.22 samples/sec Loss 3.7618 LearningRate 0.0033 Epoch: 16 Global Step: 273600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:35:47,203-Speed 9463.52 samples/sec Loss 3.7505 
LearningRate 0.0033 Epoch: 16 Global Step: 273610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:48,292-Speed 9408.59 samples/sec Loss 3.7672 LearningRate 0.0033 Epoch: 16 Global Step: 273620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:49,396-Speed 9279.36 samples/sec Loss 3.7473 LearningRate 0.0033 Epoch: 16 Global Step: 273630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:50,588-Speed 8597.17 samples/sec Loss 3.7689 LearningRate 0.0032 Epoch: 16 Global Step: 273640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:51,728-Speed 8989.95 samples/sec Loss 3.8494 LearningRate 0.0032 Epoch: 16 Global Step: 273650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:52,875-Speed 8931.83 samples/sec Loss 3.8320 LearningRate 0.0032 Epoch: 16 Global Step: 273660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:54,057-Speed 8663.15 samples/sec Loss 3.8896 LearningRate 0.0032 Epoch: 16 Global Step: 273670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:55,229-Speed 8748.31 samples/sec Loss 3.8016 LearningRate 0.0032 Epoch: 16 Global Step: 273680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:56,349-Speed 9143.38 samples/sec Loss 3.7956 LearningRate 0.0032 Epoch: 16 Global Step: 273690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:57,458-Speed 9239.60 samples/sec Loss 3.7329 LearningRate 0.0032 Epoch: 16 Global Step: 273700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:58,515-Speed 9693.09 samples/sec Loss 3.8371 LearningRate 0.0032 Epoch: 16 Global Step: 273710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:35:59,687-Speed 8746.64 samples/sec Loss 3.8010 LearningRate 0.0032 Epoch: 16 Global Step: 273720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:00,821-Speed 9031.84 samples/sec Loss 3.8885 LearningRate 0.0032 Epoch: 16 Global Step: 
273730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:01,953-Speed 9051.61 samples/sec Loss 3.9324 LearningRate 0.0032 Epoch: 16 Global Step: 273740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:03,051-Speed 9336.19 samples/sec Loss 3.8389 LearningRate 0.0032 Epoch: 16 Global Step: 273750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:04,171-Speed 9156.05 samples/sec Loss 3.8474 LearningRate 0.0032 Epoch: 16 Global Step: 273760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:05,335-Speed 8797.78 samples/sec Loss 3.9005 LearningRate 0.0032 Epoch: 16 Global Step: 273770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:06,443-Speed 9245.16 samples/sec Loss 3.7495 LearningRate 0.0032 Epoch: 16 Global Step: 273780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:07,575-Speed 9052.01 samples/sec Loss 3.8211 LearningRate 0.0032 Epoch: 16 Global Step: 273790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:08,693-Speed 9168.72 samples/sec Loss 3.8035 LearningRate 0.0032 Epoch: 16 Global Step: 273800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:09,799-Speed 9258.36 samples/sec Loss 3.7353 LearningRate 0.0032 Epoch: 16 Global Step: 273810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:10,875-Speed 9523.97 samples/sec Loss 3.7721 LearningRate 0.0032 Epoch: 16 Global Step: 273820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:11,977-Speed 9296.15 samples/sec Loss 3.7303 LearningRate 0.0032 Epoch: 16 Global Step: 273830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:13,112-Speed 9032.70 samples/sec Loss 3.8234 LearningRate 0.0032 Epoch: 16 Global Step: 273840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:14,220-Speed 9245.09 samples/sec Loss 3.8370 LearningRate 0.0032 Epoch: 16 Global Step: 273850 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 22:36:15,337-Speed 9172.78 samples/sec Loss 3.8008 LearningRate 0.0032 Epoch: 16 Global Step: 273860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:16,459-Speed 9132.51 samples/sec Loss 3.8309 LearningRate 0.0032 Epoch: 16 Global Step: 273870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:17,535-Speed 9525.46 samples/sec Loss 3.7907 LearningRate 0.0032 Epoch: 16 Global Step: 273880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:18,670-Speed 9022.27 samples/sec Loss 3.8052 LearningRate 0.0032 Epoch: 16 Global Step: 273890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:19,788-Speed 9166.86 samples/sec Loss 3.8122 LearningRate 0.0032 Epoch: 16 Global Step: 273900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:20,897-Speed 9240.68 samples/sec Loss 3.8468 LearningRate 0.0032 Epoch: 16 Global Step: 273910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:21,996-Speed 9319.51 samples/sec Loss 3.8558 LearningRate 0.0032 Epoch: 16 Global Step: 273920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:36:23,091-Speed 9360.94 samples/sec Loss 3.8098 LearningRate 0.0032 Epoch: 16 Global Step: 273930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:24,170-Speed 9494.62 samples/sec Loss 3.7908 LearningRate 0.0032 Epoch: 16 Global Step: 273940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:25,274-Speed 9279.56 samples/sec Loss 3.7114 LearningRate 0.0032 Epoch: 16 Global Step: 273950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:26,397-Speed 9126.78 samples/sec Loss 3.7893 LearningRate 0.0032 Epoch: 16 Global Step: 273960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:27,492-Speed 9355.54 samples/sec Loss 3.8596 LearningRate 0.0032 Epoch: 16 Global Step: 273970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
22:36:28,595-Speed 9287.34 samples/sec Loss 3.8369 LearningRate 0.0032 Epoch: 16 Global Step: 273980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:29,719-Speed 9114.43 samples/sec Loss 3.8530 LearningRate 0.0032 Epoch: 16 Global Step: 273990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:30,858-Speed 8999.62 samples/sec Loss 3.8803 LearningRate 0.0032 Epoch: 16 Global Step: 274000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:36:53,035-[lfw][274000]XNorm: 6.892952 Training: 2022-04-11 22:36:53,035-[lfw][274000]Accuracy-Flip: 0.99717+-0.00308 Training: 2022-04-11 22:36:53,036-[lfw][274000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:37:18,574-[cfp_fp][274000]XNorm: 5.968948 Training: 2022-04-11 22:37:18,574-[cfp_fp][274000]Accuracy-Flip: 0.97157+-0.00794 Training: 2022-04-11 22:37:18,575-[cfp_fp][274000]Accuracy-Highest: 0.97171 Training: 2022-04-11 22:37:40,607-[agedb_30][274000]XNorm: 6.717413 Training: 2022-04-11 22:37:40,607-[agedb_30][274000]Accuracy-Flip: 0.97050+-0.00966 Training: 2022-04-11 22:37:40,608-[agedb_30][274000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:37:41,737-Speed 144.47 samples/sec Loss 3.8143 LearningRate 0.0032 Epoch: 16 Global Step: 274010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:37:42,781-Speed 9816.50 samples/sec Loss 3.8650 LearningRate 0.0032 Epoch: 16 Global Step: 274020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:43,889-Speed 9241.46 samples/sec Loss 3.8661 LearningRate 0.0032 Epoch: 16 Global Step: 274030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:44,999-Speed 9233.86 samples/sec Loss 3.8201 LearningRate 0.0032 Epoch: 16 Global Step: 274040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:46,059-Speed 9669.37 samples/sec Loss 3.8441 LearningRate 0.0032 Epoch: 16 Global Step: 274050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:47,163-Speed 
9280.33 samples/sec Loss 3.9227 LearningRate 0.0032 Epoch: 16 Global Step: 274060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:48,242-Speed 9493.11 samples/sec Loss 3.8145 LearningRate 0.0032 Epoch: 16 Global Step: 274070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:49,342-Speed 9321.49 samples/sec Loss 3.7796 LearningRate 0.0032 Epoch: 16 Global Step: 274080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:50,482-Speed 8984.54 samples/sec Loss 3.8085 LearningRate 0.0032 Epoch: 16 Global Step: 274090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:51,580-Speed 9333.55 samples/sec Loss 3.8614 LearningRate 0.0032 Epoch: 16 Global Step: 274100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:52,712-Speed 9052.88 samples/sec Loss 3.8404 LearningRate 0.0032 Epoch: 16 Global Step: 274110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:53,848-Speed 9019.00 samples/sec Loss 3.8200 LearningRate 0.0032 Epoch: 16 Global Step: 274120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:37:55,045-Speed 8557.22 samples/sec Loss 3.8289 LearningRate 0.0032 Epoch: 16 Global Step: 274130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:37:56,136-Speed 9392.05 samples/sec Loss 3.8610 LearningRate 0.0032 Epoch: 16 Global Step: 274140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:57,220-Speed 9454.37 samples/sec Loss 3.8743 LearningRate 0.0032 Epoch: 16 Global Step: 274150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:58,295-Speed 9531.09 samples/sec Loss 3.8398 LearningRate 0.0032 Epoch: 16 Global Step: 274160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:37:59,451-Speed 8859.64 samples/sec Loss 3.8392 LearningRate 0.0032 Epoch: 16 Global Step: 274170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:00,617-Speed 8785.15 samples/sec Loss 3.8053 
LearningRate 0.0032 Epoch: 16 Global Step: 274180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:01,696-Speed 9501.99 samples/sec Loss 3.8567 LearningRate 0.0032 Epoch: 16 Global Step: 274190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:02,812-Speed 9177.42 samples/sec Loss 3.7856 LearningRate 0.0032 Epoch: 16 Global Step: 274200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:03,980-Speed 8779.22 samples/sec Loss 3.8758 LearningRate 0.0032 Epoch: 16 Global Step: 274210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:05,109-Speed 9070.61 samples/sec Loss 3.8674 LearningRate 0.0032 Epoch: 16 Global Step: 274220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:06,240-Speed 9056.00 samples/sec Loss 3.8508 LearningRate 0.0032 Epoch: 16 Global Step: 274230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:07,360-Speed 9153.32 samples/sec Loss 3.8552 LearningRate 0.0032 Epoch: 16 Global Step: 274240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:08,523-Speed 8805.01 samples/sec Loss 3.8379 LearningRate 0.0032 Epoch: 16 Global Step: 274250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:09,649-Speed 9102.77 samples/sec Loss 3.8648 LearningRate 0.0032 Epoch: 16 Global Step: 274260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:10,776-Speed 9095.27 samples/sec Loss 3.8749 LearningRate 0.0032 Epoch: 16 Global Step: 274270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:11,909-Speed 9042.16 samples/sec Loss 3.7311 LearningRate 0.0032 Epoch: 16 Global Step: 274280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:13,026-Speed 9170.97 samples/sec Loss 3.8184 LearningRate 0.0032 Epoch: 16 Global Step: 274290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:14,175-Speed 8916.96 samples/sec Loss 3.8390 LearningRate 0.0032 Epoch: 16 Global 
Step: 274300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:15,271-Speed 9350.34 samples/sec Loss 3.8519 LearningRate 0.0032 Epoch: 16 Global Step: 274310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:16,419-Speed 8926.70 samples/sec Loss 3.8068 LearningRate 0.0032 Epoch: 16 Global Step: 274320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:17,522-Speed 9284.12 samples/sec Loss 3.8112 LearningRate 0.0032 Epoch: 16 Global Step: 274330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:18,667-Speed 8955.10 samples/sec Loss 3.8785 LearningRate 0.0032 Epoch: 16 Global Step: 274340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:19,799-Speed 9047.14 samples/sec Loss 3.8197 LearningRate 0.0032 Epoch: 16 Global Step: 274350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:20,882-Speed 9458.34 samples/sec Loss 3.7957 LearningRate 0.0032 Epoch: 16 Global Step: 274360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:22,056-Speed 8732.93 samples/sec Loss 3.8744 LearningRate 0.0032 Epoch: 16 Global Step: 274370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:23,189-Speed 9036.51 samples/sec Loss 3.9533 LearningRate 0.0032 Epoch: 16 Global Step: 274380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:24,364-Speed 8724.47 samples/sec Loss 3.8657 LearningRate 0.0032 Epoch: 16 Global Step: 274390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:25,480-Speed 9178.13 samples/sec Loss 3.8245 LearningRate 0.0032 Epoch: 16 Global Step: 274400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:26,574-Speed 9366.07 samples/sec Loss 3.8712 LearningRate 0.0032 Epoch: 16 Global Step: 274410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:27,653-Speed 9494.56 samples/sec Loss 3.8717 LearningRate 0.0032 Epoch: 16 Global Step: 274420 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-04-11 22:38:28,726-Speed 9553.69 samples/sec Loss 3.8179 LearningRate 0.0032 Epoch: 16 Global Step: 274430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:29,834-Speed 9247.89 samples/sec Loss 3.8553 LearningRate 0.0032 Epoch: 16 Global Step: 274440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:30,943-Speed 9232.80 samples/sec Loss 3.8291 LearningRate 0.0032 Epoch: 16 Global Step: 274450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:32,073-Speed 9067.89 samples/sec Loss 3.8346 LearningRate 0.0032 Epoch: 16 Global Step: 274460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:33,248-Speed 8723.04 samples/sec Loss 3.8523 LearningRate 0.0032 Epoch: 16 Global Step: 274470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:34,358-Speed 9235.96 samples/sec Loss 3.8337 LearningRate 0.0032 Epoch: 16 Global Step: 274480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:35,427-Speed 9580.12 samples/sec Loss 3.8725 LearningRate 0.0032 Epoch: 16 Global Step: 274490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:36,601-Speed 8728.01 samples/sec Loss 3.8163 LearningRate 0.0032 Epoch: 16 Global Step: 274500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:37,708-Speed 9258.20 samples/sec Loss 3.7352 LearningRate 0.0032 Epoch: 16 Global Step: 274510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:38,807-Speed 9321.57 samples/sec Loss 3.8354 LearningRate 0.0032 Epoch: 16 Global Step: 274520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:39,943-Speed 9020.72 samples/sec Loss 3.8418 LearningRate 0.0032 Epoch: 16 Global Step: 274530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:38:41,082-Speed 8994.14 samples/sec Loss 3.7364 LearningRate 0.0032 Epoch: 16 Global Step: 274540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
22:38:42,177-Speed 9352.20 samples/sec Loss 3.8265 LearningRate 0.0032 Epoch: 16 Global Step: 274550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:43,287-Speed 9236.23 samples/sec Loss 3.8087 LearningRate 0.0032 Epoch: 16 Global Step: 274560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:44,441-Speed 8873.58 samples/sec Loss 3.8417 LearningRate 0.0032 Epoch: 16 Global Step: 274570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:45,630-Speed 8621.10 samples/sec Loss 3.8684 LearningRate 0.0031 Epoch: 16 Global Step: 274580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:46,739-Speed 9239.15 samples/sec Loss 3.9231 LearningRate 0.0031 Epoch: 16 Global Step: 274590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:38:47,823-Speed 9452.18 samples/sec Loss 3.7636 LearningRate 0.0031 Epoch: 16 Global Step: 274600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:48,935-Speed 9219.50 samples/sec Loss 3.7922 LearningRate 0.0031 Epoch: 16 Global Step: 274610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:50,029-Speed 9364.87 samples/sec Loss 3.8567 LearningRate 0.0031 Epoch: 16 Global Step: 274620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:51,177-Speed 8919.82 samples/sec Loss 3.8260 LearningRate 0.0031 Epoch: 16 Global Step: 274630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:52,252-Speed 9531.95 samples/sec Loss 3.7376 LearningRate 0.0031 Epoch: 16 Global Step: 274640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:53,364-Speed 9213.74 samples/sec Loss 3.8742 LearningRate 0.0031 Epoch: 16 Global Step: 274650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:54,530-Speed 8789.89 samples/sec Loss 3.8613 LearningRate 0.0031 Epoch: 16 Global Step: 274660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:55,662-Speed 9048.92 samples/sec Loss 
3.8471 LearningRate 0.0031 Epoch: 16 Global Step: 274670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:56,806-Speed 8957.11 samples/sec Loss 3.7781 LearningRate 0.0031 Epoch: 16 Global Step: 274680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:57,902-Speed 9345.09 samples/sec Loss 3.8320 LearningRate 0.0031 Epoch: 16 Global Step: 274690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:38:59,068-Speed 8785.49 samples/sec Loss 3.8010 LearningRate 0.0031 Epoch: 16 Global Step: 274700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:00,179-Speed 9220.62 samples/sec Loss 3.7463 LearningRate 0.0031 Epoch: 16 Global Step: 274710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:01,297-Speed 9168.87 samples/sec Loss 3.8537 LearningRate 0.0031 Epoch: 16 Global Step: 274720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:02,358-Speed 9651.34 samples/sec Loss 3.8776 LearningRate 0.0031 Epoch: 16 Global Step: 274730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:03,439-Speed 9478.71 samples/sec Loss 3.8195 LearningRate 0.0031 Epoch: 16 Global Step: 274740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:04,539-Speed 9318.28 samples/sec Loss 3.8610 LearningRate 0.0031 Epoch: 16 Global Step: 274750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:05,674-Speed 9032.11 samples/sec Loss 3.7814 LearningRate 0.0031 Epoch: 16 Global Step: 274760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:06,763-Speed 9408.27 samples/sec Loss 3.8352 LearningRate 0.0031 Epoch: 16 Global Step: 274770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:07,872-Speed 9241.41 samples/sec Loss 3.7549 LearningRate 0.0031 Epoch: 16 Global Step: 274780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:08,975-Speed 9288.94 samples/sec Loss 3.8909 LearningRate 0.0031 Epoch: 16 Global 
Step: 274790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:10,073-Speed 9332.25 samples/sec Loss 3.8255 LearningRate 0.0031 Epoch: 16 Global Step: 274800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:11,178-Speed 9273.65 samples/sec Loss 3.7578 LearningRate 0.0031 Epoch: 16 Global Step: 274810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:12,294-Speed 9178.23 samples/sec Loss 3.8769 LearningRate 0.0031 Epoch: 16 Global Step: 274820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:13,449-Speed 8873.27 samples/sec Loss 3.9462 LearningRate 0.0031 Epoch: 16 Global Step: 274830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:14,619-Speed 8753.94 samples/sec Loss 3.8606 LearningRate 0.0031 Epoch: 16 Global Step: 274840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:15,685-Speed 9614.17 samples/sec Loss 3.8144 LearningRate 0.0031 Epoch: 16 Global Step: 274850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:16,821-Speed 9016.68 samples/sec Loss 3.8677 LearningRate 0.0031 Epoch: 16 Global Step: 274860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:17,911-Speed 9401.25 samples/sec Loss 3.8053 LearningRate 0.0031 Epoch: 16 Global Step: 274870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:18,996-Speed 9440.63 samples/sec Loss 3.7804 LearningRate 0.0031 Epoch: 16 Global Step: 274880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:20,115-Speed 9163.07 samples/sec Loss 3.8429 LearningRate 0.0031 Epoch: 16 Global Step: 274890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:21,245-Speed 9061.56 samples/sec Loss 3.8099 LearningRate 0.0031 Epoch: 16 Global Step: 274900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:39:22,390-Speed 8955.20 samples/sec Loss 3.8736 LearningRate 0.0031 Epoch: 16 Global Step: 274910 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-04-11 22:39:23,490-Speed 9317.39 samples/sec Loss 3.7794 LearningRate 0.0031 Epoch: 16 Global Step: 274920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:39:24,623-Speed 9044.01 samples/sec Loss 3.7779 LearningRate 0.0031 Epoch: 16 Global Step: 274930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:39:25,769-Speed 8938.15 samples/sec Loss 3.8808 LearningRate 0.0031 Epoch: 16 Global Step: 274940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:26,908-Speed 8997.60 samples/sec Loss 3.8104 LearningRate 0.0031 Epoch: 16 Global Step: 274950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:28,023-Speed 9189.23 samples/sec Loss 3.8464 LearningRate 0.0031 Epoch: 16 Global Step: 274960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:29,158-Speed 9024.49 samples/sec Loss 3.8725 LearningRate 0.0031 Epoch: 16 Global Step: 274970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:30,293-Speed 9028.06 samples/sec Loss 3.8110 LearningRate 0.0031 Epoch: 16 Global Step: 274980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:31,413-Speed 9147.09 samples/sec Loss 3.8236 LearningRate 0.0031 Epoch: 16 Global Step: 274990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:32,551-Speed 9001.74 samples/sec Loss 3.8648 LearningRate 0.0031 Epoch: 16 Global Step: 275000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:33,684-Speed 9047.02 samples/sec Loss 3.8817 LearningRate 0.0031 Epoch: 16 Global Step: 275010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:34,849-Speed 8792.53 samples/sec Loss 3.9426 LearningRate 0.0031 Epoch: 16 Global Step: 275020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:35,961-Speed 9218.79 samples/sec Loss 3.9405 LearningRate 0.0031 Epoch: 16 Global Step: 275030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:39:37,046-Speed 9444.83 samples/sec Loss 3.8438 LearningRate 0.0031 Epoch: 16 Global Step: 275040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:39:38,207-Speed 8824.04 samples/sec Loss 3.7765 LearningRate 0.0031 Epoch: 16 Global Step: 275050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:39:39,320-Speed 9201.22 samples/sec Loss 3.8159 LearningRate 0.0031 Epoch: 16 Global Step: 275060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:40,473-Speed 8889.07 samples/sec Loss 3.7910 LearningRate 0.0031 Epoch: 16 Global Step: 275070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:41,612-Speed 8995.04 samples/sec Loss 3.8727 LearningRate 0.0031 Epoch: 16 Global Step: 275080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:42,726-Speed 9193.16 samples/sec Loss 3.8672 LearningRate 0.0031 Epoch: 16 Global Step: 275090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:43,873-Speed 8937.27 samples/sec Loss 3.8299 LearningRate 0.0031 Epoch: 16 Global Step: 275100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:44,989-Speed 9183.88 samples/sec Loss 3.8532 LearningRate 0.0031 Epoch: 16 Global Step: 275110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:46,093-Speed 9280.12 samples/sec Loss 3.7781 LearningRate 0.0031 Epoch: 16 Global Step: 275120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:47,243-Speed 8904.39 samples/sec Loss 3.7875 LearningRate 0.0031 Epoch: 16 Global Step: 275130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:48,332-Speed 9410.91 samples/sec Loss 3.9310 LearningRate 0.0031 Epoch: 16 Global Step: 275140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:49,460-Speed 9084.80 samples/sec Loss 3.8636 LearningRate 0.0031 Epoch: 16 Global Step: 275150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:50,583-Speed 9122.85 samples/sec 
Loss 3.7395 LearningRate 0.0031 Epoch: 16 Global Step: 275160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:51,720-Speed 9011.68 samples/sec Loss 3.8562 LearningRate 0.0031 Epoch: 16 Global Step: 275170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:52,838-Speed 9166.13 samples/sec Loss 3.8243 LearningRate 0.0031 Epoch: 16 Global Step: 275180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:53,974-Speed 9023.43 samples/sec Loss 3.8030 LearningRate 0.0031 Epoch: 16 Global Step: 275190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:55,109-Speed 9022.19 samples/sec Loss 3.7513 LearningRate 0.0031 Epoch: 16 Global Step: 275200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:56,252-Speed 8960.53 samples/sec Loss 3.9401 LearningRate 0.0031 Epoch: 16 Global Step: 275210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:57,392-Speed 8987.32 samples/sec Loss 3.7834 LearningRate 0.0031 Epoch: 16 Global Step: 275220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:58,474-Speed 9477.10 samples/sec Loss 3.7562 LearningRate 0.0031 Epoch: 16 Global Step: 275230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:39:59,628-Speed 8870.67 samples/sec Loss 3.8307 LearningRate 0.0031 Epoch: 16 Global Step: 275240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:00,753-Speed 9111.52 samples/sec Loss 3.8095 LearningRate 0.0031 Epoch: 16 Global Step: 275250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:01,890-Speed 9016.16 samples/sec Loss 3.8277 LearningRate 0.0031 Epoch: 16 Global Step: 275260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:03,002-Speed 9209.96 samples/sec Loss 3.8637 LearningRate 0.0031 Epoch: 16 Global Step: 275270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:04,158-Speed 8863.94 samples/sec Loss 3.8592 LearningRate 0.0031 Epoch: 16 
Global Step: 275280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:05,263-Speed 9278.21 samples/sec Loss 3.7921 LearningRate 0.0031 Epoch: 16 Global Step: 275290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:06,387-Speed 9115.16 samples/sec Loss 3.7680 LearningRate 0.0031 Epoch: 16 Global Step: 275300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:07,494-Speed 9254.53 samples/sec Loss 3.8839 LearningRate 0.0031 Epoch: 16 Global Step: 275310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:08,568-Speed 9538.38 samples/sec Loss 3.8952 LearningRate 0.0031 Epoch: 16 Global Step: 275320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:09,648-Speed 9482.54 samples/sec Loss 3.8605 LearningRate 0.0031 Epoch: 16 Global Step: 275330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:10,734-Speed 9439.20 samples/sec Loss 3.8262 LearningRate 0.0031 Epoch: 16 Global Step: 275340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:11,863-Speed 9068.13 samples/sec Loss 3.7867 LearningRate 0.0031 Epoch: 16 Global Step: 275350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:13,055-Speed 8599.08 samples/sec Loss 3.7444 LearningRate 0.0031 Epoch: 16 Global Step: 275360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:14,184-Speed 9074.13 samples/sec Loss 3.8635 LearningRate 0.0031 Epoch: 16 Global Step: 275370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:15,249-Speed 9620.15 samples/sec Loss 3.8622 LearningRate 0.0031 Epoch: 16 Global Step: 275380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:16,387-Speed 9003.95 samples/sec Loss 3.7997 LearningRate 0.0031 Epoch: 16 Global Step: 275390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:17,495-Speed 9245.12 samples/sec Loss 3.8313 LearningRate 0.0031 Epoch: 16 Global Step: 275400 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 22:40:18,571-Speed 9525.95 samples/sec Loss 3.7777 LearningRate 0.0031 Epoch: 16 Global Step: 275410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:19,664-Speed 9375.60 samples/sec Loss 3.9297 LearningRate 0.0031 Epoch: 16 Global Step: 275420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:20,749-Speed 9446.27 samples/sec Loss 3.9260 LearningRate 0.0031 Epoch: 16 Global Step: 275430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:21,827-Speed 9505.40 samples/sec Loss 3.7532 LearningRate 0.0031 Epoch: 16 Global Step: 275440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:22,954-Speed 9091.07 samples/sec Loss 3.7613 LearningRate 0.0031 Epoch: 16 Global Step: 275450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:24,067-Speed 9205.18 samples/sec Loss 3.8520 LearningRate 0.0031 Epoch: 16 Global Step: 275460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:25,183-Speed 9182.24 samples/sec Loss 3.8824 LearningRate 0.0031 Epoch: 16 Global Step: 275470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:26,304-Speed 9138.59 samples/sec Loss 3.8004 LearningRate 0.0031 Epoch: 16 Global Step: 275480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:27,473-Speed 8767.55 samples/sec Loss 3.8966 LearningRate 0.0031 Epoch: 16 Global Step: 275490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:28,598-Speed 9100.65 samples/sec Loss 3.8232 LearningRate 0.0031 Epoch: 16 Global Step: 275500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:29,681-Speed 9468.69 samples/sec Loss 3.8716 LearningRate 0.0031 Epoch: 16 Global Step: 275510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:30,801-Speed 9141.10 samples/sec Loss 3.8454 LearningRate 0.0031 Epoch: 16 Global Step: 275520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 22:40:31,879-Speed 9509.10 samples/sec Loss 3.8536 LearningRate 0.0030 Epoch: 16 Global Step: 275530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:32,987-Speed 9249.39 samples/sec Loss 3.8165 LearningRate 0.0030 Epoch: 16 Global Step: 275540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:34,071-Speed 9446.69 samples/sec Loss 3.9376 LearningRate 0.0030 Epoch: 16 Global Step: 275550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:35,179-Speed 9250.61 samples/sec Loss 3.8199 LearningRate 0.0030 Epoch: 16 Global Step: 275560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:36,321-Speed 8966.75 samples/sec Loss 3.8226 LearningRate 0.0030 Epoch: 16 Global Step: 275570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:37,488-Speed 8781.82 samples/sec Loss 3.8178 LearningRate 0.0030 Epoch: 16 Global Step: 275580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:38,655-Speed 8780.57 samples/sec Loss 3.8541 LearningRate 0.0030 Epoch: 16 Global Step: 275590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:39,786-Speed 9070.47 samples/sec Loss 3.9089 LearningRate 0.0030 Epoch: 16 Global Step: 275600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:40,863-Speed 9512.66 samples/sec Loss 3.8098 LearningRate 0.0030 Epoch: 16 Global Step: 275610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:41,966-Speed 9286.59 samples/sec Loss 3.8641 LearningRate 0.0030 Epoch: 16 Global Step: 275620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:43,086-Speed 9148.62 samples/sec Loss 3.8937 LearningRate 0.0030 Epoch: 16 Global Step: 275630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:44,157-Speed 9565.13 samples/sec Loss 3.8914 LearningRate 0.0030 Epoch: 16 Global Step: 275640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:45,284-Speed 9092.82 
samples/sec Loss 3.9332 LearningRate 0.0030 Epoch: 16 Global Step: 275650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:46,343-Speed 9672.82 samples/sec Loss 3.8414 LearningRate 0.0030 Epoch: 16 Global Step: 275660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:47,424-Speed 9480.23 samples/sec Loss 3.7792 LearningRate 0.0030 Epoch: 16 Global Step: 275670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:48,521-Speed 9339.18 samples/sec Loss 3.8953 LearningRate 0.0030 Epoch: 16 Global Step: 275680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:40:49,612-Speed 9390.09 samples/sec Loss 3.9043 LearningRate 0.0030 Epoch: 16 Global Step: 275690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:50,718-Speed 9291.51 samples/sec Loss 3.8819 LearningRate 0.0030 Epoch: 16 Global Step: 275700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:51,831-Speed 9197.67 samples/sec Loss 3.8937 LearningRate 0.0030 Epoch: 16 Global Step: 275710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:52,943-Speed 9220.29 samples/sec Loss 3.8468 LearningRate 0.0030 Epoch: 16 Global Step: 275720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:54,119-Speed 8707.18 samples/sec Loss 3.8548 LearningRate 0.0030 Epoch: 16 Global Step: 275730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:55,223-Speed 9286.00 samples/sec Loss 3.8955 LearningRate 0.0030 Epoch: 16 Global Step: 275740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:56,345-Speed 9133.76 samples/sec Loss 3.7789 LearningRate 0.0030 Epoch: 16 Global Step: 275750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:57,486-Speed 8981.58 samples/sec Loss 3.8313 LearningRate 0.0030 Epoch: 16 Global Step: 275760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:58,618-Speed 9048.30 samples/sec Loss 3.8471 LearningRate 
0.0030 Epoch: 16 Global Step: 275770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:40:59,762-Speed 8960.55 samples/sec Loss 3.8414 LearningRate 0.0030 Epoch: 16 Global Step: 275780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:00,883-Speed 9139.22 samples/sec Loss 3.8890 LearningRate 0.0030 Epoch: 16 Global Step: 275790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:41:01,972-Speed 9405.95 samples/sec Loss 3.8367 LearningRate 0.0030 Epoch: 16 Global Step: 275800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:03,071-Speed 9324.51 samples/sec Loss 3.9143 LearningRate 0.0030 Epoch: 16 Global Step: 275810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:04,159-Speed 9417.77 samples/sec Loss 3.9207 LearningRate 0.0030 Epoch: 16 Global Step: 275820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:05,224-Speed 9613.14 samples/sec Loss 3.8968 LearningRate 0.0030 Epoch: 16 Global Step: 275830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:06,286-Speed 9651.14 samples/sec Loss 3.8722 LearningRate 0.0030 Epoch: 16 Global Step: 275840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:07,401-Speed 9189.93 samples/sec Loss 3.8262 LearningRate 0.0030 Epoch: 16 Global Step: 275850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:08,521-Speed 9145.15 samples/sec Loss 3.8141 LearningRate 0.0030 Epoch: 16 Global Step: 275860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:09,676-Speed 8873.31 samples/sec Loss 3.8575 LearningRate 0.0030 Epoch: 16 Global Step: 275870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:10,789-Speed 9208.94 samples/sec Loss 3.7359 LearningRate 0.0030 Epoch: 16 Global Step: 275880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:11,887-Speed 9331.16 samples/sec Loss 3.8398 LearningRate 0.0030 Epoch: 16 Global Step: 275890 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:12,952-Speed 9617.71 samples/sec Loss 3.8890 LearningRate 0.0030 Epoch: 16 Global Step: 275900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:14,067-Speed 9190.15 samples/sec Loss 3.8951 LearningRate 0.0030 Epoch: 16 Global Step: 275910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:15,233-Speed 8788.21 samples/sec Loss 3.8070 LearningRate 0.0030 Epoch: 16 Global Step: 275920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:16,317-Speed 9448.31 samples/sec Loss 3.8015 LearningRate 0.0030 Epoch: 16 Global Step: 275930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:17,438-Speed 9144.33 samples/sec Loss 3.8726 LearningRate 0.0030 Epoch: 16 Global Step: 275940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:18,561-Speed 9123.54 samples/sec Loss 3.8303 LearningRate 0.0030 Epoch: 16 Global Step: 275950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:19,688-Speed 9087.05 samples/sec Loss 3.8751 LearningRate 0.0030 Epoch: 16 Global Step: 275960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:20,792-Speed 9286.10 samples/sec Loss 3.7864 LearningRate 0.0030 Epoch: 16 Global Step: 275970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:21,884-Speed 9380.89 samples/sec Loss 3.9316 LearningRate 0.0030 Epoch: 16 Global Step: 275980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:22,962-Speed 9510.32 samples/sec Loss 3.8318 LearningRate 0.0030 Epoch: 16 Global Step: 275990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:41:24,052-Speed 9396.34 samples/sec Loss 3.8075 LearningRate 0.0030 Epoch: 16 Global Step: 276000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:41:45,905-[lfw][276000]XNorm: 6.900148 Training: 2022-04-11 22:41:45,906-[lfw][276000]Accuracy-Flip: 0.99583+-0.00281 Training: 2022-04-11 
22:41:45,906-[lfw][276000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:42:11,233-[cfp_fp][276000]XNorm: 5.996978 Training: 2022-04-11 22:42:11,234-[cfp_fp][276000]Accuracy-Flip: 0.97243+-0.00670 Training: 2022-04-11 22:42:11,234-[cfp_fp][276000]Accuracy-Highest: 0.97243 Training: 2022-04-11 22:42:33,078-[agedb_30][276000]XNorm: 6.713654 Training: 2022-04-11 22:42:33,079-[agedb_30][276000]Accuracy-Flip: 0.97233+-0.00901 Training: 2022-04-11 22:42:33,079-[agedb_30][276000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:42:34,144-Speed 146.09 samples/sec Loss 3.8436 LearningRate 0.0030 Epoch: 16 Global Step: 276010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:35,220-Speed 9522.98 samples/sec Loss 3.8884 LearningRate 0.0030 Epoch: 16 Global Step: 276020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:36,327-Speed 9260.48 samples/sec Loss 3.7923 LearningRate 0.0030 Epoch: 16 Global Step: 276030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:37,424-Speed 9336.66 samples/sec Loss 3.8758 LearningRate 0.0030 Epoch: 16 Global Step: 276040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:38,565-Speed 8982.61 samples/sec Loss 3.9269 LearningRate 0.0030 Epoch: 16 Global Step: 276050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:39,620-Speed 9710.11 samples/sec Loss 3.7974 LearningRate 0.0030 Epoch: 16 Global Step: 276060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:40,722-Speed 9343.20 samples/sec Loss 3.8229 LearningRate 0.0030 Epoch: 16 Global Step: 276070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:41,864-Speed 8970.97 samples/sec Loss 3.7969 LearningRate 0.0030 Epoch: 16 Global Step: 276080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:42,993-Speed 9072.85 samples/sec Loss 3.8760 LearningRate 0.0030 Epoch: 16 Global Step: 276090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:42:44,129-Speed 9018.43 samples/sec Loss 3.8804 LearningRate 0.0030 Epoch: 16 Global Step: 276100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:45,241-Speed 9212.73 samples/sec Loss 3.8648 LearningRate 0.0030 Epoch: 16 Global Step: 276110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:46,360-Speed 9157.56 samples/sec Loss 3.8286 LearningRate 0.0030 Epoch: 16 Global Step: 276120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:47,467-Speed 9251.63 samples/sec Loss 3.8293 LearningRate 0.0030 Epoch: 16 Global Step: 276130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:48,613-Speed 8943.08 samples/sec Loss 3.7843 LearningRate 0.0030 Epoch: 16 Global Step: 276140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:49,735-Speed 9126.19 samples/sec Loss 3.9446 LearningRate 0.0030 Epoch: 16 Global Step: 276150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:50,850-Speed 9197.79 samples/sec Loss 3.8376 LearningRate 0.0030 Epoch: 16 Global Step: 276160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:51,995-Speed 8944.96 samples/sec Loss 3.9197 LearningRate 0.0030 Epoch: 16 Global Step: 276170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:53,103-Speed 9249.41 samples/sec Loss 3.8887 LearningRate 0.0030 Epoch: 16 Global Step: 276180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:42:54,230-Speed 9086.51 samples/sec Loss 3.8539 LearningRate 0.0030 Epoch: 16 Global Step: 276190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:55,344-Speed 9203.84 samples/sec Loss 3.8291 LearningRate 0.0030 Epoch: 16 Global Step: 276200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:56,473-Speed 9072.30 samples/sec Loss 3.8613 LearningRate 0.0030 Epoch: 16 Global Step: 276210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:57,586-Speed 9212.05 samples/sec 
Loss 3.8475 LearningRate 0.0030 Epoch: 16 Global Step: 276220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:58,710-Speed 9114.31 samples/sec Loss 3.8404 LearningRate 0.0030 Epoch: 16 Global Step: 276230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:42:59,810-Speed 9313.87 samples/sec Loss 3.8607 LearningRate 0.0030 Epoch: 16 Global Step: 276240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:00,923-Speed 9205.08 samples/sec Loss 3.8636 LearningRate 0.0030 Epoch: 16 Global Step: 276250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:02,036-Speed 9200.68 samples/sec Loss 3.9067 LearningRate 0.0030 Epoch: 16 Global Step: 276260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:03,124-Speed 9422.38 samples/sec Loss 3.8274 LearningRate 0.0030 Epoch: 16 Global Step: 276270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:04,242-Speed 9161.73 samples/sec Loss 3.8447 LearningRate 0.0030 Epoch: 16 Global Step: 276280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:05,343-Speed 9304.51 samples/sec Loss 3.8856 LearningRate 0.0030 Epoch: 16 Global Step: 276290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:06,449-Speed 9262.97 samples/sec Loss 3.9106 LearningRate 0.0030 Epoch: 16 Global Step: 276300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:07,605-Speed 8864.21 samples/sec Loss 3.8496 LearningRate 0.0030 Epoch: 16 Global Step: 276310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:08,774-Speed 8769.67 samples/sec Loss 3.8906 LearningRate 0.0030 Epoch: 16 Global Step: 276320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:09,921-Speed 8935.55 samples/sec Loss 3.7813 LearningRate 0.0030 Epoch: 16 Global Step: 276330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:11,035-Speed 9193.81 samples/sec Loss 3.8892 LearningRate 0.0030 Epoch: 16 
Global Step: 276340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:12,199-Speed 8804.65 samples/sec Loss 3.8660 LearningRate 0.0030 Epoch: 16 Global Step: 276350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:13,334-Speed 9028.32 samples/sec Loss 3.8384 LearningRate 0.0030 Epoch: 16 Global Step: 276360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:14,448-Speed 9196.50 samples/sec Loss 3.8313 LearningRate 0.0030 Epoch: 16 Global Step: 276370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:15,623-Speed 8718.76 samples/sec Loss 3.8231 LearningRate 0.0030 Epoch: 16 Global Step: 276380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:16,725-Speed 9292.56 samples/sec Loss 3.8279 LearningRate 0.0030 Epoch: 16 Global Step: 276390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:17,891-Speed 8788.52 samples/sec Loss 3.8992 LearningRate 0.0030 Epoch: 16 Global Step: 276400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:19,016-Speed 9107.75 samples/sec Loss 3.8861 LearningRate 0.0030 Epoch: 16 Global Step: 276410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:20,140-Speed 9117.07 samples/sec Loss 3.8958 LearningRate 0.0030 Epoch: 16 Global Step: 276420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:21,254-Speed 9200.62 samples/sec Loss 3.8519 LearningRate 0.0030 Epoch: 16 Global Step: 276430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:22,375-Speed 9140.10 samples/sec Loss 3.8265 LearningRate 0.0030 Epoch: 16 Global Step: 276440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:23,486-Speed 9223.34 samples/sec Loss 3.7962 LearningRate 0.0030 Epoch: 16 Global Step: 276450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:24,614-Speed 9084.84 samples/sec Loss 3.8219 LearningRate 0.0030 Epoch: 16 Global Step: 276460 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 22:43:25,736-Speed 9133.02 samples/sec Loss 3.8983 LearningRate 0.0030 Epoch: 16 Global Step: 276470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:26,870-Speed 9028.50 samples/sec Loss 3.8412 LearningRate 0.0030 Epoch: 16 Global Step: 276480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:27,985-Speed 9198.97 samples/sec Loss 3.8743 LearningRate 0.0029 Epoch: 16 Global Step: 276490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:29,079-Speed 9361.03 samples/sec Loss 3.8158 LearningRate 0.0029 Epoch: 16 Global Step: 276500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:30,193-Speed 9196.41 samples/sec Loss 3.8395 LearningRate 0.0029 Epoch: 16 Global Step: 276510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:31,320-Speed 9095.49 samples/sec Loss 3.8130 LearningRate 0.0029 Epoch: 16 Global Step: 276520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:32,455-Speed 9024.99 samples/sec Loss 3.8060 LearningRate 0.0029 Epoch: 16 Global Step: 276530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:33,643-Speed 8621.28 samples/sec Loss 3.8196 LearningRate 0.0029 Epoch: 16 Global Step: 276540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:34,759-Speed 9184.11 samples/sec Loss 3.9228 LearningRate 0.0029 Epoch: 16 Global Step: 276550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:35,860-Speed 9302.95 samples/sec Loss 3.9363 LearningRate 0.0029 Epoch: 16 Global Step: 276560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:36,978-Speed 9167.93 samples/sec Loss 3.8277 LearningRate 0.0029 Epoch: 16 Global Step: 276570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:38,089-Speed 9221.59 samples/sec Loss 3.7477 LearningRate 0.0029 Epoch: 16 Global Step: 276580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:43:39,178-Speed 9411.88 samples/sec Loss 3.8751 LearningRate 0.0029 Epoch: 16 Global Step: 276590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:40,302-Speed 9119.07 samples/sec Loss 3.8817 LearningRate 0.0029 Epoch: 16 Global Step: 276600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:41,456-Speed 8878.87 samples/sec Loss 3.8556 LearningRate 0.0029 Epoch: 16 Global Step: 276610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:42,614-Speed 8848.24 samples/sec Loss 3.8854 LearningRate 0.0029 Epoch: 16 Global Step: 276620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:43,770-Speed 8856.14 samples/sec Loss 3.9248 LearningRate 0.0029 Epoch: 16 Global Step: 276630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:44,863-Speed 9373.20 samples/sec Loss 3.8979 LearningRate 0.0029 Epoch: 16 Global Step: 276640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:45,976-Speed 9208.73 samples/sec Loss 3.8688 LearningRate 0.0029 Epoch: 16 Global Step: 276650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:47,047-Speed 9565.00 samples/sec Loss 3.9117 LearningRate 0.0029 Epoch: 16 Global Step: 276660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:48,138-Speed 9393.71 samples/sec Loss 3.8760 LearningRate 0.0029 Epoch: 16 Global Step: 276670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:49,266-Speed 9084.23 samples/sec Loss 3.7514 LearningRate 0.0029 Epoch: 16 Global Step: 276680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:50,384-Speed 9167.28 samples/sec Loss 3.8472 LearningRate 0.0029 Epoch: 16 Global Step: 276690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:51,496-Speed 9209.08 samples/sec Loss 3.8227 LearningRate 0.0029 Epoch: 16 Global Step: 276700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:52,613-Speed 9173.96 samples/sec 
Loss 3.8535 LearningRate 0.0029 Epoch: 16 Global Step: 276710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:53,740-Speed 9093.03 samples/sec Loss 3.8580 LearningRate 0.0029 Epoch: 16 Global Step: 276720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:54,884-Speed 8955.84 samples/sec Loss 3.9338 LearningRate 0.0029 Epoch: 16 Global Step: 276730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:55,987-Speed 9286.96 samples/sec Loss 3.9013 LearningRate 0.0029 Epoch: 16 Global Step: 276740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:57,099-Speed 9221.39 samples/sec Loss 3.9397 LearningRate 0.0029 Epoch: 16 Global Step: 276750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:43:58,266-Speed 8777.48 samples/sec Loss 3.9321 LearningRate 0.0029 Epoch: 16 Global Step: 276760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:43:59,432-Speed 8785.73 samples/sec Loss 3.9172 LearningRate 0.0029 Epoch: 16 Global Step: 276770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:00,550-Speed 9167.52 samples/sec Loss 3.9418 LearningRate 0.0029 Epoch: 16 Global Step: 276780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:01,657-Speed 9255.59 samples/sec Loss 3.7881 LearningRate 0.0029 Epoch: 16 Global Step: 276790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:02,774-Speed 9171.14 samples/sec Loss 3.7801 LearningRate 0.0029 Epoch: 16 Global Step: 276800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:03,897-Speed 9124.62 samples/sec Loss 3.9263 LearningRate 0.0029 Epoch: 16 Global Step: 276810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:05,003-Speed 9261.22 samples/sec Loss 3.9041 LearningRate 0.0029 Epoch: 16 Global Step: 276820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:06,122-Speed 9155.26 samples/sec Loss 3.7799 LearningRate 0.0029 Epoch: 16 
Global Step: 276830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:07,265-Speed 8969.85 samples/sec Loss 3.8685 LearningRate 0.0029 Epoch: 16 Global Step: 276840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:08,371-Speed 9260.26 samples/sec Loss 3.9082 LearningRate 0.0029 Epoch: 16 Global Step: 276850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:09,517-Speed 8942.44 samples/sec Loss 3.8659 LearningRate 0.0029 Epoch: 16 Global Step: 276860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:10,637-Speed 9151.44 samples/sec Loss 3.9080 LearningRate 0.0029 Epoch: 16 Global Step: 276870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:11,742-Speed 9270.35 samples/sec Loss 3.9062 LearningRate 0.0029 Epoch: 16 Global Step: 276880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:12,906-Speed 8800.29 samples/sec Loss 3.9100 LearningRate 0.0029 Epoch: 16 Global Step: 276890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:14,120-Speed 8438.82 samples/sec Loss 3.8160 LearningRate 0.0029 Epoch: 16 Global Step: 276900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:15,252-Speed 9054.98 samples/sec Loss 3.8958 LearningRate 0.0029 Epoch: 16 Global Step: 276910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:16,356-Speed 9284.71 samples/sec Loss 3.8747 LearningRate 0.0029 Epoch: 16 Global Step: 276920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:17,441-Speed 9445.31 samples/sec Loss 3.8045 LearningRate 0.0029 Epoch: 16 Global Step: 276930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:18,565-Speed 9113.97 samples/sec Loss 3.8087 LearningRate 0.0029 Epoch: 16 Global Step: 276940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:19,689-Speed 9110.65 samples/sec Loss 3.8545 LearningRate 0.0029 Epoch: 16 Global Step: 276950 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 22:44:20,820-Speed 9062.77 samples/sec Loss 3.8884 LearningRate 0.0029 Epoch: 16 Global Step: 276960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:21,919-Speed 9324.85 samples/sec Loss 3.8907 LearningRate 0.0029 Epoch: 16 Global Step: 276970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:23,055-Speed 9022.36 samples/sec Loss 3.7369 LearningRate 0.0029 Epoch: 16 Global Step: 276980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:24,139-Speed 9451.34 samples/sec Loss 3.8020 LearningRate 0.0029 Epoch: 16 Global Step: 276990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:25,219-Speed 9484.04 samples/sec Loss 3.8229 LearningRate 0.0029 Epoch: 16 Global Step: 277000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:26,328-Speed 9236.91 samples/sec Loss 3.8003 LearningRate 0.0029 Epoch: 16 Global Step: 277010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:27,497-Speed 8765.09 samples/sec Loss 3.8594 LearningRate 0.0029 Epoch: 16 Global Step: 277020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:28,629-Speed 9055.49 samples/sec Loss 3.8837 LearningRate 0.0029 Epoch: 16 Global Step: 277030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:29,779-Speed 8907.33 samples/sec Loss 3.8633 LearningRate 0.0029 Epoch: 16 Global Step: 277040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:30,925-Speed 8939.50 samples/sec Loss 3.8683 LearningRate 0.0029 Epoch: 16 Global Step: 277050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:32,033-Speed 9250.42 samples/sec Loss 3.8296 LearningRate 0.0029 Epoch: 16 Global Step: 277060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:33,228-Speed 8575.15 samples/sec Loss 3.8338 LearningRate 0.0029 Epoch: 16 Global Step: 277070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:44:34,344-Speed 9176.90 samples/sec Loss 3.8477 LearningRate 0.0029 Epoch: 16 Global Step: 277080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:35,433-Speed 9409.97 samples/sec Loss 3.9437 LearningRate 0.0029 Epoch: 16 Global Step: 277090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:36,562-Speed 9077.38 samples/sec Loss 3.8060 LearningRate 0.0029 Epoch: 16 Global Step: 277100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:37,669-Speed 9252.02 samples/sec Loss 3.8505 LearningRate 0.0029 Epoch: 16 Global Step: 277110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:38,751-Speed 9471.42 samples/sec Loss 3.8706 LearningRate 0.0029 Epoch: 16 Global Step: 277120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:39,878-Speed 9096.84 samples/sec Loss 3.8023 LearningRate 0.0029 Epoch: 16 Global Step: 277130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:40,984-Speed 9264.97 samples/sec Loss 3.7835 LearningRate 0.0029 Epoch: 16 Global Step: 277140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:42,107-Speed 9118.75 samples/sec Loss 3.8545 LearningRate 0.0029 Epoch: 16 Global Step: 277150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:43,238-Speed 9060.92 samples/sec Loss 3.7971 LearningRate 0.0029 Epoch: 16 Global Step: 277160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:44,356-Speed 9159.94 samples/sec Loss 3.8944 LearningRate 0.0029 Epoch: 16 Global Step: 277170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:45,467-Speed 9227.28 samples/sec Loss 3.8902 LearningRate 0.0029 Epoch: 16 Global Step: 277180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:46,550-Speed 9457.41 samples/sec Loss 3.8077 LearningRate 0.0029 Epoch: 16 Global Step: 277190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:47,683-Speed 9042.60 samples/sec 
Loss 3.9161 LearningRate 0.0029 Epoch: 16 Global Step: 277200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:48,763-Speed 9483.81 samples/sec Loss 3.8175 LearningRate 0.0029 Epoch: 16 Global Step: 277210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:49,846-Speed 9462.24 samples/sec Loss 3.9816 LearningRate 0.0029 Epoch: 16 Global Step: 277220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:50,970-Speed 9116.52 samples/sec Loss 3.8385 LearningRate 0.0029 Epoch: 16 Global Step: 277230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:52,063-Speed 9381.56 samples/sec Loss 3.8494 LearningRate 0.0029 Epoch: 16 Global Step: 277240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:53,174-Speed 9220.91 samples/sec Loss 3.7822 LearningRate 0.0029 Epoch: 16 Global Step: 277250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:54,288-Speed 9199.00 samples/sec Loss 3.8272 LearningRate 0.0029 Epoch: 16 Global Step: 277260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:44:55,362-Speed 9537.44 samples/sec Loss 3.8118 LearningRate 0.0029 Epoch: 16 Global Step: 277270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:56,458-Speed 9354.17 samples/sec Loss 3.9055 LearningRate 0.0029 Epoch: 16 Global Step: 277280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:57,620-Speed 8818.02 samples/sec Loss 3.7689 LearningRate 0.0029 Epoch: 16 Global Step: 277290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:58,695-Speed 9529.01 samples/sec Loss 3.9199 LearningRate 0.0029 Epoch: 16 Global Step: 277300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:44:59,818-Speed 9122.36 samples/sec Loss 3.8724 LearningRate 0.0029 Epoch: 16 Global Step: 277310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:00,946-Speed 9083.20 samples/sec Loss 3.7559 LearningRate 0.0029 Epoch: 
16 Global Step: 277320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:02,055-Speed 9236.42 samples/sec Loss 3.8743 LearningRate 0.0029 Epoch: 16 Global Step: 277330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:03,204-Speed 8916.16 samples/sec Loss 3.8485 LearningRate 0.0029 Epoch: 16 Global Step: 277340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:04,351-Speed 8937.72 samples/sec Loss 3.8929 LearningRate 0.0029 Epoch: 16 Global Step: 277350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:05,447-Speed 9347.57 samples/sec Loss 3.8418 LearningRate 0.0029 Epoch: 16 Global Step: 277360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:06,559-Speed 9207.47 samples/sec Loss 3.8444 LearningRate 0.0029 Epoch: 16 Global Step: 277370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:07,716-Speed 8854.26 samples/sec Loss 3.8827 LearningRate 0.0029 Epoch: 16 Global Step: 277380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:08,808-Speed 9389.00 samples/sec Loss 3.9862 LearningRate 0.0029 Epoch: 16 Global Step: 277390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:09,873-Speed 9618.74 samples/sec Loss 3.8082 LearningRate 0.0029 Epoch: 16 Global Step: 277400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:10,962-Speed 9415.34 samples/sec Loss 3.9270 LearningRate 0.0029 Epoch: 16 Global Step: 277410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:12,052-Speed 9402.75 samples/sec Loss 3.8100 LearningRate 0.0029 Epoch: 16 Global Step: 277420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:13,177-Speed 9102.61 samples/sec Loss 3.8739 LearningRate 0.0029 Epoch: 16 Global Step: 277430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:14,284-Speed 9256.94 samples/sec Loss 3.8697 LearningRate 0.0029 Epoch: 16 Global Step: 277440 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 22:45:15,409-Speed 9108.10 samples/sec Loss 3.8222 LearningRate 0.0029 Epoch: 16 Global Step: 277450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:16,486-Speed 9509.79 samples/sec Loss 3.8072 LearningRate 0.0029 Epoch: 16 Global Step: 277460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:17,639-Speed 8887.48 samples/sec Loss 3.7892 LearningRate 0.0028 Epoch: 16 Global Step: 277470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:18,800-Speed 8822.07 samples/sec Loss 3.9717 LearningRate 0.0028 Epoch: 16 Global Step: 277480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:19,887-Speed 9429.67 samples/sec Loss 3.7863 LearningRate 0.0028 Epoch: 16 Global Step: 277490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:20,990-Speed 9295.08 samples/sec Loss 3.8707 LearningRate 0.0028 Epoch: 16 Global Step: 277500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:22,056-Speed 9610.35 samples/sec Loss 3.8064 LearningRate 0.0028 Epoch: 16 Global Step: 277510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:23,158-Speed 9293.94 samples/sec Loss 3.9172 LearningRate 0.0028 Epoch: 16 Global Step: 277520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:24,322-Speed 8804.53 samples/sec Loss 3.9159 LearningRate 0.0028 Epoch: 16 Global Step: 277530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:25,455-Speed 9041.51 samples/sec Loss 3.8668 LearningRate 0.0028 Epoch: 16 Global Step: 277540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:26,575-Speed 9147.85 samples/sec Loss 3.8859 LearningRate 0.0028 Epoch: 16 Global Step: 277550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:27,704-Speed 9079.38 samples/sec Loss 3.7802 LearningRate 0.0028 Epoch: 16 Global Step: 277560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 22:45:28,797-Speed 9373.26 samples/sec Loss 3.7751 LearningRate 0.0028 Epoch: 16 Global Step: 277570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:29,891-Speed 9372.03 samples/sec Loss 3.8248 LearningRate 0.0028 Epoch: 16 Global Step: 277580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:31,004-Speed 9200.50 samples/sec Loss 3.9549 LearningRate 0.0028 Epoch: 16 Global Step: 277590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:32,148-Speed 8960.84 samples/sec Loss 3.8326 LearningRate 0.0028 Epoch: 16 Global Step: 277600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:33,315-Speed 8779.58 samples/sec Loss 3.8507 LearningRate 0.0028 Epoch: 16 Global Step: 277610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:34,461-Speed 8935.85 samples/sec Loss 3.7920 LearningRate 0.0028 Epoch: 16 Global Step: 277620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:35,582-Speed 9141.74 samples/sec Loss 3.9320 LearningRate 0.0028 Epoch: 16 Global Step: 277630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:36,689-Speed 9256.31 samples/sec Loss 3.8977 LearningRate 0.0028 Epoch: 16 Global Step: 277640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:37,826-Speed 9009.82 samples/sec Loss 3.9020 LearningRate 0.0028 Epoch: 16 Global Step: 277650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:38,967-Speed 8983.32 samples/sec Loss 3.8225 LearningRate 0.0028 Epoch: 16 Global Step: 277660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:40,108-Speed 8981.24 samples/sec Loss 3.7819 LearningRate 0.0028 Epoch: 16 Global Step: 277670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:41,252-Speed 8955.95 samples/sec Loss 3.8912 LearningRate 0.0028 Epoch: 16 Global Step: 277680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:42,384-Speed 9052.54 
samples/sec Loss 3.7971 LearningRate 0.0028 Epoch: 16 Global Step: 277690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:43,494-Speed 9226.71 samples/sec Loss 3.8845 LearningRate 0.0028 Epoch: 16 Global Step: 277700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:44,612-Speed 9165.47 samples/sec Loss 3.8607 LearningRate 0.0028 Epoch: 16 Global Step: 277710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:45,717-Speed 9272.23 samples/sec Loss 3.8225 LearningRate 0.0028 Epoch: 16 Global Step: 277720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:46,826-Speed 9237.28 samples/sec Loss 3.8838 LearningRate 0.0028 Epoch: 16 Global Step: 277730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:47,960-Speed 9038.02 samples/sec Loss 3.9236 LearningRate 0.0028 Epoch: 16 Global Step: 277740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:49,117-Speed 8852.85 samples/sec Loss 3.8389 LearningRate 0.0028 Epoch: 16 Global Step: 277750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:50,269-Speed 8894.82 samples/sec Loss 3.9061 LearningRate 0.0028 Epoch: 16 Global Step: 277760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:51,381-Speed 9213.93 samples/sec Loss 3.8108 LearningRate 0.0028 Epoch: 16 Global Step: 277770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:52,462-Speed 9483.36 samples/sec Loss 3.7779 LearningRate 0.0028 Epoch: 16 Global Step: 277780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:45:53,549-Speed 9428.73 samples/sec Loss 3.8741 LearningRate 0.0028 Epoch: 16 Global Step: 277790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:54,666-Speed 9168.72 samples/sec Loss 3.9656 LearningRate 0.0028 Epoch: 16 Global Step: 277800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:55,774-Speed 9249.68 samples/sec Loss 3.7664 LearningRate 
0.0028 Epoch: 16 Global Step: 277810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:56,897-Speed 9119.17 samples/sec Loss 3.8777 LearningRate 0.0028 Epoch: 16 Global Step: 277820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:58,026-Speed 9090.34 samples/sec Loss 3.9030 LearningRate 0.0028 Epoch: 16 Global Step: 277830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:45:59,099-Speed 9542.19 samples/sec Loss 3.8825 LearningRate 0.0028 Epoch: 16 Global Step: 277840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:00,195-Speed 9352.66 samples/sec Loss 3.9551 LearningRate 0.0028 Epoch: 16 Global Step: 277850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:01,292-Speed 9332.16 samples/sec Loss 3.9473 LearningRate 0.0028 Epoch: 16 Global Step: 277860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:02,416-Speed 9121.51 samples/sec Loss 3.8924 LearningRate 0.0028 Epoch: 16 Global Step: 277870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:03,520-Speed 9277.37 samples/sec Loss 3.7503 LearningRate 0.0028 Epoch: 16 Global Step: 277880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:04,631-Speed 9220.58 samples/sec Loss 3.9364 LearningRate 0.0028 Epoch: 16 Global Step: 277890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:46:05,740-Speed 9242.14 samples/sec Loss 3.8767 LearningRate 0.0028 Epoch: 16 Global Step: 277900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:06,859-Speed 9151.78 samples/sec Loss 3.8362 LearningRate 0.0028 Epoch: 16 Global Step: 277910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:07,981-Speed 9133.08 samples/sec Loss 3.8698 LearningRate 0.0028 Epoch: 16 Global Step: 277920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:09,120-Speed 8996.27 samples/sec Loss 3.7804 LearningRate 0.0028 Epoch: 16 Global Step: 277930 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:10,242-Speed 9140.96 samples/sec Loss 3.8460 LearningRate 0.0028 Epoch: 16 Global Step: 277940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:11,403-Speed 8822.43 samples/sec Loss 3.8214 LearningRate 0.0028 Epoch: 16 Global Step: 277950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:12,514-Speed 9222.16 samples/sec Loss 3.9226 LearningRate 0.0028 Epoch: 16 Global Step: 277960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:13,688-Speed 8724.39 samples/sec Loss 3.9034 LearningRate 0.0028 Epoch: 16 Global Step: 277970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:14,790-Speed 9293.39 samples/sec Loss 3.8631 LearningRate 0.0028 Epoch: 16 Global Step: 277980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:15,918-Speed 9088.62 samples/sec Loss 3.8888 LearningRate 0.0028 Epoch: 16 Global Step: 277990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:46:17,048-Speed 9066.61 samples/sec Loss 3.8054 LearningRate 0.0028 Epoch: 16 Global Step: 278000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:46:39,249-[lfw][278000]XNorm: 6.848850 Training: 2022-04-11 22:46:39,250-[lfw][278000]Accuracy-Flip: 0.99683+-0.00273 Training: 2022-04-11 22:46:39,250-[lfw][278000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:47:04,961-[cfp_fp][278000]XNorm: 5.966458 Training: 2022-04-11 22:47:04,961-[cfp_fp][278000]Accuracy-Flip: 0.97257+-0.00903 Training: 2022-04-11 22:47:04,962-[cfp_fp][278000]Accuracy-Highest: 0.97257 Training: 2022-04-11 22:47:26,951-[agedb_30][278000]XNorm: 6.658534 Training: 2022-04-11 22:47:26,951-[agedb_30][278000]Accuracy-Flip: 0.97200+-0.00933 Training: 2022-04-11 22:47:26,952-[agedb_30][278000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:47:28,054-Speed 144.21 samples/sec Loss 3.7723 LearningRate 0.0028 Epoch: 16 Global Step: 278010 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 22:47:29,268-Speed 8441.06 samples/sec Loss 3.8825 LearningRate 0.0028 Epoch: 16 Global Step: 278020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:30,414-Speed 8941.32 samples/sec Loss 3.8407 LearningRate 0.0028 Epoch: 16 Global Step: 278030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:31,610-Speed 8570.92 samples/sec Loss 3.9064 LearningRate 0.0028 Epoch: 16 Global Step: 278040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:32,699-Speed 9408.00 samples/sec Loss 3.8645 LearningRate 0.0028 Epoch: 16 Global Step: 278050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:33,843-Speed 8955.03 samples/sec Loss 3.9286 LearningRate 0.0028 Epoch: 16 Global Step: 278060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:34,975-Speed 9050.97 samples/sec Loss 3.9169 LearningRate 0.0028 Epoch: 16 Global Step: 278070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:36,070-Speed 9358.30 samples/sec Loss 3.9268 LearningRate 0.0028 Epoch: 16 Global Step: 278080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:37,209-Speed 8997.08 samples/sec Loss 3.8778 LearningRate 0.0028 Epoch: 16 Global Step: 278090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:38,371-Speed 8818.11 samples/sec Loss 3.8168 LearningRate 0.0028 Epoch: 16 Global Step: 278100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:39,523-Speed 8889.83 samples/sec Loss 3.8440 LearningRate 0.0028 Epoch: 16 Global Step: 278110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:40,667-Speed 8959.28 samples/sec Loss 3.9051 LearningRate 0.0028 Epoch: 16 Global Step: 278120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:41,813-Speed 8943.66 samples/sec Loss 3.9339 LearningRate 0.0028 Epoch: 16 Global Step: 278130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:47:42,910-Speed 9336.81 samples/sec Loss 3.9532 LearningRate 0.0028 Epoch: 16 Global Step: 278140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:44,019-Speed 9238.09 samples/sec Loss 3.8056 LearningRate 0.0028 Epoch: 16 Global Step: 278150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:45,094-Speed 9532.71 samples/sec Loss 3.9150 LearningRate 0.0028 Epoch: 16 Global Step: 278160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:46,281-Speed 8627.95 samples/sec Loss 3.9068 LearningRate 0.0028 Epoch: 16 Global Step: 278170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:47,400-Speed 9163.21 samples/sec Loss 3.8579 LearningRate 0.0028 Epoch: 16 Global Step: 278180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:48,562-Speed 8820.25 samples/sec Loss 3.8147 LearningRate 0.0028 Epoch: 16 Global Step: 278190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:49,670-Speed 9243.11 samples/sec Loss 3.8299 LearningRate 0.0028 Epoch: 16 Global Step: 278200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:47:50,762-Speed 9385.91 samples/sec Loss 3.8971 LearningRate 0.0028 Epoch: 16 Global Step: 278210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:51,831-Speed 9580.71 samples/sec Loss 3.8485 LearningRate 0.0028 Epoch: 16 Global Step: 278220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:53,001-Speed 8754.92 samples/sec Loss 3.8087 LearningRate 0.0028 Epoch: 16 Global Step: 278230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:54,136-Speed 9047.46 samples/sec Loss 3.8380 LearningRate 0.0028 Epoch: 16 Global Step: 278240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:55,259-Speed 9122.58 samples/sec Loss 3.8512 LearningRate 0.0028 Epoch: 16 Global Step: 278250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:56,409-Speed 8911.38 samples/sec 
Loss 3.7963 LearningRate 0.0028 Epoch: 16 Global Step: 278260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:57,528-Speed 9161.76 samples/sec Loss 3.8882 LearningRate 0.0028 Epoch: 16 Global Step: 278270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:58,686-Speed 8845.08 samples/sec Loss 3.9334 LearningRate 0.0028 Epoch: 16 Global Step: 278280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:47:59,782-Speed 9351.31 samples/sec Loss 3.8240 LearningRate 0.0028 Epoch: 16 Global Step: 278290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:48:00,881-Speed 9328.20 samples/sec Loss 3.8226 LearningRate 0.0028 Epoch: 16 Global Step: 278300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:48:01,978-Speed 9331.38 samples/sec Loss 3.7778 LearningRate 0.0028 Epoch: 16 Global Step: 278310 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-04-11 22:48:03,053-Speed 9530.27 samples/sec Loss 3.8579 LearningRate 0.0028 Epoch: 16 Global Step: 278320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:04,179-Speed 9099.58 samples/sec Loss 3.7967 LearningRate 0.0028 Epoch: 16 Global Step: 278330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:05,346-Speed 8779.49 samples/sec Loss 3.8859 LearningRate 0.0028 Epoch: 16 Global Step: 278340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:06,410-Speed 9632.38 samples/sec Loss 3.8065 LearningRate 0.0028 Epoch: 16 Global Step: 278350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:07,523-Speed 9207.37 samples/sec Loss 3.8884 LearningRate 0.0028 Epoch: 16 Global Step: 278360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:08,650-Speed 9086.72 samples/sec Loss 3.8599 LearningRate 0.0028 Epoch: 16 Global Step: 278370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:09,761-Speed 9252.29 samples/sec Loss 3.7941 LearningRate 0.0028 
Epoch: 16 Global Step: 278380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:10,880-Speed 9152.63 samples/sec Loss 3.8744 LearningRate 0.0028 Epoch: 16 Global Step: 278390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:12,006-Speed 9100.03 samples/sec Loss 3.8614 LearningRate 0.0028 Epoch: 16 Global Step: 278400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:13,159-Speed 8881.37 samples/sec Loss 3.8145 LearningRate 0.0028 Epoch: 16 Global Step: 278410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:14,305-Speed 8943.56 samples/sec Loss 3.8638 LearningRate 0.0028 Epoch: 16 Global Step: 278420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:15,429-Speed 9116.84 samples/sec Loss 3.8666 LearningRate 0.0028 Epoch: 16 Global Step: 278430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:16,533-Speed 9285.29 samples/sec Loss 3.8414 LearningRate 0.0028 Epoch: 16 Global Step: 278440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:17,626-Speed 9371.22 samples/sec Loss 3.8631 LearningRate 0.0028 Epoch: 16 Global Step: 278450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:18,749-Speed 9124.99 samples/sec Loss 3.8764 LearningRate 0.0028 Epoch: 16 Global Step: 278460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:19,950-Speed 8526.99 samples/sec Loss 3.8439 LearningRate 0.0027 Epoch: 16 Global Step: 278470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:21,067-Speed 9180.55 samples/sec Loss 3.8619 LearningRate 0.0027 Epoch: 16 Global Step: 278480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:22,198-Speed 9052.79 samples/sec Loss 3.9193 LearningRate 0.0027 Epoch: 16 Global Step: 278490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:23,328-Speed 9071.84 samples/sec Loss 3.8454 LearningRate 0.0027 Epoch: 16 Global Step: 278500 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:24,454-Speed 9092.85 samples/sec Loss 3.8121 LearningRate 0.0027 Epoch: 16 Global Step: 278510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:25,570-Speed 9182.50 samples/sec Loss 3.8901 LearningRate 0.0027 Epoch: 16 Global Step: 278520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:26,672-Speed 9300.95 samples/sec Loss 3.8012 LearningRate 0.0027 Epoch: 16 Global Step: 278530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:27,769-Speed 9342.06 samples/sec Loss 3.8634 LearningRate 0.0027 Epoch: 16 Global Step: 278540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:28,896-Speed 9091.17 samples/sec Loss 3.8156 LearningRate 0.0027 Epoch: 16 Global Step: 278550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:30,010-Speed 9196.62 samples/sec Loss 3.8184 LearningRate 0.0027 Epoch: 16 Global Step: 278560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:31,198-Speed 8622.15 samples/sec Loss 3.8041 LearningRate 0.0027 Epoch: 16 Global Step: 278570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:32,332-Speed 9032.94 samples/sec Loss 3.9110 LearningRate 0.0027 Epoch: 16 Global Step: 278580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:33,503-Speed 8753.84 samples/sec Loss 3.9150 LearningRate 0.0027 Epoch: 16 Global Step: 278590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:34,608-Speed 9270.13 samples/sec Loss 3.8176 LearningRate 0.0027 Epoch: 16 Global Step: 278600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:35,706-Speed 9329.49 samples/sec Loss 3.8449 LearningRate 0.0027 Epoch: 16 Global Step: 278610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:36,852-Speed 8946.79 samples/sec Loss 3.9433 LearningRate 0.0027 Epoch: 16 Global Step: 278620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-04-11 22:48:37,950-Speed 9324.73 samples/sec Loss 3.8580 LearningRate 0.0027 Epoch: 16 Global Step: 278630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:48:39,051-Speed 9312.24 samples/sec Loss 3.8198 LearningRate 0.0027 Epoch: 16 Global Step: 278640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:40,168-Speed 9174.90 samples/sec Loss 3.8784 LearningRate 0.0027 Epoch: 16 Global Step: 278650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:41,288-Speed 9143.32 samples/sec Loss 3.9281 LearningRate 0.0027 Epoch: 16 Global Step: 278660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:42,426-Speed 9003.21 samples/sec Loss 3.8112 LearningRate 0.0027 Epoch: 16 Global Step: 278670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:43,532-Speed 9269.26 samples/sec Loss 3.9140 LearningRate 0.0027 Epoch: 16 Global Step: 278680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:44,701-Speed 8764.34 samples/sec Loss 3.8438 LearningRate 0.0027 Epoch: 16 Global Step: 278690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:45,833-Speed 9049.13 samples/sec Loss 3.8541 LearningRate 0.0027 Epoch: 16 Global Step: 278700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:46,948-Speed 9189.66 samples/sec Loss 3.8849 LearningRate 0.0027 Epoch: 16 Global Step: 278710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:48,048-Speed 9314.70 samples/sec Loss 3.8132 LearningRate 0.0027 Epoch: 16 Global Step: 278720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:49,280-Speed 8312.13 samples/sec Loss 3.8819 LearningRate 0.0027 Epoch: 16 Global Step: 278730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:50,425-Speed 8953.28 samples/sec Loss 3.8791 LearningRate 0.0027 Epoch: 16 Global Step: 278740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:48:51,498-Speed 9553.39 
samples/sec Loss 3.7725 LearningRate 0.0027 Epoch: 16 Global Step: 278750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:52,620-Speed 9128.89 samples/sec Loss 3.8849 LearningRate 0.0027 Epoch: 16 Global Step: 278760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:53,800-Speed 8686.37 samples/sec Loss 3.8748 LearningRate 0.0027 Epoch: 16 Global Step: 278770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:54,920-Speed 9147.93 samples/sec Loss 3.8833 LearningRate 0.0027 Epoch: 16 Global Step: 278780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:56,034-Speed 9280.12 samples/sec Loss 3.9054 LearningRate 0.0027 Epoch: 16 Global Step: 278790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:57,158-Speed 9116.82 samples/sec Loss 3.8443 LearningRate 0.0027 Epoch: 16 Global Step: 278800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:58,263-Speed 9272.45 samples/sec Loss 3.8358 LearningRate 0.0027 Epoch: 16 Global Step: 278810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:48:59,359-Speed 9354.10 samples/sec Loss 3.8386 LearningRate 0.0027 Epoch: 16 Global Step: 278820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:00,502-Speed 8958.78 samples/sec Loss 3.8789 LearningRate 0.0027 Epoch: 16 Global Step: 278830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:01,638-Speed 9021.12 samples/sec Loss 3.8866 LearningRate 0.0027 Epoch: 16 Global Step: 278840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:02,785-Speed 8936.34 samples/sec Loss 3.8971 LearningRate 0.0027 Epoch: 16 Global Step: 278850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:03,883-Speed 9330.53 samples/sec Loss 3.7465 LearningRate 0.0027 Epoch: 16 Global Step: 278860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:04,976-Speed 9370.95 samples/sec Loss 3.8586 LearningRate 0.0027 
Epoch: 16 Global Step: 278870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:06,095-Speed 9157.84 samples/sec Loss 3.8195 LearningRate 0.0027 Epoch: 16 Global Step: 278880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:07,186-Speed 9395.18 samples/sec Loss 3.8284 LearningRate 0.0027 Epoch: 16 Global Step: 278890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:08,376-Speed 8607.38 samples/sec Loss 3.8906 LearningRate 0.0027 Epoch: 16 Global Step: 278900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:09,508-Speed 9047.36 samples/sec Loss 3.9743 LearningRate 0.0027 Epoch: 16 Global Step: 278910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:10,675-Speed 8784.76 samples/sec Loss 3.7706 LearningRate 0.0027 Epoch: 16 Global Step: 278920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:11,766-Speed 9390.45 samples/sec Loss 3.8594 LearningRate 0.0027 Epoch: 16 Global Step: 278930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:12,851-Speed 9438.85 samples/sec Loss 3.8247 LearningRate 0.0027 Epoch: 16 Global Step: 278940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:13,986-Speed 9031.57 samples/sec Loss 3.8968 LearningRate 0.0027 Epoch: 16 Global Step: 278950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:15,149-Speed 8808.70 samples/sec Loss 3.7846 LearningRate 0.0027 Epoch: 16 Global Step: 278960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:16,292-Speed 8967.64 samples/sec Loss 3.9067 LearningRate 0.0027 Epoch: 16 Global Step: 278970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:17,391-Speed 9322.08 samples/sec Loss 3.9084 LearningRate 0.0027 Epoch: 16 Global Step: 278980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:18,482-Speed 9389.49 samples/sec Loss 3.9220 LearningRate 0.0027 Epoch: 16 Global Step: 278990 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:19,645-Speed 8812.32 samples/sec Loss 3.8377 LearningRate 0.0027 Epoch: 16 Global Step: 279000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:20,780-Speed 9027.88 samples/sec Loss 3.8248 LearningRate 0.0027 Epoch: 16 Global Step: 279010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:21,927-Speed 8931.56 samples/sec Loss 3.9016 LearningRate 0.0027 Epoch: 16 Global Step: 279020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:23,079-Speed 8895.26 samples/sec Loss 3.8974 LearningRate 0.0027 Epoch: 16 Global Step: 279030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:24,239-Speed 8828.86 samples/sec Loss 3.9229 LearningRate 0.0027 Epoch: 16 Global Step: 279040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:25,355-Speed 9183.94 samples/sec Loss 3.8674 LearningRate 0.0027 Epoch: 16 Global Step: 279050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:26,472-Speed 9173.69 samples/sec Loss 3.8639 LearningRate 0.0027 Epoch: 16 Global Step: 279060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:27,621-Speed 8917.55 samples/sec Loss 3.8432 LearningRate 0.0027 Epoch: 16 Global Step: 279070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:28,800-Speed 8691.27 samples/sec Loss 3.8740 LearningRate 0.0027 Epoch: 16 Global Step: 279080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:29,901-Speed 9306.11 samples/sec Loss 3.8102 LearningRate 0.0027 Epoch: 16 Global Step: 279090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:31,058-Speed 8857.16 samples/sec Loss 3.8819 LearningRate 0.0027 Epoch: 16 Global Step: 279100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:32,163-Speed 9273.96 samples/sec Loss 3.8327 LearningRate 0.0027 Epoch: 16 Global Step: 279110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-04-11 22:49:33,253-Speed 9399.96 samples/sec Loss 3.8685 LearningRate 0.0027 Epoch: 16 Global Step: 279120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:34,417-Speed 8799.26 samples/sec Loss 3.8477 LearningRate 0.0027 Epoch: 16 Global Step: 279130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:35,488-Speed 9569.76 samples/sec Loss 3.8864 LearningRate 0.0027 Epoch: 16 Global Step: 279140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:36,580-Speed 9381.52 samples/sec Loss 3.8775 LearningRate 0.0027 Epoch: 16 Global Step: 279150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:37,719-Speed 8996.29 samples/sec Loss 3.9232 LearningRate 0.0027 Epoch: 16 Global Step: 279160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:38,864-Speed 8948.72 samples/sec Loss 3.7978 LearningRate 0.0027 Epoch: 16 Global Step: 279170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:39,931-Speed 9604.88 samples/sec Loss 3.7514 LearningRate 0.0027 Epoch: 16 Global Step: 279180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:41,071-Speed 8984.68 samples/sec Loss 3.7954 LearningRate 0.0027 Epoch: 16 Global Step: 279190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:42,216-Speed 8945.63 samples/sec Loss 3.8708 LearningRate 0.0027 Epoch: 16 Global Step: 279200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:43,360-Speed 8958.85 samples/sec Loss 3.8243 LearningRate 0.0027 Epoch: 16 Global Step: 279210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:44,519-Speed 8839.80 samples/sec Loss 3.9650 LearningRate 0.0027 Epoch: 16 Global Step: 279220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:45,690-Speed 8751.91 samples/sec Loss 3.8401 LearningRate 0.0027 Epoch: 16 Global Step: 279230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:46,867-Speed 8702.75 
samples/sec Loss 3.8320 LearningRate 0.0027 Epoch: 16 Global Step: 279240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:47,939-Speed 9563.16 samples/sec Loss 3.8894 LearningRate 0.0027 Epoch: 16 Global Step: 279250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:49,057-Speed 9163.80 samples/sec Loss 3.8801 LearningRate 0.0027 Epoch: 16 Global Step: 279260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:50,203-Speed 8937.94 samples/sec Loss 3.9228 LearningRate 0.0027 Epoch: 16 Global Step: 279270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:51,313-Speed 9228.18 samples/sec Loss 3.8634 LearningRate 0.0027 Epoch: 16 Global Step: 279280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:52,439-Speed 9102.90 samples/sec Loss 3.9243 LearningRate 0.0027 Epoch: 16 Global Step: 279290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:53,549-Speed 9234.02 samples/sec Loss 3.8806 LearningRate 0.0027 Epoch: 16 Global Step: 279300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:49:54,609-Speed 9661.85 samples/sec Loss 3.8855 LearningRate 0.0027 Epoch: 16 Global Step: 279310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:55,730-Speed 9144.22 samples/sec Loss 3.8269 LearningRate 0.0027 Epoch: 16 Global Step: 279320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:56,797-Speed 9597.80 samples/sec Loss 3.8339 LearningRate 0.0027 Epoch: 16 Global Step: 279330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:57,904-Speed 9261.96 samples/sec Loss 3.9001 LearningRate 0.0027 Epoch: 16 Global Step: 279340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:49:59,054-Speed 8910.00 samples/sec Loss 3.9057 LearningRate 0.0027 Epoch: 16 Global Step: 279350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:00,177-Speed 9117.88 samples/sec Loss 3.8388 LearningRate 
0.0027 Epoch: 16 Global Step: 279360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:01,333-Speed 8868.17 samples/sec Loss 3.8129 LearningRate 0.0027 Epoch: 16 Global Step: 279370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:02,509-Speed 8711.64 samples/sec Loss 3.8915 LearningRate 0.0027 Epoch: 16 Global Step: 279380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:03,635-Speed 9097.48 samples/sec Loss 3.8994 LearningRate 0.0027 Epoch: 16 Global Step: 279390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:04,780-Speed 8949.18 samples/sec Loss 3.8185 LearningRate 0.0027 Epoch: 16 Global Step: 279400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:05,918-Speed 9011.43 samples/sec Loss 3.8216 LearningRate 0.0027 Epoch: 16 Global Step: 279410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:07,028-Speed 9229.60 samples/sec Loss 3.9316 LearningRate 0.0027 Epoch: 16 Global Step: 279420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:08,170-Speed 8967.96 samples/sec Loss 3.8034 LearningRate 0.0027 Epoch: 16 Global Step: 279430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:09,255-Speed 9443.69 samples/sec Loss 3.7882 LearningRate 0.0027 Epoch: 16 Global Step: 279440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:10,382-Speed 9090.87 samples/sec Loss 3.8900 LearningRate 0.0027 Epoch: 16 Global Step: 279450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:11,495-Speed 9203.07 samples/sec Loss 3.7639 LearningRate 0.0027 Epoch: 16 Global Step: 279460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:12,631-Speed 9024.59 samples/sec Loss 3.8464 LearningRate 0.0027 Epoch: 16 Global Step: 279470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:13,732-Speed 9305.41 samples/sec Loss 3.9018 LearningRate 0.0026 Epoch: 16 Global Step: 279480 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:14,852-Speed 9148.34 samples/sec Loss 3.8232 LearningRate 0.0026 Epoch: 16 Global Step: 279490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:15,955-Speed 9291.19 samples/sec Loss 3.9188 LearningRate 0.0026 Epoch: 16 Global Step: 279500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:17,095-Speed 8984.10 samples/sec Loss 3.8832 LearningRate 0.0026 Epoch: 16 Global Step: 279510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:18,212-Speed 9169.52 samples/sec Loss 3.9190 LearningRate 0.0026 Epoch: 16 Global Step: 279520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:19,339-Speed 9091.21 samples/sec Loss 3.8861 LearningRate 0.0026 Epoch: 16 Global Step: 279530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:20,526-Speed 8629.76 samples/sec Loss 3.8185 LearningRate 0.0026 Epoch: 16 Global Step: 279540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:21,623-Speed 9340.15 samples/sec Loss 3.8191 LearningRate 0.0026 Epoch: 16 Global Step: 279550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:22,775-Speed 8894.35 samples/sec Loss 3.8325 LearningRate 0.0026 Epoch: 16 Global Step: 279560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:23,902-Speed 9098.85 samples/sec Loss 3.8494 LearningRate 0.0026 Epoch: 16 Global Step: 279570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:25,011-Speed 9233.21 samples/sec Loss 3.8579 LearningRate 0.0026 Epoch: 16 Global Step: 279580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:26,137-Speed 9106.62 samples/sec Loss 3.8518 LearningRate 0.0026 Epoch: 16 Global Step: 279590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:27,313-Speed 8711.30 samples/sec Loss 3.8431 LearningRate 0.0026 Epoch: 16 Global Step: 279600 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 22:50:28,458-Speed 8949.79 samples/sec Loss 3.8915 LearningRate 0.0026 Epoch: 16 Global Step: 279610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:29,559-Speed 9307.87 samples/sec Loss 3.8941 LearningRate 0.0026 Epoch: 16 Global Step: 279620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:30,684-Speed 9105.55 samples/sec Loss 3.9456 LearningRate 0.0026 Epoch: 16 Global Step: 279630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:31,781-Speed 9341.42 samples/sec Loss 3.8789 LearningRate 0.0026 Epoch: 16 Global Step: 279640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:32,879-Speed 9332.04 samples/sec Loss 3.8714 LearningRate 0.0026 Epoch: 16 Global Step: 279650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:33,969-Speed 9397.39 samples/sec Loss 3.8855 LearningRate 0.0026 Epoch: 16 Global Step: 279660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:35,091-Speed 9133.24 samples/sec Loss 3.8501 LearningRate 0.0026 Epoch: 16 Global Step: 279670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:36,168-Speed 9518.82 samples/sec Loss 3.7929 LearningRate 0.0026 Epoch: 16 Global Step: 279680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:37,280-Speed 9206.54 samples/sec Loss 3.8379 LearningRate 0.0026 Epoch: 16 Global Step: 279690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:38,380-Speed 9317.01 samples/sec Loss 3.8442 LearningRate 0.0026 Epoch: 16 Global Step: 279700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:39,484-Speed 9280.10 samples/sec Loss 3.9168 LearningRate 0.0026 Epoch: 16 Global Step: 279710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:40,634-Speed 8910.64 samples/sec Loss 3.8904 LearningRate 0.0026 Epoch: 16 Global Step: 279720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:41,761-Speed 
9091.87 samples/sec Loss 3.7821 LearningRate 0.0026 Epoch: 16 Global Step: 279730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:42,890-Speed 9072.75 samples/sec Loss 3.8844 LearningRate 0.0026 Epoch: 16 Global Step: 279740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:43,987-Speed 9345.94 samples/sec Loss 3.8072 LearningRate 0.0026 Epoch: 16 Global Step: 279750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:45,061-Speed 9543.57 samples/sec Loss 3.9171 LearningRate 0.0026 Epoch: 16 Global Step: 279760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:46,125-Speed 9631.19 samples/sec Loss 3.8636 LearningRate 0.0026 Epoch: 16 Global Step: 279770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:47,192-Speed 9596.31 samples/sec Loss 3.8298 LearningRate 0.0026 Epoch: 16 Global Step: 279780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:50:48,295-Speed 9288.10 samples/sec Loss 3.8932 LearningRate 0.0026 Epoch: 16 Global Step: 279790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:49,421-Speed 9104.12 samples/sec Loss 3.7885 LearningRate 0.0026 Epoch: 16 Global Step: 279800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:50,535-Speed 9195.84 samples/sec Loss 3.8338 LearningRate 0.0026 Epoch: 16 Global Step: 279810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:51,663-Speed 9082.56 samples/sec Loss 3.9611 LearningRate 0.0026 Epoch: 16 Global Step: 279820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:52,809-Speed 8939.59 samples/sec Loss 3.8701 LearningRate 0.0026 Epoch: 16 Global Step: 279830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:53,935-Speed 9096.84 samples/sec Loss 3.9276 LearningRate 0.0026 Epoch: 16 Global Step: 279840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:55,087-Speed 8894.89 samples/sec Loss 3.8949 
LearningRate 0.0026 Epoch: 16 Global Step: 279850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:56,252-Speed 8797.34 samples/sec Loss 3.9652 LearningRate 0.0026 Epoch: 16 Global Step: 279860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:57,404-Speed 8890.71 samples/sec Loss 3.9459 LearningRate 0.0026 Epoch: 16 Global Step: 279870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:58,560-Speed 8865.50 samples/sec Loss 3.8066 LearningRate 0.0026 Epoch: 16 Global Step: 279880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:50:59,675-Speed 9188.15 samples/sec Loss 3.8440 LearningRate 0.0026 Epoch: 16 Global Step: 279890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:51:00,790-Speed 9192.63 samples/sec Loss 3.8461 LearningRate 0.0026 Epoch: 16 Global Step: 279900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:01,920-Speed 9065.20 samples/sec Loss 3.8369 LearningRate 0.0026 Epoch: 16 Global Step: 279910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:02,997-Speed 9520.57 samples/sec Loss 3.8206 LearningRate 0.0026 Epoch: 16 Global Step: 279920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:04,104-Speed 9256.19 samples/sec Loss 3.9402 LearningRate 0.0026 Epoch: 16 Global Step: 279930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:05,170-Speed 9616.99 samples/sec Loss 3.8907 LearningRate 0.0026 Epoch: 16 Global Step: 279940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:06,281-Speed 9214.75 samples/sec Loss 3.8892 LearningRate 0.0026 Epoch: 16 Global Step: 279950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:07,397-Speed 9181.21 samples/sec Loss 3.8510 LearningRate 0.0026 Epoch: 16 Global Step: 279960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:08,510-Speed 9206.31 samples/sec Loss 3.8475 LearningRate 0.0026 Epoch: 16 Global Step: 
279970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:09,630-Speed 9147.41 samples/sec Loss 3.8837 LearningRate 0.0026 Epoch: 16 Global Step: 279980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:10,741-Speed 9219.79 samples/sec Loss 3.8107 LearningRate 0.0026 Epoch: 16 Global Step: 279990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:11,849-Speed 9248.02 samples/sec Loss 3.8380 LearningRate 0.0026 Epoch: 16 Global Step: 280000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:51:33,704-[lfw][280000]XNorm: 6.746896 Training: 2022-04-11 22:51:33,705-[lfw][280000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-04-11 22:51:33,705-[lfw][280000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:51:58,993-[cfp_fp][280000]XNorm: 5.880494 Training: 2022-04-11 22:51:58,994-[cfp_fp][280000]Accuracy-Flip: 0.97257+-0.00810 Training: 2022-04-11 22:51:58,994-[cfp_fp][280000]Accuracy-Highest: 0.97257 Training: 2022-04-11 22:52:20,864-[agedb_30][280000]XNorm: 6.566937 Training: 2022-04-11 22:52:20,865-[agedb_30][280000]Accuracy-Flip: 0.97300+-0.00912 Training: 2022-04-11 22:52:20,865-[agedb_30][280000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:52:21,997-Speed 145.98 samples/sec Loss 3.7978 LearningRate 0.0026 Epoch: 16 Global Step: 280010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:23,074-Speed 9509.78 samples/sec Loss 3.9451 LearningRate 0.0026 Epoch: 16 Global Step: 280020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:24,216-Speed 8971.31 samples/sec Loss 3.8587 LearningRate 0.0026 Epoch: 16 Global Step: 280030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:25,330-Speed 9197.89 samples/sec Loss 3.9285 LearningRate 0.0026 Epoch: 16 Global Step: 280040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:26,391-Speed 9657.28 samples/sec Loss 3.8831 LearningRate 0.0026 Epoch: 16 Global Step: 280050 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 22:52:27,510-Speed 9159.15 samples/sec Loss 3.8133 LearningRate 0.0026 Epoch: 16 Global Step: 280060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:28,622-Speed 9215.43 samples/sec Loss 3.9250 LearningRate 0.0026 Epoch: 16 Global Step: 280070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:29,737-Speed 9180.72 samples/sec Loss 3.8401 LearningRate 0.0026 Epoch: 16 Global Step: 280080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:30,859-Speed 9133.96 samples/sec Loss 3.8880 LearningRate 0.0026 Epoch: 16 Global Step: 280090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:32,006-Speed 8931.77 samples/sec Loss 3.8693 LearningRate 0.0026 Epoch: 16 Global Step: 280100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:52:33,136-Speed 9074.94 samples/sec Loss 3.9166 LearningRate 0.0026 Epoch: 16 Global Step: 280110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:52:34,265-Speed 9069.87 samples/sec Loss 3.8419 LearningRate 0.0026 Epoch: 16 Global Step: 280120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:35,368-Speed 9288.82 samples/sec Loss 3.8143 LearningRate 0.0026 Epoch: 16 Global Step: 280130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:36,482-Speed 9201.23 samples/sec Loss 3.8493 LearningRate 0.0026 Epoch: 16 Global Step: 280140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:37,645-Speed 8810.88 samples/sec Loss 3.8034 LearningRate 0.0026 Epoch: 16 Global Step: 280150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:38,770-Speed 9108.68 samples/sec Loss 3.8317 LearningRate 0.0026 Epoch: 16 Global Step: 280160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:39,872-Speed 9297.31 samples/sec Loss 3.9069 LearningRate 0.0026 Epoch: 16 Global Step: 280170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 22:52:40,959-Speed 9426.61 samples/sec Loss 3.8715 LearningRate 0.0026 Epoch: 16 Global Step: 280180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:42,111-Speed 8889.70 samples/sec Loss 3.9005 LearningRate 0.0026 Epoch: 16 Global Step: 280190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:43,241-Speed 9067.83 samples/sec Loss 3.8460 LearningRate 0.0026 Epoch: 16 Global Step: 280200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:44,430-Speed 8618.68 samples/sec Loss 3.7793 LearningRate 0.0026 Epoch: 16 Global Step: 280210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:45,549-Speed 9158.13 samples/sec Loss 3.8736 LearningRate 0.0026 Epoch: 16 Global Step: 280220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:52:46,690-Speed 8979.02 samples/sec Loss 3.8489 LearningRate 0.0026 Epoch: 16 Global Step: 280230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:52:47,864-Speed 8727.74 samples/sec Loss 3.7772 LearningRate 0.0026 Epoch: 16 Global Step: 280240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:52:48,994-Speed 9066.96 samples/sec Loss 3.8944 LearningRate 0.0026 Epoch: 16 Global Step: 280250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:52:50,161-Speed 8777.98 samples/sec Loss 3.8884 LearningRate 0.0026 Epoch: 16 Global Step: 280260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:51,296-Speed 9030.42 samples/sec Loss 3.8188 LearningRate 0.0026 Epoch: 16 Global Step: 280270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:52,434-Speed 9000.56 samples/sec Loss 3.9139 LearningRate 0.0026 Epoch: 16 Global Step: 280280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:53,577-Speed 8962.17 samples/sec Loss 3.8727 LearningRate 0.0026 Epoch: 16 Global Step: 280290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:54,659-Speed 9475.40 
samples/sec Loss 3.8363 LearningRate 0.0026 Epoch: 16 Global Step: 280300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:55,766-Speed 9249.47 samples/sec Loss 3.8459 LearningRate 0.0026 Epoch: 16 Global Step: 280310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:56,936-Speed 8763.79 samples/sec Loss 3.8016 LearningRate 0.0026 Epoch: 16 Global Step: 280320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:58,060-Speed 9114.32 samples/sec Loss 3.8008 LearningRate 0.0026 Epoch: 16 Global Step: 280330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:52:59,185-Speed 9104.72 samples/sec Loss 3.8444 LearningRate 0.0026 Epoch: 16 Global Step: 280340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:00,301-Speed 9181.25 samples/sec Loss 3.8954 LearningRate 0.0026 Epoch: 16 Global Step: 280350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:01,430-Speed 9075.65 samples/sec Loss 3.8831 LearningRate 0.0026 Epoch: 16 Global Step: 280360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:02,545-Speed 9184.86 samples/sec Loss 3.9190 LearningRate 0.0026 Epoch: 16 Global Step: 280370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:03,664-Speed 9164.15 samples/sec Loss 3.8478 LearningRate 0.0026 Epoch: 16 Global Step: 280380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:04,768-Speed 9280.38 samples/sec Loss 3.9000 LearningRate 0.0026 Epoch: 16 Global Step: 280390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:05,882-Speed 9193.31 samples/sec Loss 3.8774 LearningRate 0.0026 Epoch: 16 Global Step: 280400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:07,019-Speed 9018.72 samples/sec Loss 3.8811 LearningRate 0.0026 Epoch: 16 Global Step: 280410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:08,116-Speed 9338.18 samples/sec Loss 3.8224 LearningRate 
0.0026 Epoch: 16 Global Step: 280420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:09,281-Speed 8793.69 samples/sec Loss 3.8565 LearningRate 0.0026 Epoch: 16 Global Step: 280430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:10,382-Speed 9300.90 samples/sec Loss 3.8914 LearningRate 0.0026 Epoch: 16 Global Step: 280440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:11,520-Speed 9005.90 samples/sec Loss 3.8421 LearningRate 0.0026 Epoch: 16 Global Step: 280450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:12,644-Speed 9117.00 samples/sec Loss 3.7619 LearningRate 0.0026 Epoch: 16 Global Step: 280460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:13,778-Speed 9034.12 samples/sec Loss 3.8376 LearningRate 0.0026 Epoch: 16 Global Step: 280470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:14,862-Speed 9447.04 samples/sec Loss 3.8363 LearningRate 0.0026 Epoch: 16 Global Step: 280480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:15,993-Speed 9065.25 samples/sec Loss 3.9531 LearningRate 0.0026 Epoch: 16 Global Step: 280490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:17,097-Speed 9282.74 samples/sec Loss 3.7872 LearningRate 0.0026 Epoch: 16 Global Step: 280500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:18,240-Speed 8960.23 samples/sec Loss 3.8982 LearningRate 0.0026 Epoch: 16 Global Step: 280510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:19,315-Speed 9529.39 samples/sec Loss 3.8540 LearningRate 0.0025 Epoch: 16 Global Step: 280520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:20,401-Speed 9433.99 samples/sec Loss 3.8964 LearningRate 0.0025 Epoch: 16 Global Step: 280530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:21,511-Speed 9231.38 samples/sec Loss 3.8728 LearningRate 0.0025 Epoch: 16 Global Step: 280540 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:22,623-Speed 9210.47 samples/sec Loss 3.8580 LearningRate 0.0025 Epoch: 16 Global Step: 280550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:23,795-Speed 8747.51 samples/sec Loss 3.8657 LearningRate 0.0025 Epoch: 16 Global Step: 280560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:24,847-Speed 9735.16 samples/sec Loss 3.7513 LearningRate 0.0025 Epoch: 16 Global Step: 280570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:25,937-Speed 9407.20 samples/sec Loss 3.8767 LearningRate 0.0025 Epoch: 16 Global Step: 280580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:27,017-Speed 9485.34 samples/sec Loss 3.9063 LearningRate 0.0025 Epoch: 16 Global Step: 280590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:28,140-Speed 9129.07 samples/sec Loss 3.8690 LearningRate 0.0025 Epoch: 16 Global Step: 280600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:29,288-Speed 8923.57 samples/sec Loss 3.9215 LearningRate 0.0025 Epoch: 16 Global Step: 280610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:30,397-Speed 9231.41 samples/sec Loss 3.8378 LearningRate 0.0025 Epoch: 16 Global Step: 280620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:31,502-Speed 9277.81 samples/sec Loss 3.8445 LearningRate 0.0025 Epoch: 16 Global Step: 280630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:32,670-Speed 8769.21 samples/sec Loss 3.8493 LearningRate 0.0025 Epoch: 16 Global Step: 280640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:33,828-Speed 8852.82 samples/sec Loss 3.8080 LearningRate 0.0025 Epoch: 16 Global Step: 280650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:34,914-Speed 9434.10 samples/sec Loss 3.8572 LearningRate 0.0025 Epoch: 16 Global Step: 280660 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 22:53:36,073-Speed 8837.56 samples/sec Loss 3.8181 LearningRate 0.0025 Epoch: 16 Global Step: 280670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:37,239-Speed 8784.92 samples/sec Loss 3.8759 LearningRate 0.0025 Epoch: 16 Global Step: 280680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:38,404-Speed 8794.63 samples/sec Loss 3.8247 LearningRate 0.0025 Epoch: 16 Global Step: 280690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:39,549-Speed 8947.46 samples/sec Loss 3.9088 LearningRate 0.0025 Epoch: 16 Global Step: 280700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:40,666-Speed 9175.62 samples/sec Loss 3.8445 LearningRate 0.0025 Epoch: 16 Global Step: 280710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:41,808-Speed 8964.82 samples/sec Loss 3.8550 LearningRate 0.0025 Epoch: 16 Global Step: 280720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:42,950-Speed 8975.52 samples/sec Loss 3.8765 LearningRate 0.0025 Epoch: 16 Global Step: 280730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:44,057-Speed 9264.82 samples/sec Loss 3.8362 LearningRate 0.0025 Epoch: 16 Global Step: 280740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:45,140-Speed 9458.76 samples/sec Loss 3.7997 LearningRate 0.0025 Epoch: 16 Global Step: 280750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:46,220-Speed 9483.79 samples/sec Loss 3.8337 LearningRate 0.0025 Epoch: 16 Global Step: 280760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:47,319-Speed 9324.01 samples/sec Loss 3.9038 LearningRate 0.0025 Epoch: 16 Global Step: 280770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:48,443-Speed 9116.65 samples/sec Loss 3.8725 LearningRate 0.0025 Epoch: 16 Global Step: 280780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:49,546-Speed 
9290.16 samples/sec Loss 3.8552 LearningRate 0.0025 Epoch: 16 Global Step: 280790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:50,654-Speed 9241.75 samples/sec Loss 3.8191 LearningRate 0.0025 Epoch: 16 Global Step: 280800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:51,792-Speed 9007.51 samples/sec Loss 3.8289 LearningRate 0.0025 Epoch: 16 Global Step: 280810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:52,868-Speed 9517.54 samples/sec Loss 3.8162 LearningRate 0.0025 Epoch: 16 Global Step: 280820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:53,978-Speed 9232.02 samples/sec Loss 3.8988 LearningRate 0.0025 Epoch: 16 Global Step: 280830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:55,079-Speed 9306.02 samples/sec Loss 3.7335 LearningRate 0.0025 Epoch: 16 Global Step: 280840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:56,214-Speed 9027.07 samples/sec Loss 3.8624 LearningRate 0.0025 Epoch: 16 Global Step: 280850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:53:57,284-Speed 9576.34 samples/sec Loss 3.7994 LearningRate 0.0025 Epoch: 16 Global Step: 280860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:58,428-Speed 8959.43 samples/sec Loss 3.8747 LearningRate 0.0025 Epoch: 16 Global Step: 280870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:53:59,565-Speed 9009.78 samples/sec Loss 3.8227 LearningRate 0.0025 Epoch: 16 Global Step: 280880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:00,742-Speed 8702.72 samples/sec Loss 3.8530 LearningRate 0.0025 Epoch: 16 Global Step: 280890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:01,859-Speed 9180.55 samples/sec Loss 3.9304 LearningRate 0.0025 Epoch: 16 Global Step: 280900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:02,967-Speed 9242.26 samples/sec Loss 3.8847 
LearningRate 0.0025 Epoch: 16 Global Step: 280910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:04,037-Speed 9583.37 samples/sec Loss 3.8806 LearningRate 0.0025 Epoch: 16 Global Step: 280920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:05,162-Speed 9103.98 samples/sec Loss 3.8829 LearningRate 0.0025 Epoch: 16 Global Step: 280930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:06,243-Speed 9477.89 samples/sec Loss 3.8058 LearningRate 0.0025 Epoch: 16 Global Step: 280940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:07,290-Speed 9782.91 samples/sec Loss 3.8852 LearningRate 0.0025 Epoch: 16 Global Step: 280950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:08,366-Speed 9524.05 samples/sec Loss 3.9204 LearningRate 0.0025 Epoch: 16 Global Step: 280960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:54:09,490-Speed 9120.20 samples/sec Loss 3.9090 LearningRate 0.0025 Epoch: 16 Global Step: 280970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:54:10,588-Speed 9330.63 samples/sec Loss 3.8813 LearningRate 0.0025 Epoch: 16 Global Step: 280980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:11,693-Speed 9269.93 samples/sec Loss 3.8646 LearningRate 0.0025 Epoch: 16 Global Step: 280990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:12,855-Speed 8815.58 samples/sec Loss 3.9056 LearningRate 0.0025 Epoch: 16 Global Step: 281000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:14,039-Speed 8653.41 samples/sec Loss 3.8489 LearningRate 0.0025 Epoch: 16 Global Step: 281010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:15,174-Speed 9026.16 samples/sec Loss 3.8180 LearningRate 0.0025 Epoch: 16 Global Step: 281020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:16,283-Speed 9241.50 samples/sec Loss 3.9262 LearningRate 0.0025 Epoch: 16 Global 
Step: 281030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:17,416-Speed 9043.73 samples/sec Loss 3.8324 LearningRate 0.0025 Epoch: 16 Global Step: 281040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:18,562-Speed 8940.12 samples/sec Loss 3.8289 LearningRate 0.0025 Epoch: 16 Global Step: 281050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:19,741-Speed 8689.42 samples/sec Loss 3.8427 LearningRate 0.0025 Epoch: 16 Global Step: 281060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:20,888-Speed 8938.26 samples/sec Loss 3.9191 LearningRate 0.0025 Epoch: 16 Global Step: 281070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:22,039-Speed 8901.75 samples/sec Loss 3.9460 LearningRate 0.0025 Epoch: 16 Global Step: 281080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:23,186-Speed 8928.15 samples/sec Loss 3.7916 LearningRate 0.0025 Epoch: 16 Global Step: 281090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:24,289-Speed 9288.42 samples/sec Loss 3.8188 LearningRate 0.0025 Epoch: 16 Global Step: 281100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:25,420-Speed 9059.16 samples/sec Loss 3.9484 LearningRate 0.0025 Epoch: 16 Global Step: 281110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:26,535-Speed 9195.04 samples/sec Loss 3.9037 LearningRate 0.0025 Epoch: 16 Global Step: 281120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:27,688-Speed 8882.12 samples/sec Loss 3.8678 LearningRate 0.0025 Epoch: 16 Global Step: 281130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:28,793-Speed 9272.27 samples/sec Loss 3.9190 LearningRate 0.0025 Epoch: 16 Global Step: 281140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:29,927-Speed 9039.16 samples/sec Loss 3.8223 LearningRate 0.0025 Epoch: 16 Global Step: 281150 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 22:54:31,072-Speed 8944.82 samples/sec Loss 3.8788 LearningRate 0.0025 Epoch: 16 Global Step: 281160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:32,210-Speed 9000.80 samples/sec Loss 3.8287 LearningRate 0.0025 Epoch: 16 Global Step: 281170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:33,372-Speed 8823.64 samples/sec Loss 3.8292 LearningRate 0.0025 Epoch: 16 Global Step: 281180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:34,496-Speed 9113.91 samples/sec Loss 3.8026 LearningRate 0.0025 Epoch: 16 Global Step: 281190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:35,646-Speed 8910.20 samples/sec Loss 3.8397 LearningRate 0.0025 Epoch: 16 Global Step: 281200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:36,760-Speed 9193.18 samples/sec Loss 3.7647 LearningRate 0.0025 Epoch: 16 Global Step: 281210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:37,870-Speed 9240.02 samples/sec Loss 3.9202 LearningRate 0.0025 Epoch: 16 Global Step: 281220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:38,995-Speed 9105.61 samples/sec Loss 3.8526 LearningRate 0.0025 Epoch: 16 Global Step: 281230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:40,116-Speed 9135.78 samples/sec Loss 3.8064 LearningRate 0.0025 Epoch: 16 Global Step: 281240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:41,252-Speed 9025.11 samples/sec Loss 3.8898 LearningRate 0.0025 Epoch: 16 Global Step: 281250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:42,355-Speed 9280.63 samples/sec Loss 3.8825 LearningRate 0.0025 Epoch: 16 Global Step: 281260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:43,504-Speed 8919.84 samples/sec Loss 3.8339 LearningRate 0.0025 Epoch: 16 Global Step: 281270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:54:44,600-Speed 9351.91 samples/sec Loss 3.7654 LearningRate 0.0025 Epoch: 16 Global Step: 281280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:54:45,695-Speed 9356.78 samples/sec Loss 3.9499 LearningRate 0.0025 Epoch: 16 Global Step: 281290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:46,779-Speed 9452.31 samples/sec Loss 3.9079 LearningRate 0.0025 Epoch: 16 Global Step: 281300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:47,864-Speed 9439.74 samples/sec Loss 3.8083 LearningRate 0.0025 Epoch: 16 Global Step: 281310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:49,024-Speed 8831.33 samples/sec Loss 3.8728 LearningRate 0.0025 Epoch: 16 Global Step: 281320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:50,167-Speed 8963.94 samples/sec Loss 3.8351 LearningRate 0.0025 Epoch: 16 Global Step: 281330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:51,285-Speed 9166.90 samples/sec Loss 3.8134 LearningRate 0.0025 Epoch: 16 Global Step: 281340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:52,380-Speed 9351.54 samples/sec Loss 3.9478 LearningRate 0.0025 Epoch: 16 Global Step: 281350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:53,541-Speed 8832.13 samples/sec Loss 3.8064 LearningRate 0.0025 Epoch: 16 Global Step: 281360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:54,639-Speed 9330.04 samples/sec Loss 3.8782 LearningRate 0.0025 Epoch: 16 Global Step: 281370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:55,734-Speed 9354.59 samples/sec Loss 3.8909 LearningRate 0.0025 Epoch: 16 Global Step: 281380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:54:56,840-Speed 9267.22 samples/sec Loss 3.8336 LearningRate 0.0025 Epoch: 16 Global Step: 281390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:54:58,023-Speed 8664.26 samples/sec 
Loss 3.8825 LearningRate 0.0025 Epoch: 16 Global Step: 281400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:54:59,153-Speed 9070.74 samples/sec Loss 3.8557 LearningRate 0.0025 Epoch: 16 Global Step: 281410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:00,279-Speed 9092.69 samples/sec Loss 3.8516 LearningRate 0.0025 Epoch: 16 Global Step: 281420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:01,386-Speed 9258.44 samples/sec Loss 3.9245 LearningRate 0.0025 Epoch: 16 Global Step: 281430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:02,523-Speed 9012.36 samples/sec Loss 3.7993 LearningRate 0.0025 Epoch: 16 Global Step: 281440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:03,641-Speed 9166.66 samples/sec Loss 3.8850 LearningRate 0.0025 Epoch: 16 Global Step: 281450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:04,720-Speed 9501.54 samples/sec Loss 3.8305 LearningRate 0.0025 Epoch: 16 Global Step: 281460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:05,833-Speed 9203.33 samples/sec Loss 3.8825 LearningRate 0.0025 Epoch: 16 Global Step: 281470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:06,950-Speed 9172.69 samples/sec Loss 3.8697 LearningRate 0.0025 Epoch: 16 Global Step: 281480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:08,094-Speed 8953.61 samples/sec Loss 3.8879 LearningRate 0.0025 Epoch: 16 Global Step: 281490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:09,199-Speed 9269.31 samples/sec Loss 3.9524 LearningRate 0.0025 Epoch: 16 Global Step: 281500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:10,328-Speed 9074.55 samples/sec Loss 3.8828 LearningRate 0.0025 Epoch: 16 Global Step: 281510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:11,456-Speed 9082.43 samples/sec Loss 3.8447 LearningRate 0.0025 Epoch: 
16 Global Step: 281520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:12,612-Speed 8864.01 samples/sec Loss 3.9068 LearningRate 0.0025 Epoch: 16 Global Step: 281530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:13,697-Speed 9441.92 samples/sec Loss 3.7837 LearningRate 0.0025 Epoch: 16 Global Step: 281540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:14,848-Speed 8902.89 samples/sec Loss 3.7684 LearningRate 0.0025 Epoch: 16 Global Step: 281550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:15,989-Speed 8987.66 samples/sec Loss 3.9085 LearningRate 0.0025 Epoch: 16 Global Step: 281560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:17,094-Speed 9272.68 samples/sec Loss 3.8752 LearningRate 0.0024 Epoch: 16 Global Step: 281570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:18,161-Speed 9602.70 samples/sec Loss 3.8767 LearningRate 0.0024 Epoch: 16 Global Step: 281580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:19,304-Speed 8963.59 samples/sec Loss 3.8176 LearningRate 0.0024 Epoch: 16 Global Step: 281590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:20,417-Speed 9199.84 samples/sec Loss 3.8356 LearningRate 0.0024 Epoch: 16 Global Step: 281600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:21,573-Speed 8864.51 samples/sec Loss 3.7916 LearningRate 0.0024 Epoch: 16 Global Step: 281610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:22,670-Speed 9341.11 samples/sec Loss 3.9017 LearningRate 0.0024 Epoch: 16 Global Step: 281620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:23,828-Speed 8848.42 samples/sec Loss 3.8780 LearningRate 0.0024 Epoch: 16 Global Step: 281630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:24,982-Speed 8876.71 samples/sec Loss 3.8215 LearningRate 0.0024 Epoch: 16 Global Step: 281640 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 22:55:26,115-Speed 9040.26 samples/sec Loss 3.8393 LearningRate 0.0024 Epoch: 16 Global Step: 281650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:27,256-Speed 8983.39 samples/sec Loss 3.9210 LearningRate 0.0024 Epoch: 16 Global Step: 281660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:28,358-Speed 9297.49 samples/sec Loss 3.8738 LearningRate 0.0024 Epoch: 16 Global Step: 281670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:29,466-Speed 9243.87 samples/sec Loss 3.9550 LearningRate 0.0024 Epoch: 16 Global Step: 281680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:30,583-Speed 9176.09 samples/sec Loss 3.8008 LearningRate 0.0024 Epoch: 16 Global Step: 281690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:55:31,699-Speed 9176.26 samples/sec Loss 3.9421 LearningRate 0.0024 Epoch: 16 Global Step: 281700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:32,821-Speed 9131.14 samples/sec Loss 3.9412 LearningRate 0.0024 Epoch: 16 Global Step: 281710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:33,929-Speed 9261.88 samples/sec Loss 3.7829 LearningRate 0.0024 Epoch: 16 Global Step: 281720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:35,025-Speed 9350.87 samples/sec Loss 3.8741 LearningRate 0.0024 Epoch: 16 Global Step: 281730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:36,912-Speed 5426.70 samples/sec Loss 3.8623 LearningRate 0.0024 Epoch: 16 Global Step: 281740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:38,025-Speed 9209.24 samples/sec Loss 3.9031 LearningRate 0.0024 Epoch: 16 Global Step: 281750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:39,156-Speed 9057.83 samples/sec Loss 3.7834 LearningRate 0.0024 Epoch: 16 Global Step: 281760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 22:55:40,271-Speed 9183.48 samples/sec Loss 3.8684 LearningRate 0.0024 Epoch: 16 Global Step: 281770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:41,455-Speed 8654.83 samples/sec Loss 3.9187 LearningRate 0.0024 Epoch: 16 Global Step: 281780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:42,565-Speed 9234.71 samples/sec Loss 3.8062 LearningRate 0.0024 Epoch: 16 Global Step: 281790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:43,719-Speed 8875.46 samples/sec Loss 3.9130 LearningRate 0.0024 Epoch: 16 Global Step: 281800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:44,836-Speed 9170.71 samples/sec Loss 3.8552 LearningRate 0.0024 Epoch: 16 Global Step: 281810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:45,922-Speed 9442.95 samples/sec Loss 3.8561 LearningRate 0.0024 Epoch: 16 Global Step: 281820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:47,044-Speed 9132.00 samples/sec Loss 3.8762 LearningRate 0.0024 Epoch: 16 Global Step: 281830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:48,159-Speed 9187.95 samples/sec Loss 3.8871 LearningRate 0.0024 Epoch: 16 Global Step: 281840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:49,238-Speed 9495.93 samples/sec Loss 3.8634 LearningRate 0.0024 Epoch: 16 Global Step: 281850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:50,361-Speed 9122.56 samples/sec Loss 3.7854 LearningRate 0.0024 Epoch: 16 Global Step: 281860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:55:51,441-Speed 9479.68 samples/sec Loss 3.9176 LearningRate 0.0024 Epoch: 16 Global Step: 281870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:52,547-Speed 9263.65 samples/sec Loss 3.8954 LearningRate 0.0024 Epoch: 16 Global Step: 281880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:53,679-Speed 9055.85 
samples/sec Loss 3.7497 LearningRate 0.0024 Epoch: 16 Global Step: 281890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:54,793-Speed 9195.00 samples/sec Loss 3.8709 LearningRate 0.0024 Epoch: 16 Global Step: 281900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:55,876-Speed 9459.29 samples/sec Loss 3.9203 LearningRate 0.0024 Epoch: 16 Global Step: 281910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:57,019-Speed 8962.05 samples/sec Loss 3.7808 LearningRate 0.0024 Epoch: 16 Global Step: 281920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:58,127-Speed 9246.53 samples/sec Loss 3.8955 LearningRate 0.0024 Epoch: 16 Global Step: 281930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:55:59,239-Speed 9221.73 samples/sec Loss 3.8213 LearningRate 0.0024 Epoch: 16 Global Step: 281940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:56:00,355-Speed 9174.47 samples/sec Loss 3.8070 LearningRate 0.0024 Epoch: 16 Global Step: 281950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:56:01,476-Speed 9142.96 samples/sec Loss 3.8190 LearningRate 0.0024 Epoch: 16 Global Step: 281960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:56:02,599-Speed 9122.61 samples/sec Loss 3.9588 LearningRate 0.0024 Epoch: 16 Global Step: 281970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:56:03,675-Speed 9525.48 samples/sec Loss 3.8863 LearningRate 0.0024 Epoch: 16 Global Step: 281980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:56:04,776-Speed 9305.90 samples/sec Loss 3.8517 LearningRate 0.0024 Epoch: 16 Global Step: 281990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:56:05,896-Speed 9147.41 samples/sec Loss 3.8703 LearningRate 0.0024 Epoch: 16 Global Step: 282000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:56:27,725-[lfw][282000]XNorm: 6.857243 Training: 2022-04-11 
22:56:27,725-[lfw][282000]Accuracy-Flip: 0.99733+-0.00309 Training: 2022-04-11 22:56:27,726-[lfw][282000]Accuracy-Highest: 0.99733 Training: 2022-04-11 22:56:52,884-[cfp_fp][282000]XNorm: 5.975030 Training: 2022-04-11 22:56:52,884-[cfp_fp][282000]Accuracy-Flip: 0.97386+-0.00919 Training: 2022-04-11 22:56:52,885-[cfp_fp][282000]Accuracy-Highest: 0.97386 Training: 2022-04-11 22:57:14,592-[agedb_30][282000]XNorm: 6.682816 Training: 2022-04-11 22:57:14,593-[agedb_30][282000]Accuracy-Flip: 0.97250+-0.00870 Training: 2022-04-11 22:57:14,593-[agedb_30][282000]Accuracy-Highest: 0.97350 Training: 2022-04-11 22:57:15,695-Speed 146.71 samples/sec Loss 3.8338 LearningRate 0.0024 Epoch: 16 Global Step: 282010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:16,815-Speed 9148.69 samples/sec Loss 3.8612 LearningRate 0.0024 Epoch: 16 Global Step: 282020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:17,945-Speed 9069.32 samples/sec Loss 3.8369 LearningRate 0.0024 Epoch: 16 Global Step: 282030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:19,064-Speed 9153.02 samples/sec Loss 3.8576 LearningRate 0.0024 Epoch: 16 Global Step: 282040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:20,187-Speed 9128.91 samples/sec Loss 3.8185 LearningRate 0.0024 Epoch: 16 Global Step: 282050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:21,272-Speed 9440.29 samples/sec Loss 3.9598 LearningRate 0.0024 Epoch: 16 Global Step: 282060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:22,409-Speed 9021.77 samples/sec Loss 3.8742 LearningRate 0.0024 Epoch: 16 Global Step: 282070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:23,487-Speed 9504.14 samples/sec Loss 3.8644 LearningRate 0.0024 Epoch: 16 Global Step: 282080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:57:24,594-Speed 9254.77 samples/sec Loss 3.8651 LearningRate 0.0024 Epoch: 16 Global 
Step: 282090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:57:25,695-Speed 9306.39 samples/sec Loss 3.7992 LearningRate 0.0024 Epoch: 16 Global Step: 282100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:57:26,790-Speed 9352.28 samples/sec Loss 3.8838 LearningRate 0.0024 Epoch: 16 Global Step: 282110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:27,888-Speed 9336.81 samples/sec Loss 3.8403 LearningRate 0.0024 Epoch: 16 Global Step: 282120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:29,070-Speed 8668.52 samples/sec Loss 3.7843 LearningRate 0.0024 Epoch: 16 Global Step: 282130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:30,190-Speed 9146.75 samples/sec Loss 3.8915 LearningRate 0.0024 Epoch: 16 Global Step: 282140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:31,327-Speed 9009.37 samples/sec Loss 3.8485 LearningRate 0.0024 Epoch: 16 Global Step: 282150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:32,450-Speed 9121.48 samples/sec Loss 3.8901 LearningRate 0.0024 Epoch: 16 Global Step: 282160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:33,591-Speed 8980.58 samples/sec Loss 3.8500 LearningRate 0.0024 Epoch: 16 Global Step: 282170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:34,754-Speed 8807.83 samples/sec Loss 3.8401 LearningRate 0.0024 Epoch: 16 Global Step: 282180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:35,866-Speed 9220.14 samples/sec Loss 3.8148 LearningRate 0.0024 Epoch: 16 Global Step: 282190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:36,987-Speed 9148.33 samples/sec Loss 3.8550 LearningRate 0.0024 Epoch: 16 Global Step: 282200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:38,075-Speed 9412.27 samples/sec Loss 3.9032 LearningRate 0.0024 Epoch: 16 Global Step: 282210 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-04-11 22:57:39,183-Speed 9247.80 samples/sec Loss 3.9255 LearningRate 0.0024 Epoch: 16 Global Step: 282220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:40,331-Speed 8928.06 samples/sec Loss 3.8605 LearningRate 0.0024 Epoch: 16 Global Step: 282230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:41,464-Speed 9036.49 samples/sec Loss 3.9283 LearningRate 0.0024 Epoch: 16 Global Step: 282240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:42,571-Speed 9256.82 samples/sec Loss 3.8454 LearningRate 0.0024 Epoch: 16 Global Step: 282250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:43,710-Speed 8995.28 samples/sec Loss 3.8238 LearningRate 0.0024 Epoch: 16 Global Step: 282260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:44,842-Speed 9057.05 samples/sec Loss 3.8076 LearningRate 0.0024 Epoch: 16 Global Step: 282270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:45,998-Speed 8856.08 samples/sec Loss 3.8694 LearningRate 0.0024 Epoch: 16 Global Step: 282280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:47,143-Speed 8954.45 samples/sec Loss 3.8273 LearningRate 0.0024 Epoch: 16 Global Step: 282290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:48,289-Speed 8943.52 samples/sec Loss 3.8977 LearningRate 0.0024 Epoch: 16 Global Step: 282300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:49,400-Speed 9220.52 samples/sec Loss 3.8799 LearningRate 0.0024 Epoch: 16 Global Step: 282310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:50,553-Speed 8880.64 samples/sec Loss 3.9055 LearningRate 0.0024 Epoch: 16 Global Step: 282320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:57:51,660-Speed 9256.64 samples/sec Loss 3.9431 LearningRate 0.0024 Epoch: 16 Global Step: 282330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:57:52,759-Speed 9327.10 samples/sec Loss 3.9137 LearningRate 0.0024 Epoch: 16 Global Step: 282340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:53,897-Speed 9002.42 samples/sec Loss 3.8364 LearningRate 0.0024 Epoch: 16 Global Step: 282350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:55,048-Speed 8894.65 samples/sec Loss 3.8131 LearningRate 0.0024 Epoch: 16 Global Step: 282360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:56,201-Speed 8890.39 samples/sec Loss 3.9555 LearningRate 0.0024 Epoch: 16 Global Step: 282370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:57,289-Speed 9416.20 samples/sec Loss 3.8118 LearningRate 0.0024 Epoch: 16 Global Step: 282380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:58,446-Speed 8857.76 samples/sec Loss 3.8796 LearningRate 0.0024 Epoch: 16 Global Step: 282390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:57:59,519-Speed 9549.86 samples/sec Loss 3.9079 LearningRate 0.0024 Epoch: 16 Global Step: 282400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:00,651-Speed 9053.91 samples/sec Loss 3.8348 LearningRate 0.0024 Epoch: 16 Global Step: 282410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:01,795-Speed 8957.20 samples/sec Loss 3.8644 LearningRate 0.0024 Epoch: 16 Global Step: 282420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:02,880-Speed 9452.55 samples/sec Loss 3.8103 LearningRate 0.0024 Epoch: 16 Global Step: 282430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:03,969-Speed 9407.49 samples/sec Loss 3.8786 LearningRate 0.0024 Epoch: 16 Global Step: 282440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:05,038-Speed 9586.71 samples/sec Loss 3.9038 LearningRate 0.0024 Epoch: 16 Global Step: 282450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:06,197-Speed 8836.38 samples/sec Loss 
3.8555 LearningRate 0.0024 Epoch: 16 Global Step: 282460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:07,334-Speed 9016.44 samples/sec Loss 3.8463 LearningRate 0.0024 Epoch: 16 Global Step: 282470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:08,491-Speed 8852.72 samples/sec Loss 3.7898 LearningRate 0.0024 Epoch: 16 Global Step: 282480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:09,614-Speed 9124.95 samples/sec Loss 3.7671 LearningRate 0.0024 Epoch: 16 Global Step: 282490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:10,752-Speed 9000.54 samples/sec Loss 3.8650 LearningRate 0.0024 Epoch: 16 Global Step: 282500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:11,875-Speed 9133.33 samples/sec Loss 3.8312 LearningRate 0.0024 Epoch: 16 Global Step: 282510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:13,020-Speed 8942.11 samples/sec Loss 3.8685 LearningRate 0.0024 Epoch: 16 Global Step: 282520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:14,216-Speed 8565.68 samples/sec Loss 3.8100 LearningRate 0.0024 Epoch: 16 Global Step: 282530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:58:15,342-Speed 9108.52 samples/sec Loss 3.8380 LearningRate 0.0024 Epoch: 16 Global Step: 282540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:16,513-Speed 8743.79 samples/sec Loss 3.8970 LearningRate 0.0024 Epoch: 16 Global Step: 282550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:17,659-Speed 8941.53 samples/sec Loss 3.9172 LearningRate 0.0024 Epoch: 16 Global Step: 282560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:18,818-Speed 8840.84 samples/sec Loss 3.8933 LearningRate 0.0024 Epoch: 16 Global Step: 282570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:19,973-Speed 8874.41 samples/sec Loss 3.8563 LearningRate 0.0024 Epoch: 16 
Global Step: 282580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:21,118-Speed 8946.67 samples/sec Loss 3.8376 LearningRate 0.0024 Epoch: 16 Global Step: 282590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:22,258-Speed 8986.44 samples/sec Loss 3.8652 LearningRate 0.0024 Epoch: 16 Global Step: 282600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:23,386-Speed 9082.23 samples/sec Loss 3.9164 LearningRate 0.0024 Epoch: 16 Global Step: 282610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:24,534-Speed 8926.18 samples/sec Loss 3.8089 LearningRate 0.0024 Epoch: 16 Global Step: 282620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:25,641-Speed 9250.87 samples/sec Loss 3.8189 LearningRate 0.0024 Epoch: 16 Global Step: 282630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:26,737-Speed 9347.97 samples/sec Loss 3.9115 LearningRate 0.0024 Epoch: 16 Global Step: 282640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:27,880-Speed 8962.88 samples/sec Loss 3.7963 LearningRate 0.0023 Epoch: 16 Global Step: 282650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:28,950-Speed 9576.86 samples/sec Loss 3.8266 LearningRate 0.0023 Epoch: 16 Global Step: 282660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:30,071-Speed 9145.60 samples/sec Loss 3.9511 LearningRate 0.0023 Epoch: 16 Global Step: 282670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:31,232-Speed 8825.84 samples/sec Loss 3.8393 LearningRate 0.0023 Epoch: 16 Global Step: 282680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:32,400-Speed 8771.63 samples/sec Loss 3.9105 LearningRate 0.0023 Epoch: 16 Global Step: 282690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:33,532-Speed 9053.88 samples/sec Loss 3.8320 LearningRate 0.0023 Epoch: 16 Global Step: 282700 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 22:58:34,648-Speed 9177.87 samples/sec Loss 3.7442 LearningRate 0.0023 Epoch: 16 Global Step: 282710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:35,748-Speed 9313.39 samples/sec Loss 3.8429 LearningRate 0.0023 Epoch: 16 Global Step: 282720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:36,882-Speed 9032.97 samples/sec Loss 3.8533 LearningRate 0.0023 Epoch: 16 Global Step: 282730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:38,044-Speed 8819.37 samples/sec Loss 3.8624 LearningRate 0.0023 Epoch: 16 Global Step: 282740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:58:39,165-Speed 9140.10 samples/sec Loss 3.8322 LearningRate 0.0023 Epoch: 16 Global Step: 282750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:40,331-Speed 8786.77 samples/sec Loss 3.9126 LearningRate 0.0023 Epoch: 16 Global Step: 282760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:41,433-Speed 9295.76 samples/sec Loss 3.8558 LearningRate 0.0023 Epoch: 16 Global Step: 282770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:42,529-Speed 9349.87 samples/sec Loss 3.8705 LearningRate 0.0023 Epoch: 16 Global Step: 282780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:43,643-Speed 9194.73 samples/sec Loss 3.8600 LearningRate 0.0023 Epoch: 16 Global Step: 282790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:44,752-Speed 9246.05 samples/sec Loss 3.9158 LearningRate 0.0023 Epoch: 16 Global Step: 282800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:45,862-Speed 9227.52 samples/sec Loss 3.8121 LearningRate 0.0023 Epoch: 16 Global Step: 282810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:47,023-Speed 8820.11 samples/sec Loss 3.8703 LearningRate 0.0023 Epoch: 16 Global Step: 282820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
22:58:48,132-Speed 9244.91 samples/sec Loss 3.8286 LearningRate 0.0023 Epoch: 16 Global Step: 282830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:49,278-Speed 8945.35 samples/sec Loss 3.8450 LearningRate 0.0023 Epoch: 16 Global Step: 282840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:50,363-Speed 9444.02 samples/sec Loss 3.8022 LearningRate 0.0023 Epoch: 16 Global Step: 282850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:58:51,484-Speed 9138.63 samples/sec Loss 3.8869 LearningRate 0.0023 Epoch: 16 Global Step: 282860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:58:52,621-Speed 9005.84 samples/sec Loss 3.9278 LearningRate 0.0023 Epoch: 16 Global Step: 282870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:53,785-Speed 8804.57 samples/sec Loss 3.8827 LearningRate 0.0023 Epoch: 16 Global Step: 282880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:54,984-Speed 8545.45 samples/sec Loss 3.8151 LearningRate 0.0023 Epoch: 16 Global Step: 282890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:56,085-Speed 9301.77 samples/sec Loss 3.8113 LearningRate 0.0023 Epoch: 16 Global Step: 282900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:57,200-Speed 9192.86 samples/sec Loss 3.8566 LearningRate 0.0023 Epoch: 16 Global Step: 282910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:58,348-Speed 8921.63 samples/sec Loss 3.8650 LearningRate 0.0023 Epoch: 16 Global Step: 282920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:58:59,453-Speed 9277.00 samples/sec Loss 3.8621 LearningRate 0.0023 Epoch: 16 Global Step: 282930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:00,598-Speed 8945.25 samples/sec Loss 3.8578 LearningRate 0.0023 Epoch: 16 Global Step: 282940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:01,764-Speed 8788.05 samples/sec 
Loss 3.8423 LearningRate 0.0023 Epoch: 16 Global Step: 282950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:02,855-Speed 9388.87 samples/sec Loss 3.8732 LearningRate 0.0023 Epoch: 16 Global Step: 282960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:03,942-Speed 9429.57 samples/sec Loss 3.9083 LearningRate 0.0023 Epoch: 16 Global Step: 282970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:59:05,064-Speed 9136.62 samples/sec Loss 3.8773 LearningRate 0.0023 Epoch: 16 Global Step: 282980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:06,150-Speed 9431.60 samples/sec Loss 3.8601 LearningRate 0.0023 Epoch: 16 Global Step: 282990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:07,238-Speed 9420.22 samples/sec Loss 3.8004 LearningRate 0.0023 Epoch: 16 Global Step: 283000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:08,407-Speed 8761.74 samples/sec Loss 3.8342 LearningRate 0.0023 Epoch: 16 Global Step: 283010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:09,562-Speed 8867.39 samples/sec Loss 3.8422 LearningRate 0.0023 Epoch: 16 Global Step: 283020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:10,681-Speed 9159.21 samples/sec Loss 3.8403 LearningRate 0.0023 Epoch: 16 Global Step: 283030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:11,798-Speed 9169.35 samples/sec Loss 3.8827 LearningRate 0.0023 Epoch: 16 Global Step: 283040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:12,887-Speed 9416.41 samples/sec Loss 3.9007 LearningRate 0.0023 Epoch: 16 Global Step: 283050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:14,037-Speed 8904.24 samples/sec Loss 3.8124 LearningRate 0.0023 Epoch: 16 Global Step: 283060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:15,188-Speed 8900.95 samples/sec Loss 3.8818 LearningRate 0.0023 Epoch: 16 
Global Step: 283070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:16,273-Speed 9448.79 samples/sec Loss 3.8337 LearningRate 0.0023 Epoch: 16 Global Step: 283080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:17,417-Speed 8952.77 samples/sec Loss 3.8898 LearningRate 0.0023 Epoch: 16 Global Step: 283090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:18,519-Speed 9298.43 samples/sec Loss 3.8098 LearningRate 0.0023 Epoch: 16 Global Step: 283100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:19,658-Speed 8994.97 samples/sec Loss 3.8642 LearningRate 0.0023 Epoch: 16 Global Step: 283110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:20,764-Speed 9260.24 samples/sec Loss 3.8186 LearningRate 0.0023 Epoch: 16 Global Step: 283120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:21,876-Speed 9306.74 samples/sec Loss 3.8227 LearningRate 0.0023 Epoch: 16 Global Step: 283130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:22,974-Speed 9334.14 samples/sec Loss 3.8613 LearningRate 0.0023 Epoch: 16 Global Step: 283140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:24,123-Speed 8919.94 samples/sec Loss 3.9189 LearningRate 0.0023 Epoch: 16 Global Step: 283150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:25,218-Speed 9351.29 samples/sec Loss 3.9084 LearningRate 0.0023 Epoch: 16 Global Step: 283160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:26,437-Speed 8405.55 samples/sec Loss 3.9305 LearningRate 0.0023 Epoch: 16 Global Step: 283170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:27,613-Speed 8710.28 samples/sec Loss 3.9182 LearningRate 0.0023 Epoch: 16 Global Step: 283180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 22:59:28,740-Speed 9093.67 samples/sec Loss 3.7868 LearningRate 0.0023 Epoch: 16 Global Step: 283190 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 22:59:29,845-Speed 9269.24 samples/sec Loss 3.7557 LearningRate 0.0023 Epoch: 16 Global Step: 283200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:30,989-Speed 8959.47 samples/sec Loss 3.7719 LearningRate 0.0023 Epoch: 16 Global Step: 283210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:32,149-Speed 8829.41 samples/sec Loss 3.8178 LearningRate 0.0023 Epoch: 16 Global Step: 283220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:33,297-Speed 8927.03 samples/sec Loss 3.8707 LearningRate 0.0023 Epoch: 16 Global Step: 283230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:34,424-Speed 9093.56 samples/sec Loss 3.8461 LearningRate 0.0023 Epoch: 16 Global Step: 283240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:35,530-Speed 9260.22 samples/sec Loss 3.9598 LearningRate 0.0023 Epoch: 16 Global Step: 283250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:36,633-Speed 9288.21 samples/sec Loss 3.7988 LearningRate 0.0023 Epoch: 16 Global Step: 283260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:37,801-Speed 8771.32 samples/sec Loss 3.8339 LearningRate 0.0023 Epoch: 16 Global Step: 283270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:38,920-Speed 9153.40 samples/sec Loss 3.9198 LearningRate 0.0023 Epoch: 16 Global Step: 283280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:40,043-Speed 9125.91 samples/sec Loss 3.8033 LearningRate 0.0023 Epoch: 16 Global Step: 283290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:41,183-Speed 9007.29 samples/sec Loss 3.9409 LearningRate 0.0023 Epoch: 16 Global Step: 283300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:42,282-Speed 9326.50 samples/sec Loss 3.8011 LearningRate 0.0023 Epoch: 16 Global Step: 283310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
22:59:43,452-Speed 8755.58 samples/sec Loss 3.8082 LearningRate 0.0023 Epoch: 16 Global Step: 283320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:44,602-Speed 8911.11 samples/sec Loss 3.9136 LearningRate 0.0023 Epoch: 16 Global Step: 283330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:45,673-Speed 9563.78 samples/sec Loss 3.9136 LearningRate 0.0023 Epoch: 16 Global Step: 283340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:46,772-Speed 9320.70 samples/sec Loss 3.9019 LearningRate 0.0023 Epoch: 16 Global Step: 283350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:47,901-Speed 9071.61 samples/sec Loss 3.8619 LearningRate 0.0023 Epoch: 16 Global Step: 283360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:49,013-Speed 9223.28 samples/sec Loss 3.7791 LearningRate 0.0023 Epoch: 16 Global Step: 283370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:50,112-Speed 9317.76 samples/sec Loss 3.9471 LearningRate 0.0023 Epoch: 16 Global Step: 283380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:51,227-Speed 9186.23 samples/sec Loss 3.9200 LearningRate 0.0023 Epoch: 16 Global Step: 283390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:52,374-Speed 8935.02 samples/sec Loss 3.8879 LearningRate 0.0023 Epoch: 16 Global Step: 283400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:53,475-Speed 9302.86 samples/sec Loss 3.8691 LearningRate 0.0023 Epoch: 16 Global Step: 283410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:54,575-Speed 9317.74 samples/sec Loss 3.8563 LearningRate 0.0023 Epoch: 16 Global Step: 283420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:55,689-Speed 9195.79 samples/sec Loss 3.8296 LearningRate 0.0023 Epoch: 16 Global Step: 283430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 22:59:56,785-Speed 9347.89 samples/sec Loss 
3.9117 LearningRate 0.0023 Epoch: 16 Global Step: 283440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:57,935-Speed 8914.55 samples/sec Loss 3.8088 LearningRate 0.0023 Epoch: 16 Global Step: 283450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 22:59:59,109-Speed 8724.67 samples/sec Loss 3.8689 LearningRate 0.0023 Epoch: 16 Global Step: 283460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:00,253-Speed 8956.97 samples/sec Loss 3.8223 LearningRate 0.0023 Epoch: 16 Global Step: 283470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:01,331-Speed 9514.37 samples/sec Loss 3.9081 LearningRate 0.0023 Epoch: 16 Global Step: 283480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:02,421-Speed 9392.77 samples/sec Loss 3.8700 LearningRate 0.0023 Epoch: 16 Global Step: 283490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:03,490-Speed 9584.71 samples/sec Loss 3.8954 LearningRate 0.0023 Epoch: 16 Global Step: 283500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:04,590-Speed 9315.83 samples/sec Loss 3.8567 LearningRate 0.0023 Epoch: 16 Global Step: 283510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:05,676-Speed 9433.42 samples/sec Loss 3.8480 LearningRate 0.0023 Epoch: 16 Global Step: 283520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:06,734-Speed 9684.43 samples/sec Loss 3.8508 LearningRate 0.0023 Epoch: 16 Global Step: 283530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:07,817-Speed 9464.52 samples/sec Loss 3.9172 LearningRate 0.0023 Epoch: 16 Global Step: 283540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:08,928-Speed 9218.90 samples/sec Loss 3.8944 LearningRate 0.0023 Epoch: 16 Global Step: 283550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:10,042-Speed 9193.08 samples/sec Loss 3.8330 LearningRate 0.0023 Epoch: 16 Global 
Step: 283560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:11,192-Speed 8908.97 samples/sec Loss 3.9412 LearningRate 0.0023 Epoch: 16 Global Step: 283570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:12,364-Speed 8745.66 samples/sec Loss 3.8642 LearningRate 0.0023 Epoch: 16 Global Step: 283580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:13,427-Speed 9641.28 samples/sec Loss 3.8424 LearningRate 0.0023 Epoch: 16 Global Step: 283590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:14,570-Speed 8960.73 samples/sec Loss 3.8227 LearningRate 0.0023 Epoch: 16 Global Step: 283600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:15,721-Speed 8902.90 samples/sec Loss 3.9396 LearningRate 0.0023 Epoch: 16 Global Step: 283610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:16,800-Speed 9494.45 samples/sec Loss 3.8278 LearningRate 0.0023 Epoch: 16 Global Step: 283620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:17,902-Speed 9298.40 samples/sec Loss 3.7516 LearningRate 0.0023 Epoch: 16 Global Step: 283630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:19,021-Speed 9161.30 samples/sec Loss 3.9112 LearningRate 0.0023 Epoch: 16 Global Step: 283640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:20,131-Speed 9235.55 samples/sec Loss 3.8416 LearningRate 0.0023 Epoch: 16 Global Step: 283650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:21,261-Speed 9065.75 samples/sec Loss 3.8144 LearningRate 0.0023 Epoch: 16 Global Step: 283660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:00:22,461-Speed 8537.48 samples/sec Loss 4.0328 LearningRate 0.0023 Epoch: 16 Global Step: 283670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:23,601-Speed 8982.11 samples/sec Loss 3.8254 LearningRate 0.0023 Epoch: 16 Global Step: 283680 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:00:24,698-Speed 9340.16 samples/sec Loss 3.8305 LearningRate 0.0023 Epoch: 16 Global Step: 283690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:25,897-Speed 8547.87 samples/sec Loss 3.8197 LearningRate 0.0023 Epoch: 16 Global Step: 283700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:27,057-Speed 8832.55 samples/sec Loss 3.7826 LearningRate 0.0023 Epoch: 16 Global Step: 283710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:28,220-Speed 8810.10 samples/sec Loss 3.8752 LearningRate 0.0023 Epoch: 16 Global Step: 283720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:29,353-Speed 9039.14 samples/sec Loss 3.8027 LearningRate 0.0023 Epoch: 16 Global Step: 283730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:30,736-Speed 7408.68 samples/sec Loss 3.8656 LearningRate 0.0023 Epoch: 16 Global Step: 283740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:58,625-Speed 367.19 samples/sec Loss 3.8192 LearningRate 0.0022 Epoch: 17 Global Step: 283750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:00:59,978-Speed 7579.24 samples/sec Loss 3.5038 LearningRate 0.0022 Epoch: 17 Global Step: 283760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:01,151-Speed 8736.32 samples/sec Loss 3.4445 LearningRate 0.0022 Epoch: 17 Global Step: 283770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:01:02,818-Speed 6146.99 samples/sec Loss 3.3952 LearningRate 0.0022 Epoch: 17 Global Step: 283780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:04,433-Speed 6340.72 samples/sec Loss 3.3905 LearningRate 0.0022 Epoch: 17 Global Step: 283790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:05,642-Speed 8474.65 samples/sec Loss 3.4260 LearningRate 0.0022 Epoch: 17 Global Step: 283800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:01:06,804-Speed 8823.11 samples/sec Loss 3.4252 LearningRate 0.0022 Epoch: 17 Global Step: 283810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:07,855-Speed 9750.06 samples/sec Loss 3.4625 LearningRate 0.0022 Epoch: 17 Global Step: 283820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:09,027-Speed 8742.31 samples/sec Loss 3.4486 LearningRate 0.0022 Epoch: 17 Global Step: 283830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:10,202-Speed 8719.98 samples/sec Loss 3.5122 LearningRate 0.0022 Epoch: 17 Global Step: 283840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:11,308-Speed 9259.84 samples/sec Loss 3.4640 LearningRate 0.0022 Epoch: 17 Global Step: 283850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:12,443-Speed 9027.26 samples/sec Loss 3.4688 LearningRate 0.0022 Epoch: 17 Global Step: 283860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:13,561-Speed 9170.44 samples/sec Loss 3.4699 LearningRate 0.0022 Epoch: 17 Global Step: 283870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:14,677-Speed 9276.70 samples/sec Loss 3.4882 LearningRate 0.0022 Epoch: 17 Global Step: 283880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:01:15,796-Speed 9157.23 samples/sec Loss 3.5229 LearningRate 0.0022 Epoch: 17 Global Step: 283890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:16,929-Speed 9049.12 samples/sec Loss 3.4912 LearningRate 0.0022 Epoch: 17 Global Step: 283900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:18,043-Speed 9192.66 samples/sec Loss 3.5341 LearningRate 0.0022 Epoch: 17 Global Step: 283910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:19,127-Speed 9453.39 samples/sec Loss 3.4809 LearningRate 0.0022 Epoch: 17 Global Step: 283920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:20,230-Speed 9293.82 samples/sec 
Loss 3.4687 LearningRate 0.0022 Epoch: 17 Global Step: 283930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:21,336-Speed 9268.46 samples/sec Loss 3.4351 LearningRate 0.0022 Epoch: 17 Global Step: 283940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:22,445-Speed 9240.39 samples/sec Loss 3.5552 LearningRate 0.0022 Epoch: 17 Global Step: 283950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:23,601-Speed 8860.84 samples/sec Loss 3.3921 LearningRate 0.0022 Epoch: 17 Global Step: 283960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:25,021-Speed 7215.04 samples/sec Loss 3.3789 LearningRate 0.0022 Epoch: 17 Global Step: 283970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:26,156-Speed 9028.84 samples/sec Loss 3.4393 LearningRate 0.0022 Epoch: 17 Global Step: 283980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:01:27,345-Speed 8616.79 samples/sec Loss 3.3752 LearningRate 0.0022 Epoch: 17 Global Step: 283990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:01:28,457-Speed 9211.02 samples/sec Loss 3.4216 LearningRate 0.0022 Epoch: 17 Global Step: 284000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:01:50,554-[lfw][284000]XNorm: 6.808961 Training: 2022-04-11 23:01:50,555-[lfw][284000]Accuracy-Flip: 0.99683+-0.00283 Training: 2022-04-11 23:01:50,555-[lfw][284000]Accuracy-Highest: 0.99733 Training: 2022-04-11 23:02:16,072-[cfp_fp][284000]XNorm: 5.943644 Training: 2022-04-11 23:02:16,073-[cfp_fp][284000]Accuracy-Flip: 0.97214+-0.00884 Training: 2022-04-11 23:02:16,073-[cfp_fp][284000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:02:38,145-[agedb_30][284000]XNorm: 6.613818 Training: 2022-04-11 23:02:38,145-[agedb_30][284000]Accuracy-Flip: 0.97300+-0.00833 Training: 2022-04-11 23:02:38,146-[agedb_30][284000]Accuracy-Highest: 0.97350 Training: 2022-04-11 23:02:39,252-Speed 144.65 samples/sec Loss 3.5006 LearningRate 
0.0022 Epoch: 17 Global Step: 284010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:40,355-Speed 9292.26 samples/sec Loss 3.4498 LearningRate 0.0022 Epoch: 17 Global Step: 284020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:41,480-Speed 9103.98 samples/sec Loss 3.4370 LearningRate 0.0022 Epoch: 17 Global Step: 284030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:42,575-Speed 9360.05 samples/sec Loss 3.4808 LearningRate 0.0022 Epoch: 17 Global Step: 284040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:43,747-Speed 8741.38 samples/sec Loss 3.4537 LearningRate 0.0022 Epoch: 17 Global Step: 284050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:44,889-Speed 8975.51 samples/sec Loss 3.5411 LearningRate 0.0022 Epoch: 17 Global Step: 284060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:46,001-Speed 9216.28 samples/sec Loss 3.5077 LearningRate 0.0022 Epoch: 17 Global Step: 284070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:47,074-Speed 9547.02 samples/sec Loss 3.4425 LearningRate 0.0022 Epoch: 17 Global Step: 284080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:48,233-Speed 8840.12 samples/sec Loss 3.4519 LearningRate 0.0022 Epoch: 17 Global Step: 284090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:49,302-Speed 9586.78 samples/sec Loss 3.5136 LearningRate 0.0022 Epoch: 17 Global Step: 284100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:50,411-Speed 9236.58 samples/sec Loss 3.4285 LearningRate 0.0022 Epoch: 17 Global Step: 284110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:02:51,553-Speed 8973.93 samples/sec Loss 3.4649 LearningRate 0.0022 Epoch: 17 Global Step: 284120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:02:52,703-Speed 8903.32 samples/sec Loss 3.5147 LearningRate 0.0022 Epoch: 17 Global Step: 284130 Fp16 
Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:02:53,849-Speed 8941.66 samples/sec Loss 3.4386 LearningRate 0.0022 Epoch: 17 Global Step: 284140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:02:54,978-Speed 9080.83 samples/sec Loss 3.5411 LearningRate 0.0022 Epoch: 17 Global Step: 284150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:56,086-Speed 9240.60 samples/sec Loss 3.4517 LearningRate 0.0022 Epoch: 17 Global Step: 284160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:57,220-Speed 9038.54 samples/sec Loss 3.4915 LearningRate 0.0022 Epoch: 17 Global Step: 284170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:58,321-Speed 9306.78 samples/sec Loss 3.4473 LearningRate 0.0022 Epoch: 17 Global Step: 284180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:02:59,387-Speed 9616.59 samples/sec Loss 3.4125 LearningRate 0.0022 Epoch: 17 Global Step: 284190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:00,478-Speed 9386.95 samples/sec Loss 3.4626 LearningRate 0.0022 Epoch: 17 Global Step: 284200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:01,701-Speed 8378.94 samples/sec Loss 3.4482 LearningRate 0.0022 Epoch: 17 Global Step: 284210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:02,849-Speed 8926.31 samples/sec Loss 3.4777 LearningRate 0.0022 Epoch: 17 Global Step: 284220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:04,006-Speed 8850.86 samples/sec Loss 3.5788 LearningRate 0.0022 Epoch: 17 Global Step: 284230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:05,082-Speed 9524.97 samples/sec Loss 3.4990 LearningRate 0.0022 Epoch: 17 Global Step: 284240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:06,180-Speed 9335.89 samples/sec Loss 3.4883 LearningRate 0.0022 Epoch: 17 Global Step: 284250 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-04-11 23:03:07,680-Speed 6828.12 samples/sec Loss 3.3820 LearningRate 0.0022 Epoch: 17 Global Step: 284260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:03:08,796-Speed 9184.30 samples/sec Loss 3.5774 LearningRate 0.0022 Epoch: 17 Global Step: 284270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:03:09,927-Speed 9058.55 samples/sec Loss 3.4990 LearningRate 0.0022 Epoch: 17 Global Step: 284280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:03:11,442-Speed 6760.65 samples/sec Loss 3.4078 LearningRate 0.0022 Epoch: 17 Global Step: 284290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:03:12,757-Speed 7793.30 samples/sec Loss 3.5084 LearningRate 0.0022 Epoch: 17 Global Step: 284300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:03:14,054-Speed 7896.05 samples/sec Loss 3.5200 LearningRate 0.0022 Epoch: 17 Global Step: 284310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:03:15,157-Speed 9292.38 samples/sec Loss 3.4706 LearningRate 0.0022 Epoch: 17 Global Step: 284320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:16,430-Speed 8047.56 samples/sec Loss 3.4677 LearningRate 0.0022 Epoch: 17 Global Step: 284330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:17,590-Speed 8831.21 samples/sec Loss 3.4784 LearningRate 0.0022 Epoch: 17 Global Step: 284340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:18,730-Speed 8988.28 samples/sec Loss 3.5153 LearningRate 0.0022 Epoch: 17 Global Step: 284350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:20,000-Speed 8070.36 samples/sec Loss 3.5364 LearningRate 0.0022 Epoch: 17 Global Step: 284360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:21,102-Speed 9292.22 samples/sec Loss 3.5600 LearningRate 0.0022 Epoch: 17 Global Step: 284370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:03:22,186-Speed 9451.50 samples/sec Loss 3.4334 LearningRate 0.0022 Epoch: 17 Global Step: 284380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:23,387-Speed 8537.75 samples/sec Loss 3.4260 LearningRate 0.0022 Epoch: 17 Global Step: 284390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:24,536-Speed 8915.99 samples/sec Loss 3.3988 LearningRate 0.0022 Epoch: 17 Global Step: 284400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:25,620-Speed 9450.50 samples/sec Loss 3.4554 LearningRate 0.0022 Epoch: 17 Global Step: 284410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:26,756-Speed 9015.79 samples/sec Loss 3.5104 LearningRate 0.0022 Epoch: 17 Global Step: 284420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:27,917-Speed 8831.50 samples/sec Loss 3.4705 LearningRate 0.0022 Epoch: 17 Global Step: 284430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:29,201-Speed 7973.85 samples/sec Loss 3.4281 LearningRate 0.0022 Epoch: 17 Global Step: 284440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:30,292-Speed 9389.69 samples/sec Loss 3.5810 LearningRate 0.0022 Epoch: 17 Global Step: 284450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:31,388-Speed 9353.93 samples/sec Loss 3.5044 LearningRate 0.0022 Epoch: 17 Global Step: 284460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:32,508-Speed 9149.09 samples/sec Loss 3.4749 LearningRate 0.0022 Epoch: 17 Global Step: 284470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:33,637-Speed 9071.13 samples/sec Loss 3.3985 LearningRate 0.0022 Epoch: 17 Global Step: 284480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:34,752-Speed 9190.21 samples/sec Loss 3.4313 LearningRate 0.0022 Epoch: 17 Global Step: 284490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:35,820-Speed 9591.24 samples/sec Loss 
3.4465 LearningRate 0.0022 Epoch: 17 Global Step: 284500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:36,958-Speed 9008.17 samples/sec Loss 3.4601 LearningRate 0.0022 Epoch: 17 Global Step: 284510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:38,097-Speed 8990.39 samples/sec Loss 3.5396 LearningRate 0.0022 Epoch: 17 Global Step: 284520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:39,234-Speed 9016.21 samples/sec Loss 3.5066 LearningRate 0.0022 Epoch: 17 Global Step: 284530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:40,421-Speed 8626.75 samples/sec Loss 3.4823 LearningRate 0.0022 Epoch: 17 Global Step: 284540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:41,544-Speed 9123.72 samples/sec Loss 3.4951 LearningRate 0.0022 Epoch: 17 Global Step: 284550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:42,677-Speed 9053.55 samples/sec Loss 3.5235 LearningRate 0.0022 Epoch: 17 Global Step: 284560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:03:43,771-Speed 9360.39 samples/sec Loss 3.4949 LearningRate 0.0022 Epoch: 17 Global Step: 284570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:44,930-Speed 8845.85 samples/sec Loss 3.4114 LearningRate 0.0022 Epoch: 17 Global Step: 284580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:46,047-Speed 9172.43 samples/sec Loss 3.4537 LearningRate 0.0022 Epoch: 17 Global Step: 284590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:47,176-Speed 9072.29 samples/sec Loss 3.4882 LearningRate 0.0022 Epoch: 17 Global Step: 284600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:48,329-Speed 8891.67 samples/sec Loss 3.4691 LearningRate 0.0022 Epoch: 17 Global Step: 284610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:49,527-Speed 8554.70 samples/sec Loss 3.5455 LearningRate 0.0022 Epoch: 17 Global 
Step: 284620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:50,686-Speed 8836.07 samples/sec Loss 3.4825 LearningRate 0.0022 Epoch: 17 Global Step: 284630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:51,773-Speed 9425.56 samples/sec Loss 3.4545 LearningRate 0.0022 Epoch: 17 Global Step: 284640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:52,945-Speed 8743.21 samples/sec Loss 3.5222 LearningRate 0.0022 Epoch: 17 Global Step: 284650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:54,082-Speed 9010.82 samples/sec Loss 3.5185 LearningRate 0.0022 Epoch: 17 Global Step: 284660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:55,264-Speed 8666.74 samples/sec Loss 3.5093 LearningRate 0.0022 Epoch: 17 Global Step: 284670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:03:56,439-Speed 8720.18 samples/sec Loss 3.5268 LearningRate 0.0022 Epoch: 17 Global Step: 284680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:57,578-Speed 8998.33 samples/sec Loss 3.4027 LearningRate 0.0022 Epoch: 17 Global Step: 284690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:58,717-Speed 8992.52 samples/sec Loss 3.4304 LearningRate 0.0022 Epoch: 17 Global Step: 284700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:03:59,827-Speed 9228.43 samples/sec Loss 3.5114 LearningRate 0.0022 Epoch: 17 Global Step: 284710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:00,954-Speed 9094.47 samples/sec Loss 3.5100 LearningRate 0.0022 Epoch: 17 Global Step: 284720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:02,087-Speed 9049.39 samples/sec Loss 3.4890 LearningRate 0.0022 Epoch: 17 Global Step: 284730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:03,233-Speed 8940.18 samples/sec Loss 3.4476 LearningRate 0.0022 Epoch: 17 Global Step: 284740 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:04:04,329-Speed 9346.17 samples/sec Loss 3.5246 LearningRate 0.0022 Epoch: 17 Global Step: 284750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:05,434-Speed 9275.12 samples/sec Loss 3.4807 LearningRate 0.0022 Epoch: 17 Global Step: 284760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:06,581-Speed 8934.58 samples/sec Loss 3.5326 LearningRate 0.0022 Epoch: 17 Global Step: 284770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:07,717-Speed 9013.39 samples/sec Loss 3.5205 LearningRate 0.0022 Epoch: 17 Global Step: 284780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:08,788-Speed 9573.39 samples/sec Loss 3.4272 LearningRate 0.0022 Epoch: 17 Global Step: 284790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:09,911-Speed 9123.56 samples/sec Loss 3.4549 LearningRate 0.0022 Epoch: 17 Global Step: 284800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:11,008-Speed 9338.72 samples/sec Loss 3.4191 LearningRate 0.0022 Epoch: 17 Global Step: 284810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:12,177-Speed 8760.81 samples/sec Loss 3.4584 LearningRate 0.0022 Epoch: 17 Global Step: 284820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:13,315-Speed 9012.46 samples/sec Loss 3.5038 LearningRate 0.0022 Epoch: 17 Global Step: 284830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:14,390-Speed 9527.81 samples/sec Loss 3.5695 LearningRate 0.0022 Epoch: 17 Global Step: 284840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:15,534-Speed 8955.45 samples/sec Loss 3.4222 LearningRate 0.0022 Epoch: 17 Global Step: 284850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:04:16,667-Speed 9046.66 samples/sec Loss 3.4228 LearningRate 0.0022 Epoch: 17 Global Step: 284860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
23:04:17,800-Speed 9043.85 samples/sec Loss 3.4892 LearningRate 0.0022 Epoch: 17 Global Step: 284870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:18,911-Speed 9220.18 samples/sec Loss 3.4388 LearningRate 0.0021 Epoch: 17 Global Step: 284880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:20,002-Speed 9398.58 samples/sec Loss 3.4781 LearningRate 0.0021 Epoch: 17 Global Step: 284890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:21,102-Speed 9314.59 samples/sec Loss 3.4779 LearningRate 0.0021 Epoch: 17 Global Step: 284900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:22,264-Speed 8819.26 samples/sec Loss 3.5308 LearningRate 0.0021 Epoch: 17 Global Step: 284910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:23,410-Speed 8935.89 samples/sec Loss 3.4939 LearningRate 0.0021 Epoch: 17 Global Step: 284920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:24,497-Speed 9429.21 samples/sec Loss 3.5082 LearningRate 0.0021 Epoch: 17 Global Step: 284930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:25,599-Speed 9298.22 samples/sec Loss 3.5097 LearningRate 0.0021 Epoch: 17 Global Step: 284940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:26,658-Speed 9674.68 samples/sec Loss 3.4602 LearningRate 0.0021 Epoch: 17 Global Step: 284950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:27,801-Speed 8962.13 samples/sec Loss 3.5424 LearningRate 0.0021 Epoch: 17 Global Step: 284960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:28,925-Speed 9109.34 samples/sec Loss 3.4807 LearningRate 0.0021 Epoch: 17 Global Step: 284970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:30,057-Speed 9055.37 samples/sec Loss 3.4581 LearningRate 0.0021 Epoch: 17 Global Step: 284980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:31,184-Speed 9091.41 samples/sec Loss 
3.5742 LearningRate 0.0021 Epoch: 17 Global Step: 284990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:32,322-Speed 9007.78 samples/sec Loss 3.5798 LearningRate 0.0021 Epoch: 17 Global Step: 285000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:33,451-Speed 9076.96 samples/sec Loss 3.5174 LearningRate 0.0021 Epoch: 17 Global Step: 285010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:34,584-Speed 9037.40 samples/sec Loss 3.4932 LearningRate 0.0021 Epoch: 17 Global Step: 285020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:35,764-Speed 8686.38 samples/sec Loss 3.4991 LearningRate 0.0021 Epoch: 17 Global Step: 285030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:36,901-Speed 9010.20 samples/sec Loss 3.4518 LearningRate 0.0021 Epoch: 17 Global Step: 285040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:38,074-Speed 8730.59 samples/sec Loss 3.4995 LearningRate 0.0021 Epoch: 17 Global Step: 285050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:39,200-Speed 9102.71 samples/sec Loss 3.4351 LearningRate 0.0021 Epoch: 17 Global Step: 285060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:40,313-Speed 9206.45 samples/sec Loss 3.4797 LearningRate 0.0021 Epoch: 17 Global Step: 285070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:04:41,418-Speed 9271.67 samples/sec Loss 3.5204 LearningRate 0.0021 Epoch: 17 Global Step: 285080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:42,527-Speed 9242.03 samples/sec Loss 3.5051 LearningRate 0.0021 Epoch: 17 Global Step: 285090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:43,633-Speed 9269.06 samples/sec Loss 3.4556 LearningRate 0.0021 Epoch: 17 Global Step: 285100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:44,723-Speed 9397.53 samples/sec Loss 3.5163 LearningRate 0.0021 Epoch: 17 
Global Step: 285110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:45,849-Speed 9096.94 samples/sec Loss 3.5602 LearningRate 0.0021 Epoch: 17 Global Step: 285120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:46,987-Speed 9003.76 samples/sec Loss 3.5111 LearningRate 0.0021 Epoch: 17 Global Step: 285130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:48,123-Speed 9023.27 samples/sec Loss 3.4852 LearningRate 0.0021 Epoch: 17 Global Step: 285140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:49,274-Speed 8899.72 samples/sec Loss 3.5200 LearningRate 0.0021 Epoch: 17 Global Step: 285150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:50,415-Speed 8981.30 samples/sec Loss 3.5869 LearningRate 0.0021 Epoch: 17 Global Step: 285160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:51,523-Speed 9251.85 samples/sec Loss 3.5357 LearningRate 0.0021 Epoch: 17 Global Step: 285170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:52,583-Speed 9665.99 samples/sec Loss 3.4821 LearningRate 0.0021 Epoch: 17 Global Step: 285180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:04:53,675-Speed 9381.12 samples/sec Loss 3.4638 LearningRate 0.0021 Epoch: 17 Global Step: 285190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:54,774-Speed 9319.44 samples/sec Loss 3.5515 LearningRate 0.0021 Epoch: 17 Global Step: 285200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:55,891-Speed 9175.59 samples/sec Loss 3.5377 LearningRate 0.0021 Epoch: 17 Global Step: 285210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:56,991-Speed 9306.97 samples/sec Loss 3.5012 LearningRate 0.0021 Epoch: 17 Global Step: 285220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:04:58,084-Speed 9380.52 samples/sec Loss 3.4844 LearningRate 0.0021 Epoch: 17 Global Step: 285230 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:04:59,173-Speed 9404.53 samples/sec Loss 3.4651 LearningRate 0.0021 Epoch: 17 Global Step: 285240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:00,284-Speed 9225.72 samples/sec Loss 3.4881 LearningRate 0.0021 Epoch: 17 Global Step: 285250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:01,409-Speed 9110.21 samples/sec Loss 3.5035 LearningRate 0.0021 Epoch: 17 Global Step: 285260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:02,527-Speed 9163.34 samples/sec Loss 3.5347 LearningRate 0.0021 Epoch: 17 Global Step: 285270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:03,608-Speed 9478.11 samples/sec Loss 3.4607 LearningRate 0.0021 Epoch: 17 Global Step: 285280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:04,736-Speed 9087.35 samples/sec Loss 3.5779 LearningRate 0.0021 Epoch: 17 Global Step: 285290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:05:05,848-Speed 9210.79 samples/sec Loss 3.5300 LearningRate 0.0021 Epoch: 17 Global Step: 285300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:05:06,944-Speed 9349.63 samples/sec Loss 3.5591 LearningRate 0.0021 Epoch: 17 Global Step: 285310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:08,075-Speed 9057.24 samples/sec Loss 3.5114 LearningRate 0.0021 Epoch: 17 Global Step: 285320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:09,177-Speed 9294.33 samples/sec Loss 3.5692 LearningRate 0.0021 Epoch: 17 Global Step: 285330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:10,308-Speed 9058.96 samples/sec Loss 3.5795 LearningRate 0.0021 Epoch: 17 Global Step: 285340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:11,456-Speed 8923.64 samples/sec Loss 3.5412 LearningRate 0.0021 Epoch: 17 Global Step: 285350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:05:12,606-Speed 8918.36 samples/sec Loss 3.4608 LearningRate 0.0021 Epoch: 17 Global Step: 285360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:13,751-Speed 8945.07 samples/sec Loss 3.4483 LearningRate 0.0021 Epoch: 17 Global Step: 285370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:14,891-Speed 8993.67 samples/sec Loss 3.4330 LearningRate 0.0021 Epoch: 17 Global Step: 285380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:16,022-Speed 9059.91 samples/sec Loss 3.4842 LearningRate 0.0021 Epoch: 17 Global Step: 285390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:17,236-Speed 8440.07 samples/sec Loss 3.4333 LearningRate 0.0021 Epoch: 17 Global Step: 285400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:18,342-Speed 9265.00 samples/sec Loss 3.4456 LearningRate 0.0021 Epoch: 17 Global Step: 285410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:05:19,446-Speed 9283.24 samples/sec Loss 3.4947 LearningRate 0.0021 Epoch: 17 Global Step: 285420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:20,591-Speed 8946.64 samples/sec Loss 3.5611 LearningRate 0.0021 Epoch: 17 Global Step: 285430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:21,710-Speed 9158.53 samples/sec Loss 3.5965 LearningRate 0.0021 Epoch: 17 Global Step: 285440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:22,851-Speed 8976.17 samples/sec Loss 3.4539 LearningRate 0.0021 Epoch: 17 Global Step: 285450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:23,946-Speed 9362.78 samples/sec Loss 3.4783 LearningRate 0.0021 Epoch: 17 Global Step: 285460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:25,038-Speed 9381.57 samples/sec Loss 3.5133 LearningRate 0.0021 Epoch: 17 Global Step: 285470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:26,113-Speed 9527.31 samples/sec 
Loss 3.4333 LearningRate 0.0021 Epoch: 17 Global Step: 285480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:27,249-Speed 9024.47 samples/sec Loss 3.5640 LearningRate 0.0021 Epoch: 17 Global Step: 285490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:28,339-Speed 9397.50 samples/sec Loss 3.5160 LearningRate 0.0021 Epoch: 17 Global Step: 285500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:29,515-Speed 8708.70 samples/sec Loss 3.5183 LearningRate 0.0021 Epoch: 17 Global Step: 285510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:30,635-Speed 9148.63 samples/sec Loss 3.4925 LearningRate 0.0021 Epoch: 17 Global Step: 285520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:31,735-Speed 9325.97 samples/sec Loss 3.4752 LearningRate 0.0021 Epoch: 17 Global Step: 285530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:32,829-Speed 9361.92 samples/sec Loss 3.5341 LearningRate 0.0021 Epoch: 17 Global Step: 285540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:33,987-Speed 8845.94 samples/sec Loss 3.5258 LearningRate 0.0021 Epoch: 17 Global Step: 285550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:35,104-Speed 9180.23 samples/sec Loss 3.5077 LearningRate 0.0021 Epoch: 17 Global Step: 285560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:36,183-Speed 9493.73 samples/sec Loss 3.4784 LearningRate 0.0021 Epoch: 17 Global Step: 285570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:37,256-Speed 9545.98 samples/sec Loss 3.5262 LearningRate 0.0021 Epoch: 17 Global Step: 285580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:38,373-Speed 9176.28 samples/sec Loss 3.5060 LearningRate 0.0021 Epoch: 17 Global Step: 285590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:39,447-Speed 9538.73 samples/sec Loss 3.5016 LearningRate 0.0021 Epoch: 17 
Global Step: 285600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:40,561-Speed 9193.07 samples/sec Loss 3.4253 LearningRate 0.0021 Epoch: 17 Global Step: 285610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:41,664-Speed 9289.59 samples/sec Loss 3.5325 LearningRate 0.0021 Epoch: 17 Global Step: 285620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:05:42,761-Speed 9349.91 samples/sec Loss 3.5152 LearningRate 0.0021 Epoch: 17 Global Step: 285630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:05:43,887-Speed 9096.26 samples/sec Loss 3.5171 LearningRate 0.0021 Epoch: 17 Global Step: 285640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:45,014-Speed 9090.09 samples/sec Loss 3.4797 LearningRate 0.0021 Epoch: 17 Global Step: 285650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:46,117-Speed 9291.46 samples/sec Loss 3.5612 LearningRate 0.0021 Epoch: 17 Global Step: 285660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:47,295-Speed 8696.84 samples/sec Loss 3.4965 LearningRate 0.0021 Epoch: 17 Global Step: 285670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:48,447-Speed 8889.03 samples/sec Loss 3.5372 LearningRate 0.0021 Epoch: 17 Global Step: 285680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:49,604-Speed 8860.88 samples/sec Loss 3.5149 LearningRate 0.0021 Epoch: 17 Global Step: 285690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:50,688-Speed 9453.29 samples/sec Loss 3.4274 LearningRate 0.0021 Epoch: 17 Global Step: 285700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:51,753-Speed 9619.76 samples/sec Loss 3.5806 LearningRate 0.0021 Epoch: 17 Global Step: 285710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:52,866-Speed 9207.92 samples/sec Loss 3.4816 LearningRate 0.0021 Epoch: 17 Global Step: 285720 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 23:05:53,982-Speed 9183.52 samples/sec Loss 3.6355 LearningRate 0.0021 Epoch: 17 Global Step: 285730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:55,128-Speed 8937.32 samples/sec Loss 3.5413 LearningRate 0.0021 Epoch: 17 Global Step: 285740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:05:56,247-Speed 9157.72 samples/sec Loss 3.4397 LearningRate 0.0021 Epoch: 17 Global Step: 285750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:05:57,342-Speed 9358.49 samples/sec Loss 3.5289 LearningRate 0.0021 Epoch: 17 Global Step: 285760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:58,499-Speed 8854.46 samples/sec Loss 3.6070 LearningRate 0.0021 Epoch: 17 Global Step: 285770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:05:59,622-Speed 9121.16 samples/sec Loss 3.5880 LearningRate 0.0021 Epoch: 17 Global Step: 285780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:00,765-Speed 8959.09 samples/sec Loss 3.4926 LearningRate 0.0021 Epoch: 17 Global Step: 285790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:01,851-Speed 9438.71 samples/sec Loss 3.5134 LearningRate 0.0021 Epoch: 17 Global Step: 285800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:02,972-Speed 9138.95 samples/sec Loss 3.4403 LearningRate 0.0021 Epoch: 17 Global Step: 285810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:04,063-Speed 9395.09 samples/sec Loss 3.4524 LearningRate 0.0021 Epoch: 17 Global Step: 285820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:05,155-Speed 9382.77 samples/sec Loss 3.5095 LearningRate 0.0021 Epoch: 17 Global Step: 285830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:06,359-Speed 8505.89 samples/sec Loss 3.4677 LearningRate 0.0021 Epoch: 17 Global Step: 285840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 23:06:07,496-Speed 9008.57 samples/sec Loss 3.5389 LearningRate 0.0021 Epoch: 17 Global Step: 285850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:08,599-Speed 9292.20 samples/sec Loss 3.6047 LearningRate 0.0021 Epoch: 17 Global Step: 285860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:06:09,742-Speed 8964.09 samples/sec Loss 3.5882 LearningRate 0.0021 Epoch: 17 Global Step: 285870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:06:10,829-Speed 9428.30 samples/sec Loss 3.5101 LearningRate 0.0021 Epoch: 17 Global Step: 285880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:06:11,919-Speed 9400.04 samples/sec Loss 3.5385 LearningRate 0.0021 Epoch: 17 Global Step: 285890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:13,051-Speed 9055.86 samples/sec Loss 3.5075 LearningRate 0.0021 Epoch: 17 Global Step: 285900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:14,175-Speed 9110.57 samples/sec Loss 3.5414 LearningRate 0.0021 Epoch: 17 Global Step: 285910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:15,301-Speed 9103.32 samples/sec Loss 3.4932 LearningRate 0.0021 Epoch: 17 Global Step: 285920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:16,428-Speed 9086.39 samples/sec Loss 3.6678 LearningRate 0.0021 Epoch: 17 Global Step: 285930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:17,569-Speed 8981.90 samples/sec Loss 3.5995 LearningRate 0.0021 Epoch: 17 Global Step: 285940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:18,730-Speed 8820.06 samples/sec Loss 3.4815 LearningRate 0.0021 Epoch: 17 Global Step: 285950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:19,812-Speed 9473.68 samples/sec Loss 3.5031 LearningRate 0.0021 Epoch: 17 Global Step: 285960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:20,952-Speed 8990.33 
samples/sec Loss 3.5497 LearningRate 0.0021 Epoch: 17 Global Step: 285970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:22,102-Speed 8907.20 samples/sec Loss 3.5035 LearningRate 0.0021 Epoch: 17 Global Step: 285980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:23,209-Speed 9256.26 samples/sec Loss 3.4890 LearningRate 0.0021 Epoch: 17 Global Step: 285990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:06:24,318-Speed 9245.94 samples/sec Loss 3.4916 LearningRate 0.0021 Epoch: 17 Global Step: 286000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:06:46,419-[lfw][286000]XNorm: 6.755071 Training: 2022-04-11 23:06:46,420-[lfw][286000]Accuracy-Flip: 0.99600+-0.00281 Training: 2022-04-11 23:06:46,420-[lfw][286000]Accuracy-Highest: 0.99733 Training: 2022-04-11 23:07:11,982-[cfp_fp][286000]XNorm: 5.887999 Training: 2022-04-11 23:07:11,983-[cfp_fp][286000]Accuracy-Flip: 0.97329+-0.00740 Training: 2022-04-11 23:07:11,983-[cfp_fp][286000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:07:33,992-[agedb_30][286000]XNorm: 6.568239 Training: 2022-04-11 23:07:33,993-[agedb_30][286000]Accuracy-Flip: 0.97317+-0.00855 Training: 2022-04-11 23:07:33,993-[agedb_30][286000]Accuracy-Highest: 0.97350 Training: 2022-04-11 23:07:35,070-Speed 144.73 samples/sec Loss 3.5722 LearningRate 0.0021 Epoch: 17 Global Step: 286010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:36,137-Speed 9601.57 samples/sec Loss 3.5254 LearningRate 0.0021 Epoch: 17 Global Step: 286020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:37,269-Speed 9050.56 samples/sec Loss 3.5247 LearningRate 0.0020 Epoch: 17 Global Step: 286030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:38,329-Speed 9658.01 samples/sec Loss 3.5424 LearningRate 0.0020 Epoch: 17 Global Step: 286040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:39,450-Speed 9142.93 samples/sec Loss 3.4834 
LearningRate 0.0020 Epoch: 17 Global Step: 286050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:40,528-Speed 9507.43 samples/sec Loss 3.4885 LearningRate 0.0020 Epoch: 17 Global Step: 286060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:41,644-Speed 9178.40 samples/sec Loss 3.5547 LearningRate 0.0020 Epoch: 17 Global Step: 286070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:42,748-Speed 9285.65 samples/sec Loss 3.5373 LearningRate 0.0020 Epoch: 17 Global Step: 286080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:43,856-Speed 9243.04 samples/sec Loss 3.5114 LearningRate 0.0020 Epoch: 17 Global Step: 286090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:44,920-Speed 9632.74 samples/sec Loss 3.4847 LearningRate 0.0020 Epoch: 17 Global Step: 286100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:46,042-Speed 9131.46 samples/sec Loss 3.6205 LearningRate 0.0020 Epoch: 17 Global Step: 286110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:47,155-Speed 9201.56 samples/sec Loss 3.4545 LearningRate 0.0020 Epoch: 17 Global Step: 286120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:48,287-Speed 9051.81 samples/sec Loss 3.5276 LearningRate 0.0020 Epoch: 17 Global Step: 286130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:49,355-Speed 9591.50 samples/sec Loss 3.4182 LearningRate 0.0020 Epoch: 17 Global Step: 286140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:50,445-Speed 9402.86 samples/sec Loss 3.4654 LearningRate 0.0020 Epoch: 17 Global Step: 286150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:51,598-Speed 8888.85 samples/sec Loss 3.5565 LearningRate 0.0020 Epoch: 17 Global Step: 286160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:52,725-Speed 9085.53 samples/sec Loss 3.5709 LearningRate 0.0020 Epoch: 17 Global Step: 
286170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:53,833-Speed 9252.17 samples/sec Loss 3.4885 LearningRate 0.0020 Epoch: 17 Global Step: 286180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:54,922-Speed 9408.16 samples/sec Loss 3.5372 LearningRate 0.0020 Epoch: 17 Global Step: 286190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:56,125-Speed 8514.03 samples/sec Loss 3.4050 LearningRate 0.0020 Epoch: 17 Global Step: 286200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:07:57,211-Speed 9429.05 samples/sec Loss 3.5630 LearningRate 0.0020 Epoch: 17 Global Step: 286210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:07:58,356-Speed 8956.91 samples/sec Loss 3.5809 LearningRate 0.0020 Epoch: 17 Global Step: 286220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:07:59,465-Speed 9245.24 samples/sec Loss 3.5298 LearningRate 0.0020 Epoch: 17 Global Step: 286230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:00,590-Speed 9108.24 samples/sec Loss 3.5576 LearningRate 0.0020 Epoch: 17 Global Step: 286240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:01,740-Speed 8904.32 samples/sec Loss 3.4594 LearningRate 0.0020 Epoch: 17 Global Step: 286250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:02,853-Speed 9205.44 samples/sec Loss 3.5679 LearningRate 0.0020 Epoch: 17 Global Step: 286260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:03,959-Speed 9270.05 samples/sec Loss 3.5218 LearningRate 0.0020 Epoch: 17 Global Step: 286270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:05,056-Speed 9340.03 samples/sec Loss 3.5647 LearningRate 0.0020 Epoch: 17 Global Step: 286280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:06,193-Speed 9007.92 samples/sec Loss 3.5184 LearningRate 0.0020 Epoch: 17 Global Step: 286290 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 23:08:07,315-Speed 9134.53 samples/sec Loss 3.4520 LearningRate 0.0020 Epoch: 17 Global Step: 286300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:08,460-Speed 8945.85 samples/sec Loss 3.4603 LearningRate 0.0020 Epoch: 17 Global Step: 286310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:09,577-Speed 9170.12 samples/sec Loss 3.4997 LearningRate 0.0020 Epoch: 17 Global Step: 286320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:10,694-Speed 9178.82 samples/sec Loss 3.6195 LearningRate 0.0020 Epoch: 17 Global Step: 286330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:11,862-Speed 8769.15 samples/sec Loss 3.5619 LearningRate 0.0020 Epoch: 17 Global Step: 286340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:12,917-Speed 9712.19 samples/sec Loss 3.5391 LearningRate 0.0020 Epoch: 17 Global Step: 286350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:14,040-Speed 9128.29 samples/sec Loss 3.5415 LearningRate 0.0020 Epoch: 17 Global Step: 286360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:15,175-Speed 9024.13 samples/sec Loss 3.5551 LearningRate 0.0020 Epoch: 17 Global Step: 286370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:16,319-Speed 8954.34 samples/sec Loss 3.4620 LearningRate 0.0020 Epoch: 17 Global Step: 286380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:17,446-Speed 9095.66 samples/sec Loss 3.5027 LearningRate 0.0020 Epoch: 17 Global Step: 286390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:18,600-Speed 8878.94 samples/sec Loss 3.4729 LearningRate 0.0020 Epoch: 17 Global Step: 286400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:19,720-Speed 9147.36 samples/sec Loss 3.5522 LearningRate 0.0020 Epoch: 17 Global Step: 286410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:08:20,799-Speed 9504.14 samples/sec Loss 3.5057 LearningRate 0.0020 Epoch: 17 Global Step: 286420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:21,936-Speed 9009.91 samples/sec Loss 3.5717 LearningRate 0.0020 Epoch: 17 Global Step: 286430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:23,093-Speed 8849.83 samples/sec Loss 3.5574 LearningRate 0.0020 Epoch: 17 Global Step: 286440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:24,218-Speed 9108.84 samples/sec Loss 3.5815 LearningRate 0.0020 Epoch: 17 Global Step: 286450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:25,299-Speed 9475.47 samples/sec Loss 3.5593 LearningRate 0.0020 Epoch: 17 Global Step: 286460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:26,393-Speed 9370.28 samples/sec Loss 3.5881 LearningRate 0.0020 Epoch: 17 Global Step: 286470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:27,472-Speed 9492.13 samples/sec Loss 3.5187 LearningRate 0.0020 Epoch: 17 Global Step: 286480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:28,624-Speed 8899.10 samples/sec Loss 3.5082 LearningRate 0.0020 Epoch: 17 Global Step: 286490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:29,748-Speed 9109.00 samples/sec Loss 3.5061 LearningRate 0.0020 Epoch: 17 Global Step: 286500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:30,855-Speed 9261.70 samples/sec Loss 3.4816 LearningRate 0.0020 Epoch: 17 Global Step: 286510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:31,958-Speed 9285.95 samples/sec Loss 3.4800 LearningRate 0.0020 Epoch: 17 Global Step: 286520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:08:33,061-Speed 9289.89 samples/sec Loss 3.5200 LearningRate 0.0020 Epoch: 17 Global Step: 286530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:08:34,157-Speed 9349.03 samples/sec 
Loss 3.5806 LearningRate 0.0020 Epoch: 17 Global Step: 286540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:35,270-Speed 9205.16 samples/sec Loss 3.4952 LearningRate 0.0020 Epoch: 17 Global Step: 286550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:36,423-Speed 8885.72 samples/sec Loss 3.5785 LearningRate 0.0020 Epoch: 17 Global Step: 286560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:37,516-Speed 9371.69 samples/sec Loss 3.5098 LearningRate 0.0020 Epoch: 17 Global Step: 286570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:38,641-Speed 9108.83 samples/sec Loss 3.5080 LearningRate 0.0020 Epoch: 17 Global Step: 286580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:39,753-Speed 9217.29 samples/sec Loss 3.6064 LearningRate 0.0020 Epoch: 17 Global Step: 286590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:40,847-Speed 9378.26 samples/sec Loss 3.4990 LearningRate 0.0020 Epoch: 17 Global Step: 286600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:41,992-Speed 8953.34 samples/sec Loss 3.5272 LearningRate 0.0020 Epoch: 17 Global Step: 286610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:43,082-Speed 9391.96 samples/sec Loss 3.4231 LearningRate 0.0020 Epoch: 17 Global Step: 286620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:44,241-Speed 8841.75 samples/sec Loss 3.5732 LearningRate 0.0020 Epoch: 17 Global Step: 286630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:45,351-Speed 9230.85 samples/sec Loss 3.5064 LearningRate 0.0020 Epoch: 17 Global Step: 286640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:08:46,449-Speed 9329.80 samples/sec Loss 3.5461 LearningRate 0.0020 Epoch: 17 Global Step: 286650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:47,586-Speed 9010.78 samples/sec Loss 3.5648 LearningRate 0.0020 Epoch: 17 
Global Step: 286660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:48,733-Speed 8938.39 samples/sec Loss 3.4407 LearningRate 0.0020 Epoch: 17 Global Step: 286670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:49,914-Speed 8672.75 samples/sec Loss 3.4862 LearningRate 0.0020 Epoch: 17 Global Step: 286680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:51,049-Speed 9028.07 samples/sec Loss 3.4433 LearningRate 0.0020 Epoch: 17 Global Step: 286690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:52,164-Speed 9193.84 samples/sec Loss 3.5657 LearningRate 0.0020 Epoch: 17 Global Step: 286700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:53,338-Speed 8726.89 samples/sec Loss 3.5959 LearningRate 0.0020 Epoch: 17 Global Step: 286710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:54,482-Speed 8957.26 samples/sec Loss 3.5139 LearningRate 0.0020 Epoch: 17 Global Step: 286720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:55,593-Speed 9220.36 samples/sec Loss 3.4875 LearningRate 0.0020 Epoch: 17 Global Step: 286730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:56,749-Speed 8863.59 samples/sec Loss 3.4533 LearningRate 0.0020 Epoch: 17 Global Step: 286740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:57,903-Speed 8882.91 samples/sec Loss 3.5369 LearningRate 0.0020 Epoch: 17 Global Step: 286750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:08:59,038-Speed 9021.91 samples/sec Loss 3.5538 LearningRate 0.0020 Epoch: 17 Global Step: 286760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:00,167-Speed 9080.33 samples/sec Loss 3.5359 LearningRate 0.0020 Epoch: 17 Global Step: 286770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:01,324-Speed 8849.94 samples/sec Loss 3.5507 LearningRate 0.0020 Epoch: 17 Global Step: 286780 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:09:02,451-Speed 9094.80 samples/sec Loss 3.5725 LearningRate 0.0020 Epoch: 17 Global Step: 286790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:03,617-Speed 8783.58 samples/sec Loss 3.6002 LearningRate 0.0020 Epoch: 17 Global Step: 286800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:04,681-Speed 9628.82 samples/sec Loss 3.4754 LearningRate 0.0020 Epoch: 17 Global Step: 286810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:05,800-Speed 9155.22 samples/sec Loss 3.5303 LearningRate 0.0020 Epoch: 17 Global Step: 286820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:06,961-Speed 8828.99 samples/sec Loss 3.5063 LearningRate 0.0020 Epoch: 17 Global Step: 286830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:08,100-Speed 8993.34 samples/sec Loss 3.5792 LearningRate 0.0020 Epoch: 17 Global Step: 286840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:09,261-Speed 8823.57 samples/sec Loss 3.5952 LearningRate 0.0020 Epoch: 17 Global Step: 286850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:09:10,362-Speed 9306.66 samples/sec Loss 3.4918 LearningRate 0.0020 Epoch: 17 Global Step: 286860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:09:11,461-Speed 9323.74 samples/sec Loss 3.5824 LearningRate 0.0020 Epoch: 17 Global Step: 286870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:12,627-Speed 8788.66 samples/sec Loss 3.5155 LearningRate 0.0020 Epoch: 17 Global Step: 286880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:13,772-Speed 8952.55 samples/sec Loss 3.4996 LearningRate 0.0020 Epoch: 17 Global Step: 286890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:14,852-Speed 9489.20 samples/sec Loss 3.4209 LearningRate 0.0020 Epoch: 17 Global Step: 286900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:09:15,978-Speed 9093.90 samples/sec Loss 3.5120 LearningRate 0.0020 Epoch: 17 Global Step: 286910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:17,075-Speed 9344.49 samples/sec Loss 3.5454 LearningRate 0.0020 Epoch: 17 Global Step: 286920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:18,201-Speed 9098.57 samples/sec Loss 3.4595 LearningRate 0.0020 Epoch: 17 Global Step: 286930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:19,318-Speed 9175.04 samples/sec Loss 3.5366 LearningRate 0.0020 Epoch: 17 Global Step: 286940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:20,458-Speed 8986.47 samples/sec Loss 3.4223 LearningRate 0.0020 Epoch: 17 Global Step: 286950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:21,589-Speed 9060.90 samples/sec Loss 3.5173 LearningRate 0.0020 Epoch: 17 Global Step: 286960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:22,740-Speed 8902.56 samples/sec Loss 3.5625 LearningRate 0.0020 Epoch: 17 Global Step: 286970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:09:23,833-Speed 9371.84 samples/sec Loss 3.5504 LearningRate 0.0020 Epoch: 17 Global Step: 286980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:24,952-Speed 9161.16 samples/sec Loss 3.5438 LearningRate 0.0020 Epoch: 17 Global Step: 286990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:26,075-Speed 9116.94 samples/sec Loss 3.6042 LearningRate 0.0020 Epoch: 17 Global Step: 287000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:27,208-Speed 9048.23 samples/sec Loss 3.5473 LearningRate 0.0020 Epoch: 17 Global Step: 287010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:28,350-Speed 8978.58 samples/sec Loss 3.5090 LearningRate 0.0020 Epoch: 17 Global Step: 287020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:29,468-Speed 9164.85 samples/sec 
Loss 3.5018 LearningRate 0.0020 Epoch: 17 Global Step: 287030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:30,602-Speed 9034.82 samples/sec Loss 3.5538 LearningRate 0.0020 Epoch: 17 Global Step: 287040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:31,783-Speed 8675.18 samples/sec Loss 3.5516 LearningRate 0.0020 Epoch: 17 Global Step: 287050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:32,931-Speed 8920.24 samples/sec Loss 3.6219 LearningRate 0.0020 Epoch: 17 Global Step: 287060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:34,023-Speed 9385.30 samples/sec Loss 3.5398 LearningRate 0.0020 Epoch: 17 Global Step: 287070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:35,090-Speed 9608.68 samples/sec Loss 3.5315 LearningRate 0.0020 Epoch: 17 Global Step: 287080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:36,211-Speed 9133.62 samples/sec Loss 3.5085 LearningRate 0.0020 Epoch: 17 Global Step: 287090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:37,308-Speed 9343.80 samples/sec Loss 3.6252 LearningRate 0.0020 Epoch: 17 Global Step: 287100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:38,442-Speed 9031.81 samples/sec Loss 3.4333 LearningRate 0.0020 Epoch: 17 Global Step: 287110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:39,554-Speed 9213.97 samples/sec Loss 3.5998 LearningRate 0.0020 Epoch: 17 Global Step: 287120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:40,645-Speed 9396.32 samples/sec Loss 3.5595 LearningRate 0.0020 Epoch: 17 Global Step: 287130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:41,716-Speed 9562.90 samples/sec Loss 3.5832 LearningRate 0.0020 Epoch: 17 Global Step: 287140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:42,831-Speed 9188.49 samples/sec Loss 3.4951 LearningRate 0.0020 Epoch: 17 
Global Step: 287150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:44,018-Speed 8637.43 samples/sec Loss 3.5449 LearningRate 0.0020 Epoch: 17 Global Step: 287160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:45,142-Speed 9108.81 samples/sec Loss 3.5313 LearningRate 0.0020 Epoch: 17 Global Step: 287170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:46,232-Speed 9400.97 samples/sec Loss 3.5235 LearningRate 0.0020 Epoch: 17 Global Step: 287180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:47,358-Speed 9103.58 samples/sec Loss 3.5858 LearningRate 0.0020 Epoch: 17 Global Step: 287190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:48,426-Speed 9587.87 samples/sec Loss 3.5514 LearningRate 0.0020 Epoch: 17 Global Step: 287200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:49,531-Speed 9275.67 samples/sec Loss 3.6494 LearningRate 0.0019 Epoch: 17 Global Step: 287210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:50,631-Speed 9314.83 samples/sec Loss 3.5385 LearningRate 0.0019 Epoch: 17 Global Step: 287220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:51,762-Speed 9061.99 samples/sec Loss 3.5323 LearningRate 0.0019 Epoch: 17 Global Step: 287230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:52,902-Speed 8991.11 samples/sec Loss 3.5470 LearningRate 0.0019 Epoch: 17 Global Step: 287240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:54,020-Speed 9165.66 samples/sec Loss 3.6139 LearningRate 0.0019 Epoch: 17 Global Step: 287250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:55,149-Speed 9076.63 samples/sec Loss 3.6014 LearningRate 0.0019 Epoch: 17 Global Step: 287260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:56,287-Speed 9004.59 samples/sec Loss 3.5381 LearningRate 0.0019 Epoch: 17 Global Step: 287270 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:09:57,375-Speed 9415.22 samples/sec Loss 3.5867 LearningRate 0.0019 Epoch: 17 Global Step: 287280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:09:58,479-Speed 9284.28 samples/sec Loss 3.5112 LearningRate 0.0019 Epoch: 17 Global Step: 287290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:09:59,582-Speed 9286.92 samples/sec Loss 3.4966 LearningRate 0.0019 Epoch: 17 Global Step: 287300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:00,731-Speed 8921.65 samples/sec Loss 3.4608 LearningRate 0.0019 Epoch: 17 Global Step: 287310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:01,857-Speed 9098.40 samples/sec Loss 3.5100 LearningRate 0.0019 Epoch: 17 Global Step: 287320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:03,008-Speed 8898.79 samples/sec Loss 3.5271 LearningRate 0.0019 Epoch: 17 Global Step: 287330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:04,099-Speed 9388.34 samples/sec Loss 3.5556 LearningRate 0.0019 Epoch: 17 Global Step: 287340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:05,230-Speed 9057.68 samples/sec Loss 3.5224 LearningRate 0.0019 Epoch: 17 Global Step: 287350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:06,370-Speed 8992.91 samples/sec Loss 3.5130 LearningRate 0.0019 Epoch: 17 Global Step: 287360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:07,476-Speed 9259.13 samples/sec Loss 3.5675 LearningRate 0.0019 Epoch: 17 Global Step: 287370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:08,670-Speed 8584.20 samples/sec Loss 3.5671 LearningRate 0.0019 Epoch: 17 Global Step: 287380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:09,830-Speed 8834.27 samples/sec Loss 3.5387 LearningRate 0.0019 Epoch: 17 Global Step: 287390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
23:10:10,971-Speed 8984.31 samples/sec Loss 3.5222 LearningRate 0.0019 Epoch: 17 Global Step: 287400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:12,089-Speed 9167.53 samples/sec Loss 3.5033 LearningRate 0.0019 Epoch: 17 Global Step: 287410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:13,194-Speed 9270.52 samples/sec Loss 3.5355 LearningRate 0.0019 Epoch: 17 Global Step: 287420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:14,326-Speed 9053.37 samples/sec Loss 3.5349 LearningRate 0.0019 Epoch: 17 Global Step: 287430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:16,305-Speed 5174.49 samples/sec Loss 3.5886 LearningRate 0.0019 Epoch: 17 Global Step: 287440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:17,427-Speed 9132.18 samples/sec Loss 3.4906 LearningRate 0.0019 Epoch: 17 Global Step: 287450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:18,531-Speed 9283.13 samples/sec Loss 3.5395 LearningRate 0.0019 Epoch: 17 Global Step: 287460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:19,608-Speed 9509.67 samples/sec Loss 3.4607 LearningRate 0.0019 Epoch: 17 Global Step: 287470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:20,721-Speed 9201.55 samples/sec Loss 3.5886 LearningRate 0.0019 Epoch: 17 Global Step: 287480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:21,796-Speed 9538.22 samples/sec Loss 3.5298 LearningRate 0.0019 Epoch: 17 Global Step: 287490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:22,913-Speed 9174.77 samples/sec Loss 3.5922 LearningRate 0.0019 Epoch: 17 Global Step: 287500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:24,001-Speed 9415.14 samples/sec Loss 3.4990 LearningRate 0.0019 Epoch: 17 Global Step: 287510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:25,164-Speed 8805.73 samples/sec 
Loss 3.5562 LearningRate 0.0019 Epoch: 17 Global Step: 287520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:26,275-Speed 9226.31 samples/sec Loss 3.5119 LearningRate 0.0019 Epoch: 17 Global Step: 287530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:27,357-Speed 9474.02 samples/sec Loss 3.5672 LearningRate 0.0019 Epoch: 17 Global Step: 287540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:28,460-Speed 9289.79 samples/sec Loss 3.5028 LearningRate 0.0019 Epoch: 17 Global Step: 287550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:29,528-Speed 9595.54 samples/sec Loss 3.4663 LearningRate 0.0019 Epoch: 17 Global Step: 287560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:30,645-Speed 9168.22 samples/sec Loss 3.5773 LearningRate 0.0019 Epoch: 17 Global Step: 287570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:31,739-Speed 9372.16 samples/sec Loss 3.4658 LearningRate 0.0019 Epoch: 17 Global Step: 287580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:32,809-Speed 9573.03 samples/sec Loss 3.5415 LearningRate 0.0019 Epoch: 17 Global Step: 287590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:33,903-Speed 9363.98 samples/sec Loss 3.5218 LearningRate 0.0019 Epoch: 17 Global Step: 287600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:35,035-Speed 9052.81 samples/sec Loss 3.5477 LearningRate 0.0019 Epoch: 17 Global Step: 287610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:36,139-Speed 9278.22 samples/sec Loss 3.4977 LearningRate 0.0019 Epoch: 17 Global Step: 287620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:37,297-Speed 8851.10 samples/sec Loss 3.5608 LearningRate 0.0019 Epoch: 17 Global Step: 287630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:38,396-Speed 9320.24 samples/sec Loss 3.5013 LearningRate 0.0019 Epoch: 17 
Global Step: 287640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:39,517-Speed 9140.91 samples/sec Loss 3.5600 LearningRate 0.0019 Epoch: 17 Global Step: 287650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:40,643-Speed 9103.64 samples/sec Loss 3.5958 LearningRate 0.0019 Epoch: 17 Global Step: 287660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:41,701-Speed 9678.13 samples/sec Loss 3.6269 LearningRate 0.0019 Epoch: 17 Global Step: 287670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:42,813-Speed 9218.55 samples/sec Loss 3.5164 LearningRate 0.0019 Epoch: 17 Global Step: 287680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:43,920-Speed 9251.67 samples/sec Loss 3.5347 LearningRate 0.0019 Epoch: 17 Global Step: 287690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:45,013-Speed 9376.70 samples/sec Loss 3.5657 LearningRate 0.0019 Epoch: 17 Global Step: 287700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:46,155-Speed 8971.70 samples/sec Loss 3.6128 LearningRate 0.0019 Epoch: 17 Global Step: 287710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:47,310-Speed 8869.42 samples/sec Loss 3.5953 LearningRate 0.0019 Epoch: 17 Global Step: 287720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:48,380-Speed 9581.28 samples/sec Loss 3.6039 LearningRate 0.0019 Epoch: 17 Global Step: 287730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:49,551-Speed 8744.19 samples/sec Loss 3.5251 LearningRate 0.0019 Epoch: 17 Global Step: 287740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:50,640-Speed 9405.41 samples/sec Loss 3.5105 LearningRate 0.0019 Epoch: 17 Global Step: 287750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:51,758-Speed 9174.08 samples/sec Loss 3.5490 LearningRate 0.0019 Epoch: 17 Global Step: 287760 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 23:10:52,879-Speed 9136.13 samples/sec Loss 3.6169 LearningRate 0.0019 Epoch: 17 Global Step: 287770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:53,975-Speed 9352.82 samples/sec Loss 3.5821 LearningRate 0.0019 Epoch: 17 Global Step: 287780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:55,037-Speed 9644.99 samples/sec Loss 3.5529 LearningRate 0.0019 Epoch: 17 Global Step: 287790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:56,204-Speed 8782.05 samples/sec Loss 3.5658 LearningRate 0.0019 Epoch: 17 Global Step: 287800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:57,330-Speed 9097.23 samples/sec Loss 3.5313 LearningRate 0.0019 Epoch: 17 Global Step: 287810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:10:58,411-Speed 9475.35 samples/sec Loss 3.5567 LearningRate 0.0019 Epoch: 17 Global Step: 287820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:10:59,503-Speed 9387.52 samples/sec Loss 3.5254 LearningRate 0.0019 Epoch: 17 Global Step: 287830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:00,639-Speed 9019.46 samples/sec Loss 3.5764 LearningRate 0.0019 Epoch: 17 Global Step: 287840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:01,754-Speed 9184.25 samples/sec Loss 3.5921 LearningRate 0.0019 Epoch: 17 Global Step: 287850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:02,892-Speed 9009.94 samples/sec Loss 3.5905 LearningRate 0.0019 Epoch: 17 Global Step: 287860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:04,040-Speed 8919.99 samples/sec Loss 3.5607 LearningRate 0.0019 Epoch: 17 Global Step: 287870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:05,141-Speed 9306.26 samples/sec Loss 3.5557 LearningRate 0.0019 Epoch: 17 Global Step: 287880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 23:11:06,267-Speed 9106.77 samples/sec Loss 3.4839 LearningRate 0.0019 Epoch: 17 Global Step: 287890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:07,433-Speed 8782.12 samples/sec Loss 3.5244 LearningRate 0.0019 Epoch: 17 Global Step: 287900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:08,517-Speed 9451.95 samples/sec Loss 3.5922 LearningRate 0.0019 Epoch: 17 Global Step: 287910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:09,630-Speed 9204.53 samples/sec Loss 3.5617 LearningRate 0.0019 Epoch: 17 Global Step: 287920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:11:10,778-Speed 8930.77 samples/sec Loss 3.5829 LearningRate 0.0019 Epoch: 17 Global Step: 287930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:11:11,929-Speed 8898.51 samples/sec Loss 3.4251 LearningRate 0.0019 Epoch: 17 Global Step: 287940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:11:13,031-Speed 9297.93 samples/sec Loss 3.5077 LearningRate 0.0019 Epoch: 17 Global Step: 287950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:11:14,165-Speed 9033.46 samples/sec Loss 3.6234 LearningRate 0.0019 Epoch: 17 Global Step: 287960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:11:15,274-Speed 9241.60 samples/sec Loss 3.4985 LearningRate 0.0019 Epoch: 17 Global Step: 287970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:11:16,391-Speed 9171.61 samples/sec Loss 3.5985 LearningRate 0.0019 Epoch: 17 Global Step: 287980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:17,542-Speed 8900.52 samples/sec Loss 3.4991 LearningRate 0.0019 Epoch: 17 Global Step: 287990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:11:18,676-Speed 9036.64 samples/sec Loss 3.5530 LearningRate 0.0019 Epoch: 17 Global Step: 288000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:11:40,672-[lfw][288000]XNorm: 6.748821
Training: 2022-04-11 23:11:40,673-[lfw][288000]Accuracy-Flip: 0.99683+-0.00263
Training: 2022-04-11 23:11:40,673-[lfw][288000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:12:06,130-[cfp_fp][288000]XNorm: 5.885726
Training: 2022-04-11 23:12:06,131-[cfp_fp][288000]Accuracy-Flip: 0.97243+-0.00852
Training: 2022-04-11 23:12:06,132-[cfp_fp][288000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:12:28,060-[agedb_30][288000]XNorm: 6.571554
Training: 2022-04-11 23:12:28,061-[agedb_30][288000]Accuracy-Flip: 0.97417+-0.00827
Training: 2022-04-11 23:12:28,061-[agedb_30][288000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:12:29,175-Speed 145.25 samples/sec Loss 3.5005 LearningRate 0.0019 Epoch: 17 Global Step: 288010 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 23:12:30,268-Speed 9376.69 samples/sec Loss 3.5566 LearningRate 0.0019 Epoch: 17 Global Step: 288020 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 23:12:31,387-Speed 9153.95 samples/sec Loss 3.4939 LearningRate 0.0019 Epoch: 17 Global Step: 288030 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 23:12:32,519-Speed 9053.89 samples/sec Loss 3.5262 LearningRate 0.0019 Epoch: 17 Global Step: 288040 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 23:12:33,630-Speed 9220.43 samples/sec Loss 3.5642 LearningRate 0.0019 Epoch: 17 Global Step: 288050 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 23:12:34,744-Speed 9197.90 samples/sec Loss 3.5688 LearningRate 0.0019 Epoch: 17 Global Step: 288060 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 23:12:35,894-Speed 8911.93 samples/sec Loss 3.5588 LearningRate 0.0019 Epoch: 17 Global Step: 288070 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 23:12:37,000-Speed 9259.47 samples/sec Loss 3.5739 LearningRate 0.0019 Epoch: 17 Global Step: 288080 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-11 23:12:38,113-Speed 9207.95
samples/sec Loss 3.5641 LearningRate 0.0019 Epoch: 17 Global Step: 288090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:39,225-Speed 9210.53 samples/sec Loss 3.4952 LearningRate 0.0019 Epoch: 17 Global Step: 288100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:40,330-Speed 9268.92 samples/sec Loss 3.5894 LearningRate 0.0019 Epoch: 17 Global Step: 288110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:41,507-Speed 8709.93 samples/sec Loss 3.5397 LearningRate 0.0019 Epoch: 17 Global Step: 288120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:42,626-Speed 9157.79 samples/sec Loss 3.5755 LearningRate 0.0019 Epoch: 17 Global Step: 288130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:43,772-Speed 8938.82 samples/sec Loss 3.5731 LearningRate 0.0019 Epoch: 17 Global Step: 288140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:44,892-Speed 9147.74 samples/sec Loss 3.5120 LearningRate 0.0019 Epoch: 17 Global Step: 288150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:46,013-Speed 9137.29 samples/sec Loss 3.5527 LearningRate 0.0019 Epoch: 17 Global Step: 288160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:47,106-Speed 9373.83 samples/sec Loss 3.6225 LearningRate 0.0019 Epoch: 17 Global Step: 288170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:48,246-Speed 8986.12 samples/sec Loss 3.5032 LearningRate 0.0019 Epoch: 17 Global Step: 288180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:51,156-Speed 3519.27 samples/sec Loss 3.4417 LearningRate 0.0019 Epoch: 17 Global Step: 288190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:12:52,304-Speed 8928.22 samples/sec Loss 3.5833 LearningRate 0.0019 Epoch: 17 Global Step: 288200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:53,403-Speed 9326.92 samples/sec Loss 3.5666 LearningRate 
0.0019 Epoch: 17 Global Step: 288210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:54,574-Speed 8750.93 samples/sec Loss 3.6178 LearningRate 0.0019 Epoch: 17 Global Step: 288220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:55,734-Speed 8833.13 samples/sec Loss 3.4728 LearningRate 0.0019 Epoch: 17 Global Step: 288230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:56,873-Speed 8994.75 samples/sec Loss 3.6022 LearningRate 0.0019 Epoch: 17 Global Step: 288240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:57,998-Speed 9108.69 samples/sec Loss 3.5398 LearningRate 0.0019 Epoch: 17 Global Step: 288250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:12:59,134-Speed 9019.28 samples/sec Loss 3.5364 LearningRate 0.0019 Epoch: 17 Global Step: 288260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:00,276-Speed 8973.51 samples/sec Loss 3.6283 LearningRate 0.0019 Epoch: 17 Global Step: 288270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:01,406-Speed 9070.45 samples/sec Loss 3.5852 LearningRate 0.0019 Epoch: 17 Global Step: 288280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:02,493-Speed 9422.87 samples/sec Loss 3.5793 LearningRate 0.0019 Epoch: 17 Global Step: 288290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:03,620-Speed 9091.17 samples/sec Loss 3.5290 LearningRate 0.0019 Epoch: 17 Global Step: 288300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:04,702-Speed 9463.10 samples/sec Loss 3.4333 LearningRate 0.0019 Epoch: 17 Global Step: 288310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:05,855-Speed 8889.69 samples/sec Loss 3.5263 LearningRate 0.0019 Epoch: 17 Global Step: 288320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:06,978-Speed 9120.54 samples/sec Loss 3.4556 LearningRate 0.0019 Epoch: 17 Global Step: 288330 
Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:08,036-Speed 9688.84 samples/sec Loss 3.5563 LearningRate 0.0019 Epoch: 17 Global Step: 288340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:09,096-Speed 9667.88 samples/sec Loss 3.5529 LearningRate 0.0019 Epoch: 17 Global Step: 288350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:10,193-Speed 9335.50 samples/sec Loss 3.5088 LearningRate 0.0019 Epoch: 17 Global Step: 288360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:11,304-Speed 9228.69 samples/sec Loss 3.6329 LearningRate 0.0019 Epoch: 17 Global Step: 288370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:12,426-Speed 9131.26 samples/sec Loss 3.5232 LearningRate 0.0019 Epoch: 17 Global Step: 288380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:13,574-Speed 8928.56 samples/sec Loss 3.6280 LearningRate 0.0019 Epoch: 17 Global Step: 288390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:14,706-Speed 9048.77 samples/sec Loss 3.5403 LearningRate 0.0019 Epoch: 17 Global Step: 288400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:15,860-Speed 8878.52 samples/sec Loss 3.5555 LearningRate 0.0019 Epoch: 17 Global Step: 288410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:16,985-Speed 9108.27 samples/sec Loss 3.6223 LearningRate 0.0018 Epoch: 17 Global Step: 288420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:18,119-Speed 9027.25 samples/sec Loss 3.4854 LearningRate 0.0018 Epoch: 17 Global Step: 288430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:19,218-Speed 9327.38 samples/sec Loss 3.5646 LearningRate 0.0018 Epoch: 17 Global Step: 288440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:20,345-Speed 9090.63 samples/sec Loss 3.5979 LearningRate 0.0018 Epoch: 17 Global Step: 288450 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 23:13:21,424-Speed 9495.70 samples/sec Loss 3.5419 LearningRate 0.0018 Epoch: 17 Global Step: 288460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:22,573-Speed 8916.27 samples/sec Loss 3.5124 LearningRate 0.0018 Epoch: 17 Global Step: 288470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:23,711-Speed 9003.02 samples/sec Loss 3.6683 LearningRate 0.0018 Epoch: 17 Global Step: 288480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:24,822-Speed 9223.12 samples/sec Loss 3.5562 LearningRate 0.0018 Epoch: 17 Global Step: 288490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:25,969-Speed 8933.07 samples/sec Loss 3.4890 LearningRate 0.0018 Epoch: 17 Global Step: 288500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:27,185-Speed 8428.32 samples/sec Loss 3.5706 LearningRate 0.0018 Epoch: 17 Global Step: 288510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:28,304-Speed 9159.67 samples/sec Loss 3.5439 LearningRate 0.0018 Epoch: 17 Global Step: 288520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:29,422-Speed 9160.35 samples/sec Loss 3.4984 LearningRate 0.0018 Epoch: 17 Global Step: 288530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:30,558-Speed 9019.11 samples/sec Loss 3.6017 LearningRate 0.0018 Epoch: 17 Global Step: 288540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:31,732-Speed 8726.49 samples/sec Loss 3.5871 LearningRate 0.0018 Epoch: 17 Global Step: 288550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:32,894-Speed 8821.29 samples/sec Loss 3.6069 LearningRate 0.0018 Epoch: 17 Global Step: 288560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:34,034-Speed 8985.00 samples/sec Loss 3.4803 LearningRate 0.0018 Epoch: 17 Global Step: 288570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:35,150-Speed 
9184.67 samples/sec Loss 3.6474 LearningRate 0.0018 Epoch: 17 Global Step: 288580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:36,288-Speed 8996.83 samples/sec Loss 3.5885 LearningRate 0.0018 Epoch: 17 Global Step: 288590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:37,410-Speed 9131.54 samples/sec Loss 3.5472 LearningRate 0.0018 Epoch: 17 Global Step: 288600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:38,523-Speed 9211.12 samples/sec Loss 3.5767 LearningRate 0.0018 Epoch: 17 Global Step: 288610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:39,650-Speed 9092.02 samples/sec Loss 3.4749 LearningRate 0.0018 Epoch: 17 Global Step: 288620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:40,763-Speed 9204.90 samples/sec Loss 3.5882 LearningRate 0.0018 Epoch: 17 Global Step: 288630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:41,868-Speed 9270.84 samples/sec Loss 3.5775 LearningRate 0.0018 Epoch: 17 Global Step: 288640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:42,979-Speed 9219.87 samples/sec Loss 3.5616 LearningRate 0.0018 Epoch: 17 Global Step: 288650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:44,139-Speed 8832.34 samples/sec Loss 3.5357 LearningRate 0.0018 Epoch: 17 Global Step: 288660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:45,246-Speed 9256.35 samples/sec Loss 3.5784 LearningRate 0.0018 Epoch: 17 Global Step: 288670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:46,408-Speed 8819.78 samples/sec Loss 3.5722 LearningRate 0.0018 Epoch: 17 Global Step: 288680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:47,565-Speed 8854.76 samples/sec Loss 3.5641 LearningRate 0.0018 Epoch: 17 Global Step: 288690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:48,713-Speed 8926.34 samples/sec Loss 3.5338 
LearningRate 0.0018 Epoch: 17 Global Step: 288700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:49,844-Speed 9058.03 samples/sec Loss 3.5475 LearningRate 0.0018 Epoch: 17 Global Step: 288710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:50,990-Speed 8938.95 samples/sec Loss 3.5919 LearningRate 0.0018 Epoch: 17 Global Step: 288720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:52,132-Speed 8983.21 samples/sec Loss 3.5206 LearningRate 0.0018 Epoch: 17 Global Step: 288730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:53,240-Speed 9242.68 samples/sec Loss 3.5820 LearningRate 0.0018 Epoch: 17 Global Step: 288740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:13:54,376-Speed 9019.37 samples/sec Loss 3.5791 LearningRate 0.0018 Epoch: 17 Global Step: 288750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:55,509-Speed 9042.70 samples/sec Loss 3.5342 LearningRate 0.0018 Epoch: 17 Global Step: 288760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:56,617-Speed 9245.72 samples/sec Loss 3.5609 LearningRate 0.0018 Epoch: 17 Global Step: 288770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:57,759-Speed 8972.62 samples/sec Loss 3.5608 LearningRate 0.0018 Epoch: 17 Global Step: 288780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:13:58,894-Speed 9027.05 samples/sec Loss 3.5276 LearningRate 0.0018 Epoch: 17 Global Step: 288790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:00,029-Speed 9026.18 samples/sec Loss 3.6002 LearningRate 0.0018 Epoch: 17 Global Step: 288800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:01,155-Speed 9103.71 samples/sec Loss 3.4992 LearningRate 0.0018 Epoch: 17 Global Step: 288810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:02,297-Speed 8969.32 samples/sec Loss 3.5064 LearningRate 0.0018 Epoch: 17 Global 
Step: 288820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:03,448-Speed 8902.23 samples/sec Loss 3.5531 LearningRate 0.0018 Epoch: 17 Global Step: 288830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:04,575-Speed 9090.18 samples/sec Loss 3.5879 LearningRate 0.0018 Epoch: 17 Global Step: 288840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:05,730-Speed 8870.26 samples/sec Loss 3.6340 LearningRate 0.0018 Epoch: 17 Global Step: 288850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:06,883-Speed 8886.92 samples/sec Loss 3.5148 LearningRate 0.0018 Epoch: 17 Global Step: 288860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:08,037-Speed 8878.67 samples/sec Loss 3.5134 LearningRate 0.0018 Epoch: 17 Global Step: 288870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:09,152-Speed 9190.83 samples/sec Loss 3.5275 LearningRate 0.0018 Epoch: 17 Global Step: 288880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:10,306-Speed 8874.08 samples/sec Loss 3.6696 LearningRate 0.0018 Epoch: 17 Global Step: 288890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:11,418-Speed 9218.40 samples/sec Loss 3.4944 LearningRate 0.0018 Epoch: 17 Global Step: 288900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:12,592-Speed 8730.38 samples/sec Loss 3.5262 LearningRate 0.0018 Epoch: 17 Global Step: 288910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:13,712-Speed 9141.55 samples/sec Loss 3.5127 LearningRate 0.0018 Epoch: 17 Global Step: 288920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:15,912-Speed 4657.41 samples/sec Loss 3.5658 LearningRate 0.0018 Epoch: 17 Global Step: 288930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:17,073-Speed 8822.09 samples/sec Loss 3.5972 LearningRate 0.0018 Epoch: 17 Global Step: 288940 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 23:14:18,213-Speed 8987.74 samples/sec Loss 3.4678 LearningRate 0.0018 Epoch: 17 Global Step: 288950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:20,211-Speed 5127.14 samples/sec Loss 3.6476 LearningRate 0.0018 Epoch: 17 Global Step: 288960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:14:21,330-Speed 9157.02 samples/sec Loss 3.5519 LearningRate 0.0018 Epoch: 17 Global Step: 288970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:22,483-Speed 8887.91 samples/sec Loss 3.6042 LearningRate 0.0018 Epoch: 17 Global Step: 288980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:24,716-Speed 4587.95 samples/sec Loss 3.5821 LearningRate 0.0018 Epoch: 17 Global Step: 288990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:25,868-Speed 8897.09 samples/sec Loss 3.5239 LearningRate 0.0018 Epoch: 17 Global Step: 289000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:27,062-Speed 8581.78 samples/sec Loss 3.6516 LearningRate 0.0018 Epoch: 17 Global Step: 289010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:28,986-Speed 5323.78 samples/sec Loss 3.5559 LearningRate 0.0018 Epoch: 17 Global Step: 289020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:30,116-Speed 9067.72 samples/sec Loss 3.5054 LearningRate 0.0018 Epoch: 17 Global Step: 289030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:31,273-Speed 8853.90 samples/sec Loss 3.4421 LearningRate 0.0018 Epoch: 17 Global Step: 289040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:32,412-Speed 8999.16 samples/sec Loss 3.5839 LearningRate 0.0018 Epoch: 17 Global Step: 289050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:33,565-Speed 8883.97 samples/sec Loss 3.6543 LearningRate 0.0018 Epoch: 17 Global Step: 289060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:14:34,671-Speed 9267.94 samples/sec Loss 3.6315 LearningRate 0.0018 Epoch: 17 Global Step: 289070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:14:35,790-Speed 9149.16 samples/sec Loss 3.5545 LearningRate 0.0018 Epoch: 17 Global Step: 289080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:36,879-Speed 9407.35 samples/sec Loss 3.6254 LearningRate 0.0018 Epoch: 17 Global Step: 289090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:37,979-Speed 9319.77 samples/sec Loss 3.5459 LearningRate 0.0018 Epoch: 17 Global Step: 289100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:39,092-Speed 9201.94 samples/sec Loss 3.6030 LearningRate 0.0018 Epoch: 17 Global Step: 289110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:40,221-Speed 9077.36 samples/sec Loss 3.5479 LearningRate 0.0018 Epoch: 17 Global Step: 289120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:41,378-Speed 8860.41 samples/sec Loss 3.5210 LearningRate 0.0018 Epoch: 17 Global Step: 289130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:42,515-Speed 9009.61 samples/sec Loss 3.5823 LearningRate 0.0018 Epoch: 17 Global Step: 289140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:43,690-Speed 8719.07 samples/sec Loss 3.6237 LearningRate 0.0018 Epoch: 17 Global Step: 289150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:44,820-Speed 9068.91 samples/sec Loss 3.4451 LearningRate 0.0018 Epoch: 17 Global Step: 289160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:45,958-Speed 9004.41 samples/sec Loss 3.5709 LearningRate 0.0018 Epoch: 17 Global Step: 289170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:47,067-Speed 9237.49 samples/sec Loss 3.5191 LearningRate 0.0018 Epoch: 17 Global Step: 289180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:14:48,158-Speed 9390.28 samples/sec 
Loss 3.5466 LearningRate 0.0018 Epoch: 17 Global Step: 289190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:14:49,286-Speed 9078.03 samples/sec Loss 3.5280 LearningRate 0.0018 Epoch: 17 Global Step: 289200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:14:50,386-Speed 9314.61 samples/sec Loss 3.5523 LearningRate 0.0018 Epoch: 17 Global Step: 289210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:51,480-Speed 9368.87 samples/sec Loss 3.6042 LearningRate 0.0018 Epoch: 17 Global Step: 289220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:52,619-Speed 9001.10 samples/sec Loss 3.5409 LearningRate 0.0018 Epoch: 17 Global Step: 289230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:53,763-Speed 8950.80 samples/sec Loss 3.5603 LearningRate 0.0018 Epoch: 17 Global Step: 289240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:54,894-Speed 9061.17 samples/sec Loss 3.5225 LearningRate 0.0018 Epoch: 17 Global Step: 289250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:56,040-Speed 8943.85 samples/sec Loss 3.5235 LearningRate 0.0018 Epoch: 17 Global Step: 289260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:57,183-Speed 8968.48 samples/sec Loss 3.5682 LearningRate 0.0018 Epoch: 17 Global Step: 289270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:58,320-Speed 9008.72 samples/sec Loss 3.5932 LearningRate 0.0018 Epoch: 17 Global Step: 289280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:14:59,458-Speed 9003.55 samples/sec Loss 3.5885 LearningRate 0.0018 Epoch: 17 Global Step: 289290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:00,590-Speed 9048.97 samples/sec Loss 3.4905 LearningRate 0.0018 Epoch: 17 Global Step: 289300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:01,692-Speed 9294.55 samples/sec Loss 3.5757 LearningRate 0.0018 Epoch: 17 
Global Step: 289310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:02,804-Speed 9220.75 samples/sec Loss 3.5401 LearningRate 0.0018 Epoch: 17 Global Step: 289320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:03,903-Speed 9329.67 samples/sec Loss 3.5812 LearningRate 0.0018 Epoch: 17 Global Step: 289330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:05,013-Speed 9227.84 samples/sec Loss 3.5280 LearningRate 0.0018 Epoch: 17 Global Step: 289340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:06,123-Speed 9229.73 samples/sec Loss 3.6086 LearningRate 0.0018 Epoch: 17 Global Step: 289350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:07,205-Speed 9471.44 samples/sec Loss 3.5608 LearningRate 0.0018 Epoch: 17 Global Step: 289360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:08,385-Speed 8682.03 samples/sec Loss 3.6021 LearningRate 0.0018 Epoch: 17 Global Step: 289370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:09,535-Speed 8908.12 samples/sec Loss 3.5596 LearningRate 0.0018 Epoch: 17 Global Step: 289380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:10,653-Speed 9160.20 samples/sec Loss 3.6021 LearningRate 0.0018 Epoch: 17 Global Step: 289390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:11,823-Speed 8757.20 samples/sec Loss 3.6335 LearningRate 0.0018 Epoch: 17 Global Step: 289400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:12,974-Speed 8902.64 samples/sec Loss 3.6105 LearningRate 0.0018 Epoch: 17 Global Step: 289410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:14,082-Speed 9247.07 samples/sec Loss 3.5810 LearningRate 0.0018 Epoch: 17 Global Step: 289420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:15,220-Speed 9004.52 samples/sec Loss 3.5579 LearningRate 0.0018 Epoch: 17 Global Step: 289430 Fp16 Grad Scale: 
131072 Required: 2 hours Training: 2022-04-11 23:15:16,329-Speed 9244.63 samples/sec Loss 3.5882 LearningRate 0.0018 Epoch: 17 Global Step: 289440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:17,460-Speed 9059.41 samples/sec Loss 3.6027 LearningRate 0.0018 Epoch: 17 Global Step: 289450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:18,616-Speed 8865.55 samples/sec Loss 3.4980 LearningRate 0.0018 Epoch: 17 Global Step: 289460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:19,720-Speed 9273.54 samples/sec Loss 3.5324 LearningRate 0.0018 Epoch: 17 Global Step: 289470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:20,787-Speed 9609.85 samples/sec Loss 3.5848 LearningRate 0.0018 Epoch: 17 Global Step: 289480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:21,964-Speed 8705.35 samples/sec Loss 3.5681 LearningRate 0.0018 Epoch: 17 Global Step: 289490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:23,083-Speed 9150.33 samples/sec Loss 3.5754 LearningRate 0.0018 Epoch: 17 Global Step: 289500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:24,250-Speed 8779.38 samples/sec Loss 3.6105 LearningRate 0.0018 Epoch: 17 Global Step: 289510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:25,411-Speed 8829.43 samples/sec Loss 3.5464 LearningRate 0.0018 Epoch: 17 Global Step: 289520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:26,537-Speed 9105.92 samples/sec Loss 3.5868 LearningRate 0.0018 Epoch: 17 Global Step: 289530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:27,672-Speed 9022.10 samples/sec Loss 3.5534 LearningRate 0.0018 Epoch: 17 Global Step: 289540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:28,789-Speed 9178.10 samples/sec Loss 3.6045 LearningRate 0.0018 Epoch: 17 Global Step: 289550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 23:15:29,922-Speed 9043.40 samples/sec Loss 3.5681 LearningRate 0.0018 Epoch: 17 Global Step: 289560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:31,056-Speed 9031.54 samples/sec Loss 3.5992 LearningRate 0.0018 Epoch: 17 Global Step: 289570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:32,206-Speed 8907.35 samples/sec Loss 3.4784 LearningRate 0.0018 Epoch: 17 Global Step: 289580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:33,367-Speed 8827.42 samples/sec Loss 3.5475 LearningRate 0.0018 Epoch: 17 Global Step: 289590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:34,489-Speed 9136.30 samples/sec Loss 3.5009 LearningRate 0.0018 Epoch: 17 Global Step: 289600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:35,615-Speed 9094.28 samples/sec Loss 3.5804 LearningRate 0.0018 Epoch: 17 Global Step: 289610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:36,753-Speed 9004.66 samples/sec Loss 3.5989 LearningRate 0.0018 Epoch: 17 Global Step: 289620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:37,852-Speed 9324.89 samples/sec Loss 3.5472 LearningRate 0.0018 Epoch: 17 Global Step: 289630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:39,104-Speed 8183.20 samples/sec Loss 3.5728 LearningRate 0.0018 Epoch: 17 Global Step: 289640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:40,207-Speed 9288.97 samples/sec Loss 3.5647 LearningRate 0.0018 Epoch: 17 Global Step: 289650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:41,396-Speed 8617.59 samples/sec Loss 3.6183 LearningRate 0.0017 Epoch: 17 Global Step: 289660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:42,526-Speed 9064.42 samples/sec Loss 3.5617 LearningRate 0.0017 Epoch: 17 Global Step: 289670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:43,672-Speed 8937.85 
samples/sec Loss 3.5922 LearningRate 0.0017 Epoch: 17 Global Step: 289680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:44,776-Speed 9285.15 samples/sec Loss 3.5936 LearningRate 0.0017 Epoch: 17 Global Step: 289690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:45,881-Speed 9274.14 samples/sec Loss 3.5274 LearningRate 0.0017 Epoch: 17 Global Step: 289700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:47,021-Speed 8986.04 samples/sec Loss 3.5829 LearningRate 0.0017 Epoch: 17 Global Step: 289710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:48,231-Speed 8466.45 samples/sec Loss 3.5598 LearningRate 0.0017 Epoch: 17 Global Step: 289720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:49,361-Speed 9067.72 samples/sec Loss 3.6018 LearningRate 0.0017 Epoch: 17 Global Step: 289730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:50,472-Speed 9229.34 samples/sec Loss 3.5474 LearningRate 0.0017 Epoch: 17 Global Step: 289740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:51,582-Speed 9229.23 samples/sec Loss 3.6186 LearningRate 0.0017 Epoch: 17 Global Step: 289750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:52,674-Speed 9381.00 samples/sec Loss 3.6162 LearningRate 0.0017 Epoch: 17 Global Step: 289760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:53,815-Speed 8976.41 samples/sec Loss 3.6136 LearningRate 0.0017 Epoch: 17 Global Step: 289770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:54,892-Speed 9517.52 samples/sec Loss 3.6018 LearningRate 0.0017 Epoch: 17 Global Step: 289780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:56,036-Speed 8952.16 samples/sec Loss 3.5720 LearningRate 0.0017 Epoch: 17 Global Step: 289790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:15:57,167-Speed 9064.27 samples/sec Loss 3.5211 LearningRate 
0.0017 Epoch: 17 Global Step: 289800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:58,283-Speed 9187.41 samples/sec Loss 3.6158 LearningRate 0.0017 Epoch: 17 Global Step: 289810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:15:59,422-Speed 8992.19 samples/sec Loss 3.5976 LearningRate 0.0017 Epoch: 17 Global Step: 289820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:16:00,554-Speed 9052.05 samples/sec Loss 3.5716 LearningRate 0.0017 Epoch: 17 Global Step: 289830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:16:01,682-Speed 9079.43 samples/sec Loss 3.4911 LearningRate 0.0017 Epoch: 17 Global Step: 289840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:02,803-Speed 9141.32 samples/sec Loss 3.6016 LearningRate 0.0017 Epoch: 17 Global Step: 289850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:03,938-Speed 9032.45 samples/sec Loss 3.4997 LearningRate 0.0017 Epoch: 17 Global Step: 289860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:05,094-Speed 8859.56 samples/sec Loss 3.5787 LearningRate 0.0017 Epoch: 17 Global Step: 289870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:06,194-Speed 9319.51 samples/sec Loss 3.5386 LearningRate 0.0017 Epoch: 17 Global Step: 289880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:07,338-Speed 8950.02 samples/sec Loss 3.5469 LearningRate 0.0017 Epoch: 17 Global Step: 289890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:08,437-Speed 9330.97 samples/sec Loss 3.5506 LearningRate 0.0017 Epoch: 17 Global Step: 289900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:09,568-Speed 9055.76 samples/sec Loss 3.5844 LearningRate 0.0017 Epoch: 17 Global Step: 289910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:10,691-Speed 9122.72 samples/sec Loss 3.5112 LearningRate 0.0017 Epoch: 17 Global Step: 289920 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:11,861-Speed 8757.37 samples/sec Loss 3.6260 LearningRate 0.0017 Epoch: 17 Global Step: 289930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:12,949-Speed 9416.68 samples/sec Loss 3.5668 LearningRate 0.0017 Epoch: 17 Global Step: 289940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:14,110-Speed 8824.14 samples/sec Loss 3.5764 LearningRate 0.0017 Epoch: 17 Global Step: 289950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:15,204-Speed 9368.82 samples/sec Loss 3.5369 LearningRate 0.0017 Epoch: 17 Global Step: 289960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:16,328-Speed 9117.01 samples/sec Loss 3.6219 LearningRate 0.0017 Epoch: 17 Global Step: 289970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:17,469-Speed 8980.43 samples/sec Loss 3.5500 LearningRate 0.0017 Epoch: 17 Global Step: 289980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:18,601-Speed 9054.13 samples/sec Loss 3.5787 LearningRate 0.0017 Epoch: 17 Global Step: 289990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:19,778-Speed 8704.99 samples/sec Loss 3.6273 LearningRate 0.0017 Epoch: 17 Global Step: 290000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:16:41,938-[lfw][290000]XNorm: 6.694873 Training: 2022-04-11 23:16:41,939-[lfw][290000]Accuracy-Flip: 0.99667+-0.00289 Training: 2022-04-11 23:16:41,939-[lfw][290000]Accuracy-Highest: 0.99733 Training: 2022-04-11 23:17:07,587-[cfp_fp][290000]XNorm: 5.841547 Training: 2022-04-11 23:17:07,588-[cfp_fp][290000]Accuracy-Flip: 0.97257+-0.00885 Training: 2022-04-11 23:17:07,588-[cfp_fp][290000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:17:29,700-[agedb_30][290000]XNorm: 6.521216 Training: 2022-04-11 23:17:29,700-[agedb_30][290000]Accuracy-Flip: 0.97050+-0.00931 Training: 2022-04-11 
23:17:29,701-[agedb_30][290000]Accuracy-Highest: 0.97417 Training: 2022-04-11 23:17:30,802-Speed 144.18 samples/sec Loss 3.5535 LearningRate 0.0017 Epoch: 17 Global Step: 290010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:31,917-Speed 9183.48 samples/sec Loss 3.6246 LearningRate 0.0017 Epoch: 17 Global Step: 290020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:33,047-Speed 9069.04 samples/sec Loss 3.5733 LearningRate 0.0017 Epoch: 17 Global Step: 290030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:34,171-Speed 9115.84 samples/sec Loss 3.5779 LearningRate 0.0017 Epoch: 17 Global Step: 290040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:17:35,279-Speed 9246.62 samples/sec Loss 3.5380 LearningRate 0.0017 Epoch: 17 Global Step: 290050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:17:36,347-Speed 9591.21 samples/sec Loss 3.5807 LearningRate 0.0017 Epoch: 17 Global Step: 290060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:37,457-Speed 9237.94 samples/sec Loss 3.5892 LearningRate 0.0017 Epoch: 17 Global Step: 290070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:38,610-Speed 8881.58 samples/sec Loss 3.5305 LearningRate 0.0017 Epoch: 17 Global Step: 290080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:39,745-Speed 9027.43 samples/sec Loss 3.5621 LearningRate 0.0017 Epoch: 17 Global Step: 290090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:40,881-Speed 9022.59 samples/sec Loss 3.5577 LearningRate 0.0017 Epoch: 17 Global Step: 290100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:42,010-Speed 9078.53 samples/sec Loss 3.5466 LearningRate 0.0017 Epoch: 17 Global Step: 290110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:43,165-Speed 8868.36 samples/sec Loss 3.6631 LearningRate 0.0017 Epoch: 17 Global Step: 290120 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:17:44,274-Speed 9240.23 samples/sec Loss 3.5547 LearningRate 0.0017 Epoch: 17 Global Step: 290130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:45,413-Speed 8998.66 samples/sec Loss 3.6177 LearningRate 0.0017 Epoch: 17 Global Step: 290140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:46,528-Speed 9191.10 samples/sec Loss 3.5878 LearningRate 0.0017 Epoch: 17 Global Step: 290150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:47,664-Speed 9016.05 samples/sec Loss 3.5554 LearningRate 0.0017 Epoch: 17 Global Step: 290160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:17:48,771-Speed 9255.13 samples/sec Loss 3.5942 LearningRate 0.0017 Epoch: 17 Global Step: 290170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:17:49,889-Speed 9170.34 samples/sec Loss 3.6135 LearningRate 0.0017 Epoch: 17 Global Step: 290180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:17:50,984-Speed 9357.00 samples/sec Loss 3.5697 LearningRate 0.0017 Epoch: 17 Global Step: 290190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:52,117-Speed 9038.30 samples/sec Loss 3.5513 LearningRate 0.0017 Epoch: 17 Global Step: 290200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:53,216-Speed 9322.24 samples/sec Loss 3.5494 LearningRate 0.0017 Epoch: 17 Global Step: 290210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:54,333-Speed 9172.49 samples/sec Loss 3.6253 LearningRate 0.0017 Epoch: 17 Global Step: 290220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:55,452-Speed 9160.82 samples/sec Loss 3.6060 LearningRate 0.0017 Epoch: 17 Global Step: 290230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:56,553-Speed 9303.92 samples/sec Loss 3.6745 LearningRate 0.0017 Epoch: 17 Global Step: 290240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:17:57,685-Speed 9051.48 samples/sec Loss 3.6491 LearningRate 0.0017 Epoch: 17 Global Step: 290250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:58,789-Speed 9278.92 samples/sec Loss 3.5752 LearningRate 0.0017 Epoch: 17 Global Step: 290260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:17:59,923-Speed 9033.91 samples/sec Loss 3.5374 LearningRate 0.0017 Epoch: 17 Global Step: 290270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:01,069-Speed 8939.25 samples/sec Loss 3.5821 LearningRate 0.0017 Epoch: 17 Global Step: 290280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:02,191-Speed 9133.19 samples/sec Loss 3.5482 LearningRate 0.0017 Epoch: 17 Global Step: 290290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:03,298-Speed 9252.15 samples/sec Loss 3.5978 LearningRate 0.0017 Epoch: 17 Global Step: 290300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:04,429-Speed 9064.28 samples/sec Loss 3.5563 LearningRate 0.0017 Epoch: 17 Global Step: 290310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:05,587-Speed 8851.42 samples/sec Loss 3.6069 LearningRate 0.0017 Epoch: 17 Global Step: 290320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:06,732-Speed 8943.88 samples/sec Loss 3.5397 LearningRate 0.0017 Epoch: 17 Global Step: 290330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:07,812-Speed 9490.82 samples/sec Loss 3.5634 LearningRate 0.0017 Epoch: 17 Global Step: 290340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:08,909-Speed 9341.56 samples/sec Loss 3.5532 LearningRate 0.0017 Epoch: 17 Global Step: 290350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:10,026-Speed 9171.04 samples/sec Loss 3.5234 LearningRate 0.0017 Epoch: 17 Global Step: 290360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:11,172-Speed 8938.41 samples/sec Loss 
3.6600 LearningRate 0.0017 Epoch: 17 Global Step: 290370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:12,288-Speed 9183.65 samples/sec Loss 3.6173 LearningRate 0.0017 Epoch: 17 Global Step: 290380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:13,393-Speed 9272.63 samples/sec Loss 3.5145 LearningRate 0.0017 Epoch: 17 Global Step: 290390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:14,534-Speed 8980.91 samples/sec Loss 3.5088 LearningRate 0.0017 Epoch: 17 Global Step: 290400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:15,688-Speed 8875.85 samples/sec Loss 3.5627 LearningRate 0.0017 Epoch: 17 Global Step: 290410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:16,788-Speed 9316.78 samples/sec Loss 3.6464 LearningRate 0.0017 Epoch: 17 Global Step: 290420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:17,959-Speed 8751.59 samples/sec Loss 3.6755 LearningRate 0.0017 Epoch: 17 Global Step: 290430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:19,078-Speed 9154.04 samples/sec Loss 3.5415 LearningRate 0.0017 Epoch: 17 Global Step: 290440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:20,273-Speed 8573.33 samples/sec Loss 3.5610 LearningRate 0.0017 Epoch: 17 Global Step: 290450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:21,443-Speed 8760.11 samples/sec Loss 3.4857 LearningRate 0.0017 Epoch: 17 Global Step: 290460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:22,599-Speed 8865.22 samples/sec Loss 3.5427 LearningRate 0.0017 Epoch: 17 Global Step: 290470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:23,707-Speed 9246.90 samples/sec Loss 3.6089 LearningRate 0.0017 Epoch: 17 Global Step: 290480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:24,806-Speed 9322.89 samples/sec Loss 3.5816 LearningRate 0.0017 Epoch: 17 Global 
Step: 290490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:18:25,899-Speed 9371.57 samples/sec Loss 3.6728 LearningRate 0.0017 Epoch: 17 Global Step: 290500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:27,061-Speed 8824.28 samples/sec Loss 3.6197 LearningRate 0.0017 Epoch: 17 Global Step: 290510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:28,202-Speed 8976.23 samples/sec Loss 3.6307 LearningRate 0.0017 Epoch: 17 Global Step: 290520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:29,302-Speed 9313.54 samples/sec Loss 3.5202 LearningRate 0.0017 Epoch: 17 Global Step: 290530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:30,405-Speed 9292.37 samples/sec Loss 3.6254 LearningRate 0.0017 Epoch: 17 Global Step: 290540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:31,545-Speed 8986.04 samples/sec Loss 3.6348 LearningRate 0.0017 Epoch: 17 Global Step: 290550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:32,670-Speed 9105.87 samples/sec Loss 3.5763 LearningRate 0.0017 Epoch: 17 Global Step: 290560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:33,770-Speed 9322.49 samples/sec Loss 3.5426 LearningRate 0.0017 Epoch: 17 Global Step: 290570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:34,876-Speed 9259.54 samples/sec Loss 3.5827 LearningRate 0.0017 Epoch: 17 Global Step: 290580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:35,978-Speed 9297.87 samples/sec Loss 3.5428 LearningRate 0.0017 Epoch: 17 Global Step: 290590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:37,170-Speed 8599.26 samples/sec Loss 3.6033 LearningRate 0.0017 Epoch: 17 Global Step: 290600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:18:38,310-Speed 8984.71 samples/sec Loss 3.5560 LearningRate 0.0017 Epoch: 17 Global Step: 290610 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:18:39,473-Speed 8811.98 samples/sec Loss 3.6403 LearningRate 0.0017 Epoch: 17 Global Step: 290620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:40,624-Speed 8899.51 samples/sec Loss 3.5307 LearningRate 0.0017 Epoch: 17 Global Step: 290630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:41,797-Speed 8733.17 samples/sec Loss 3.5413 LearningRate 0.0017 Epoch: 17 Global Step: 290640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:42,963-Speed 8789.22 samples/sec Loss 3.6521 LearningRate 0.0017 Epoch: 17 Global Step: 290650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:44,123-Speed 8839.48 samples/sec Loss 3.4396 LearningRate 0.0017 Epoch: 17 Global Step: 290660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:45,253-Speed 9061.97 samples/sec Loss 3.5046 LearningRate 0.0017 Epoch: 17 Global Step: 290670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:46,362-Speed 9247.63 samples/sec Loss 3.5486 LearningRate 0.0017 Epoch: 17 Global Step: 290680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:47,484-Speed 9129.94 samples/sec Loss 3.5632 LearningRate 0.0017 Epoch: 17 Global Step: 290690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:48,623-Speed 8989.09 samples/sec Loss 3.5453 LearningRate 0.0017 Epoch: 17 Global Step: 290700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:49,753-Speed 9070.23 samples/sec Loss 3.5409 LearningRate 0.0017 Epoch: 17 Global Step: 290710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:18:50,929-Speed 8710.77 samples/sec Loss 3.5180 LearningRate 0.0017 Epoch: 17 Global Step: 290720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:18:52,084-Speed 8869.53 samples/sec Loss 3.6496 LearningRate 0.0017 Epoch: 17 Global Step: 290730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:18:53,152-Speed 9594.05 samples/sec Loss 3.6104 LearningRate 0.0017 Epoch: 17 Global Step: 290740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:54,307-Speed 8871.15 samples/sec Loss 3.6173 LearningRate 0.0017 Epoch: 17 Global Step: 290750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:55,492-Speed 8647.83 samples/sec Loss 3.5854 LearningRate 0.0017 Epoch: 17 Global Step: 290760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:56,617-Speed 9109.52 samples/sec Loss 3.5458 LearningRate 0.0017 Epoch: 17 Global Step: 290770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:57,791-Speed 8729.67 samples/sec Loss 3.5441 LearningRate 0.0017 Epoch: 17 Global Step: 290780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:18:58,920-Speed 9071.21 samples/sec Loss 3.5902 LearningRate 0.0017 Epoch: 17 Global Step: 290790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:00,049-Speed 9079.02 samples/sec Loss 3.5848 LearningRate 0.0017 Epoch: 17 Global Step: 290800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:01,160-Speed 9227.13 samples/sec Loss 3.5624 LearningRate 0.0017 Epoch: 17 Global Step: 290810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:02,305-Speed 8947.22 samples/sec Loss 3.5832 LearningRate 0.0017 Epoch: 17 Global Step: 290820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:03,415-Speed 9230.38 samples/sec Loss 3.6171 LearningRate 0.0017 Epoch: 17 Global Step: 290830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:04,531-Speed 9189.00 samples/sec Loss 3.6372 LearningRate 0.0017 Epoch: 17 Global Step: 290840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:05,606-Speed 9533.64 samples/sec Loss 3.5964 LearningRate 0.0017 Epoch: 17 Global Step: 290850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:06,713-Speed 9248.87 samples/sec Loss 
3.5274 LearningRate 0.0017 Epoch: 17 Global Step: 290860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:07,845-Speed 9051.82 samples/sec Loss 3.5169 LearningRate 0.0017 Epoch: 17 Global Step: 290870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:08,962-Speed 9177.24 samples/sec Loss 3.5445 LearningRate 0.0017 Epoch: 17 Global Step: 290880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:10,065-Speed 9281.99 samples/sec Loss 3.6187 LearningRate 0.0017 Epoch: 17 Global Step: 290890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:11,202-Speed 9016.40 samples/sec Loss 3.5547 LearningRate 0.0017 Epoch: 17 Global Step: 290900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:12,326-Speed 9109.95 samples/sec Loss 3.6111 LearningRate 0.0017 Epoch: 17 Global Step: 290910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:13,455-Speed 9079.09 samples/sec Loss 3.5310 LearningRate 0.0017 Epoch: 17 Global Step: 290920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:14,567-Speed 9215.00 samples/sec Loss 3.6917 LearningRate 0.0017 Epoch: 17 Global Step: 290930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:15,686-Speed 9154.71 samples/sec Loss 3.5294 LearningRate 0.0017 Epoch: 17 Global Step: 290940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:16,831-Speed 8952.33 samples/sec Loss 3.5913 LearningRate 0.0016 Epoch: 17 Global Step: 290950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:17,965-Speed 9033.67 samples/sec Loss 3.5834 LearningRate 0.0016 Epoch: 17 Global Step: 290960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:19,115-Speed 8906.51 samples/sec Loss 3.5443 LearningRate 0.0016 Epoch: 17 Global Step: 290970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:20,311-Speed 8565.90 samples/sec Loss 3.5171 LearningRate 0.0016 Epoch: 17 Global 
Step: 290980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:21,477-Speed 8789.64 samples/sec Loss 3.5372 LearningRate 0.0016 Epoch: 17 Global Step: 290990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:22,562-Speed 9439.74 samples/sec Loss 3.5781 LearningRate 0.0016 Epoch: 17 Global Step: 291000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:23,734-Speed 8743.34 samples/sec Loss 3.5642 LearningRate 0.0016 Epoch: 17 Global Step: 291010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:24,877-Speed 8962.59 samples/sec Loss 3.5361 LearningRate 0.0016 Epoch: 17 Global Step: 291020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:26,016-Speed 8999.96 samples/sec Loss 3.5905 LearningRate 0.0016 Epoch: 17 Global Step: 291030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:19:27,132-Speed 9186.47 samples/sec Loss 3.6136 LearningRate 0.0016 Epoch: 17 Global Step: 291040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:19:28,243-Speed 9219.41 samples/sec Loss 3.5154 LearningRate 0.0016 Epoch: 17 Global Step: 291050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:29,406-Speed 8810.91 samples/sec Loss 3.6191 LearningRate 0.0016 Epoch: 17 Global Step: 291060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:30,542-Speed 9014.83 samples/sec Loss 3.5701 LearningRate 0.0016 Epoch: 17 Global Step: 291070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:31,641-Speed 9320.19 samples/sec Loss 3.6044 LearningRate 0.0016 Epoch: 17 Global Step: 291080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:32,746-Speed 9273.84 samples/sec Loss 3.5330 LearningRate 0.0016 Epoch: 17 Global Step: 291090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:33,890-Speed 8963.40 samples/sec Loss 3.5150 LearningRate 0.0016 Epoch: 17 Global Step: 291100 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:19:35,031-Speed 8978.59 samples/sec Loss 3.4376 LearningRate 0.0016 Epoch: 17 Global Step: 291110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:36,108-Speed 9512.28 samples/sec Loss 3.5437 LearningRate 0.0016 Epoch: 17 Global Step: 291120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:37,219-Speed 9220.90 samples/sec Loss 3.5562 LearningRate 0.0016 Epoch: 17 Global Step: 291130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:38,366-Speed 8935.76 samples/sec Loss 3.5920 LearningRate 0.0016 Epoch: 17 Global Step: 291140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:39,554-Speed 8626.48 samples/sec Loss 3.5527 LearningRate 0.0016 Epoch: 17 Global Step: 291150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:19:40,645-Speed 9388.89 samples/sec Loss 3.5748 LearningRate 0.0016 Epoch: 17 Global Step: 291160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:41,762-Speed 9172.56 samples/sec Loss 3.5452 LearningRate 0.0016 Epoch: 17 Global Step: 291170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:42,895-Speed 9047.22 samples/sec Loss 3.5946 LearningRate 0.0016 Epoch: 17 Global Step: 291180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:44,026-Speed 9052.66 samples/sec Loss 3.5140 LearningRate 0.0016 Epoch: 17 Global Step: 291190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:45,215-Speed 8618.36 samples/sec Loss 3.6288 LearningRate 0.0016 Epoch: 17 Global Step: 291200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:46,341-Speed 9101.37 samples/sec Loss 3.5977 LearningRate 0.0016 Epoch: 17 Global Step: 291210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:47,447-Speed 9262.65 samples/sec Loss 3.5961 LearningRate 0.0016 Epoch: 17 Global Step: 291220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:19:48,601-Speed 8881.92 samples/sec Loss 3.6004 LearningRate 0.0016 Epoch: 17 Global Step: 291230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:49,718-Speed 9174.35 samples/sec Loss 3.5890 LearningRate 0.0016 Epoch: 17 Global Step: 291240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:50,847-Speed 9070.98 samples/sec Loss 3.5453 LearningRate 0.0016 Epoch: 17 Global Step: 291250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:51,971-Speed 9112.83 samples/sec Loss 3.5922 LearningRate 0.0016 Epoch: 17 Global Step: 291260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:19:53,097-Speed 9104.42 samples/sec Loss 3.5514 LearningRate 0.0016 Epoch: 17 Global Step: 291270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:19:54,261-Speed 8800.19 samples/sec Loss 3.5788 LearningRate 0.0016 Epoch: 17 Global Step: 291280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:55,356-Speed 9350.36 samples/sec Loss 3.6730 LearningRate 0.0016 Epoch: 17 Global Step: 291290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:56,479-Speed 9128.39 samples/sec Loss 3.5654 LearningRate 0.0016 Epoch: 17 Global Step: 291300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:57,603-Speed 9122.27 samples/sec Loss 3.5346 LearningRate 0.0016 Epoch: 17 Global Step: 291310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:58,705-Speed 9297.83 samples/sec Loss 3.5624 LearningRate 0.0016 Epoch: 17 Global Step: 291320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:19:59,858-Speed 8882.37 samples/sec Loss 3.5574 LearningRate 0.0016 Epoch: 17 Global Step: 291330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:01,030-Speed 8745.33 samples/sec Loss 3.5468 LearningRate 0.0016 Epoch: 17 Global Step: 291340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:02,151-Speed 9136.91 samples/sec 
Loss 3.5545 LearningRate 0.0016 Epoch: 17 Global Step: 291350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:03,289-Speed 9004.93 samples/sec Loss 3.6086 LearningRate 0.0016 Epoch: 17 Global Step: 291360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:04,416-Speed 9094.48 samples/sec Loss 3.5288 LearningRate 0.0016 Epoch: 17 Global Step: 291370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:05,502-Speed 9433.88 samples/sec Loss 3.6269 LearningRate 0.0016 Epoch: 17 Global Step: 291380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:20:06,612-Speed 9229.19 samples/sec Loss 3.5843 LearningRate 0.0016 Epoch: 17 Global Step: 291390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:07,705-Speed 9377.46 samples/sec Loss 3.5385 LearningRate 0.0016 Epoch: 17 Global Step: 291400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:08,836-Speed 9054.49 samples/sec Loss 3.5593 LearningRate 0.0016 Epoch: 17 Global Step: 291410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:09,964-Speed 9086.90 samples/sec Loss 3.5631 LearningRate 0.0016 Epoch: 17 Global Step: 291420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:11,120-Speed 8858.47 samples/sec Loss 3.5964 LearningRate 0.0016 Epoch: 17 Global Step: 291430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:12,262-Speed 8973.37 samples/sec Loss 3.6106 LearningRate 0.0016 Epoch: 17 Global Step: 291440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:13,475-Speed 8447.20 samples/sec Loss 3.5657 LearningRate 0.0016 Epoch: 17 Global Step: 291450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:14,606-Speed 9059.93 samples/sec Loss 3.6189 LearningRate 0.0016 Epoch: 17 Global Step: 291460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:15,730-Speed 9113.17 samples/sec Loss 3.5684 LearningRate 0.0016 Epoch: 17 
Global Step: 291470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:16,887-Speed 8864.51 samples/sec Loss 3.5653 LearningRate 0.0016 Epoch: 17 Global Step: 291480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:18,063-Speed 8710.36 samples/sec Loss 3.6504 LearningRate 0.0016 Epoch: 17 Global Step: 291490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:20:19,155-Speed 9378.63 samples/sec Loss 3.5588 LearningRate 0.0016 Epoch: 17 Global Step: 291500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:20,305-Speed 8910.04 samples/sec Loss 3.5642 LearningRate 0.0016 Epoch: 17 Global Step: 291510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:21,441-Speed 9020.47 samples/sec Loss 3.5799 LearningRate 0.0016 Epoch: 17 Global Step: 291520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:22,562-Speed 9144.45 samples/sec Loss 3.5502 LearningRate 0.0016 Epoch: 17 Global Step: 291530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:23,693-Speed 9054.44 samples/sec Loss 3.6658 LearningRate 0.0016 Epoch: 17 Global Step: 291540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:24,834-Speed 8982.06 samples/sec Loss 3.6280 LearningRate 0.0016 Epoch: 17 Global Step: 291550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:25,934-Speed 9311.95 samples/sec Loss 3.7194 LearningRate 0.0016 Epoch: 17 Global Step: 291560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:27,079-Speed 8951.50 samples/sec Loss 3.4827 LearningRate 0.0016 Epoch: 17 Global Step: 291570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:28,258-Speed 8691.38 samples/sec Loss 3.6632 LearningRate 0.0016 Epoch: 17 Global Step: 291580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:29,378-Speed 9151.33 samples/sec Loss 3.5624 LearningRate 0.0016 Epoch: 17 Global Step: 291590 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 23:20:30,481-Speed 9282.66 samples/sec Loss 3.5212 LearningRate 0.0016 Epoch: 17 Global Step: 291600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:20:31,583-Speed 9296.15 samples/sec Loss 3.6366 LearningRate 0.0016 Epoch: 17 Global Step: 291610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:32,729-Speed 8943.02 samples/sec Loss 3.5390 LearningRate 0.0016 Epoch: 17 Global Step: 291620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:33,876-Speed 8941.09 samples/sec Loss 3.6013 LearningRate 0.0016 Epoch: 17 Global Step: 291630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:35,025-Speed 8914.29 samples/sec Loss 3.5400 LearningRate 0.0016 Epoch: 17 Global Step: 291640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:36,183-Speed 8849.36 samples/sec Loss 3.6145 LearningRate 0.0016 Epoch: 17 Global Step: 291650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:37,320-Speed 9008.21 samples/sec Loss 3.5810 LearningRate 0.0016 Epoch: 17 Global Step: 291660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:38,432-Speed 9218.58 samples/sec Loss 3.5766 LearningRate 0.0016 Epoch: 17 Global Step: 291670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:39,568-Speed 9011.03 samples/sec Loss 3.5956 LearningRate 0.0016 Epoch: 17 Global Step: 291680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:40,664-Speed 9356.04 samples/sec Loss 3.5571 LearningRate 0.0016 Epoch: 17 Global Step: 291690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:41,746-Speed 9463.32 samples/sec Loss 3.5471 LearningRate 0.0016 Epoch: 17 Global Step: 291700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:42,852-Speed 9265.86 samples/sec Loss 3.7030 LearningRate 0.0016 Epoch: 17 Global Step: 291710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
23:20:43,988-Speed 9015.54 samples/sec Loss 3.6558 LearningRate 0.0016 Epoch: 17 Global Step: 291720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:20:45,058-Speed 9575.32 samples/sec Loss 3.5829 LearningRate 0.0016 Epoch: 17 Global Step: 291730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:46,183-Speed 9112.35 samples/sec Loss 3.5553 LearningRate 0.0016 Epoch: 17 Global Step: 291740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:47,340-Speed 8860.98 samples/sec Loss 3.6493 LearningRate 0.0016 Epoch: 17 Global Step: 291750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:48,469-Speed 9070.25 samples/sec Loss 3.5246 LearningRate 0.0016 Epoch: 17 Global Step: 291760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:49,567-Speed 9334.25 samples/sec Loss 3.5335 LearningRate 0.0016 Epoch: 17 Global Step: 291770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:50,658-Speed 9386.00 samples/sec Loss 3.5312 LearningRate 0.0016 Epoch: 17 Global Step: 291780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:20:51,776-Speed 9165.48 samples/sec Loss 3.5767 LearningRate 0.0016 Epoch: 17 Global Step: 291790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:20:52,888-Speed 9217.25 samples/sec Loss 3.6238 LearningRate 0.0016 Epoch: 17 Global Step: 291800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:20:54,100-Speed 8449.89 samples/sec Loss 3.5489 LearningRate 0.0016 Epoch: 17 Global Step: 291810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:20:55,221-Speed 9144.56 samples/sec Loss 3.5426 LearningRate 0.0016 Epoch: 17 Global Step: 291820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:20:56,389-Speed 8768.05 samples/sec Loss 3.5791 LearningRate 0.0016 Epoch: 17 Global Step: 291830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:20:57,511-Speed 9134.16 samples/sec 
Loss 3.6477 LearningRate 0.0016 Epoch: 17 Global Step: 291840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:20:58,667-Speed 8866.65 samples/sec Loss 3.5573 LearningRate 0.0016 Epoch: 17 Global Step: 291850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:20:59,800-Speed 9040.76 samples/sec Loss 3.6644 LearningRate 0.0016 Epoch: 17 Global Step: 291860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:21:00,904-Speed 9280.85 samples/sec Loss 3.5894 LearningRate 0.0016 Epoch: 17 Global Step: 291870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:21:02,003-Speed 9326.49 samples/sec Loss 3.6174 LearningRate 0.0016 Epoch: 17 Global Step: 291880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:21:03,120-Speed 9169.51 samples/sec Loss 3.6221 LearningRate 0.0016 Epoch: 17 Global Step: 291890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:04,217-Speed 9344.64 samples/sec Loss 3.5744 LearningRate 0.0016 Epoch: 17 Global Step: 291900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:05,365-Speed 8923.32 samples/sec Loss 3.5073 LearningRate 0.0016 Epoch: 17 Global Step: 291910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:06,510-Speed 8943.49 samples/sec Loss 3.6098 LearningRate 0.0016 Epoch: 17 Global Step: 291920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:07,654-Speed 8960.82 samples/sec Loss 3.6011 LearningRate 0.0016 Epoch: 17 Global Step: 291930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:08,762-Speed 9243.15 samples/sec Loss 3.5216 LearningRate 0.0016 Epoch: 17 Global Step: 291940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:09,889-Speed 9093.26 samples/sec Loss 3.5389 LearningRate 0.0016 Epoch: 17 Global Step: 291950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:11,025-Speed 9021.18 samples/sec Loss 3.5402 LearningRate 0.0016 Epoch: 17 
Global Step: 291960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:12,138-Speed 9212.77 samples/sec Loss 3.5270 LearningRate 0.0016 Epoch: 17 Global Step: 291970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:13,266-Speed 9077.75 samples/sec Loss 3.6037 LearningRate 0.0016 Epoch: 17 Global Step: 291980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:21:14,442-Speed 8716.87 samples/sec Loss 3.6674 LearningRate 0.0016 Epoch: 17 Global Step: 291990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:21:15,534-Speed 9377.59 samples/sec Loss 3.4707 LearningRate 0.0016 Epoch: 17 Global Step: 292000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:21:37,471-[lfw][292000]XNorm: 6.713849 Training: 2022-04-11 23:21:37,472-[lfw][292000]Accuracy-Flip: 0.99717+-0.00299 Training: 2022-04-11 23:21:37,472-[lfw][292000]Accuracy-Highest: 0.99733 Training: 2022-04-11 23:22:02,822-[cfp_fp][292000]XNorm: 5.849598 Training: 2022-04-11 23:22:02,823-[cfp_fp][292000]Accuracy-Flip: 0.97314+-0.00845 Training: 2022-04-11 23:22:02,823-[cfp_fp][292000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:22:24,760-[agedb_30][292000]XNorm: 6.526053 Training: 2022-04-11 23:22:24,760-[agedb_30][292000]Accuracy-Flip: 0.97300+-0.00859 Training: 2022-04-11 23:22:24,760-[agedb_30][292000]Accuracy-Highest: 0.97417 Training: 2022-04-11 23:22:25,863-Speed 145.60 samples/sec Loss 3.5518 LearningRate 0.0016 Epoch: 17 Global Step: 292010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:26,992-Speed 9075.84 samples/sec Loss 3.5565 LearningRate 0.0016 Epoch: 17 Global Step: 292020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:28,120-Speed 9089.73 samples/sec Loss 3.6113 LearningRate 0.0016 Epoch: 17 Global Step: 292030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:29,261-Speed 8977.95 samples/sec Loss 3.6589 LearningRate 0.0016 Epoch: 17 Global Step: 292040 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:30,334-Speed 9546.77 samples/sec Loss 3.5877 LearningRate 0.0016 Epoch: 17 Global Step: 292050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:31,429-Speed 9357.37 samples/sec Loss 3.6744 LearningRate 0.0016 Epoch: 17 Global Step: 292060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:32,574-Speed 8953.52 samples/sec Loss 3.5929 LearningRate 0.0016 Epoch: 17 Global Step: 292070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:33,735-Speed 8825.31 samples/sec Loss 3.6835 LearningRate 0.0016 Epoch: 17 Global Step: 292080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:34,855-Speed 9144.35 samples/sec Loss 3.6174 LearningRate 0.0016 Epoch: 17 Global Step: 292090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:35,989-Speed 9035.43 samples/sec Loss 3.5440 LearningRate 0.0016 Epoch: 17 Global Step: 292100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:37,115-Speed 9101.82 samples/sec Loss 3.5765 LearningRate 0.0016 Epoch: 17 Global Step: 292110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:38,235-Speed 9147.68 samples/sec Loss 3.5756 LearningRate 0.0016 Epoch: 17 Global Step: 292120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:39,373-Speed 9003.21 samples/sec Loss 3.5941 LearningRate 0.0016 Epoch: 17 Global Step: 292130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:40,540-Speed 8781.73 samples/sec Loss 3.6422 LearningRate 0.0016 Epoch: 17 Global Step: 292140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:41,665-Speed 9108.17 samples/sec Loss 3.6388 LearningRate 0.0016 Epoch: 17 Global Step: 292150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:42,767-Speed 9296.94 samples/sec Loss 3.6539 LearningRate 0.0016 Epoch: 17 Global Step: 292160 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 23:22:43,896-Speed 9073.14 samples/sec Loss 3.6163 LearningRate 0.0016 Epoch: 17 Global Step: 292170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:22:44,992-Speed 9347.09 samples/sec Loss 3.5855 LearningRate 0.0016 Epoch: 17 Global Step: 292180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:46,085-Speed 9372.75 samples/sec Loss 3.5646 LearningRate 0.0016 Epoch: 17 Global Step: 292190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:47,196-Speed 9229.33 samples/sec Loss 3.5609 LearningRate 0.0016 Epoch: 17 Global Step: 292200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:48,319-Speed 9123.48 samples/sec Loss 3.5652 LearningRate 0.0016 Epoch: 17 Global Step: 292210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:49,426-Speed 9250.85 samples/sec Loss 3.5982 LearningRate 0.0016 Epoch: 17 Global Step: 292220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:50,566-Speed 8991.08 samples/sec Loss 3.5614 LearningRate 0.0016 Epoch: 17 Global Step: 292230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:51,710-Speed 8957.15 samples/sec Loss 3.6314 LearningRate 0.0016 Epoch: 17 Global Step: 292240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:52,844-Speed 9032.67 samples/sec Loss 3.6098 LearningRate 0.0016 Epoch: 17 Global Step: 292250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:53,955-Speed 9217.06 samples/sec Loss 3.6026 LearningRate 0.0015 Epoch: 17 Global Step: 292260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:55,100-Speed 8950.64 samples/sec Loss 3.5710 LearningRate 0.0015 Epoch: 17 Global Step: 292270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:56,224-Speed 9122.01 samples/sec Loss 3.6568 LearningRate 0.0015 Epoch: 17 Global Step: 292280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:22:57,341-Speed 
9171.65 samples/sec Loss 3.6319 LearningRate 0.0015 Epoch: 17 Global Step: 292290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:58,434-Speed 9373.10 samples/sec Loss 3.6415 LearningRate 0.0015 Epoch: 17 Global Step: 292300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:22:59,532-Speed 9335.93 samples/sec Loss 3.5771 LearningRate 0.0015 Epoch: 17 Global Step: 292310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:00,623-Speed 9390.74 samples/sec Loss 3.5981 LearningRate 0.0015 Epoch: 17 Global Step: 292320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:01,765-Speed 8970.41 samples/sec Loss 3.6005 LearningRate 0.0015 Epoch: 17 Global Step: 292330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:02,877-Speed 9210.22 samples/sec Loss 3.5590 LearningRate 0.0015 Epoch: 17 Global Step: 292340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:04,023-Speed 8944.55 samples/sec Loss 3.5985 LearningRate 0.0015 Epoch: 17 Global Step: 292350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:05,128-Speed 9274.89 samples/sec Loss 3.5958 LearningRate 0.0015 Epoch: 17 Global Step: 292360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:06,249-Speed 9135.18 samples/sec Loss 3.6468 LearningRate 0.0015 Epoch: 17 Global Step: 292370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:07,380-Speed 9061.68 samples/sec Loss 3.5687 LearningRate 0.0015 Epoch: 17 Global Step: 292380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:08,540-Speed 8829.87 samples/sec Loss 3.6056 LearningRate 0.0015 Epoch: 17 Global Step: 292390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:09,650-Speed 9231.84 samples/sec Loss 3.5389 LearningRate 0.0015 Epoch: 17 Global Step: 292400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:10,758-Speed 9249.13 samples/sec Loss 3.5520 
LearningRate 0.0015 Epoch: 17 Global Step: 292410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:11,913-Speed 8873.17 samples/sec Loss 3.6107 LearningRate 0.0015 Epoch: 17 Global Step: 292420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:13,048-Speed 9026.83 samples/sec Loss 3.5594 LearningRate 0.0015 Epoch: 17 Global Step: 292430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:14,194-Speed 8942.89 samples/sec Loss 3.5492 LearningRate 0.0015 Epoch: 17 Global Step: 292440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:15,354-Speed 8830.43 samples/sec Loss 3.5225 LearningRate 0.0015 Epoch: 17 Global Step: 292450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:16,472-Speed 9164.77 samples/sec Loss 3.6119 LearningRate 0.0015 Epoch: 17 Global Step: 292460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:17,609-Speed 9012.10 samples/sec Loss 3.6281 LearningRate 0.0015 Epoch: 17 Global Step: 292470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:18,747-Speed 9004.08 samples/sec Loss 3.5933 LearningRate 0.0015 Epoch: 17 Global Step: 292480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:19,885-Speed 9007.19 samples/sec Loss 3.6087 LearningRate 0.0015 Epoch: 17 Global Step: 292490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:20,971-Speed 9433.37 samples/sec Loss 3.6283 LearningRate 0.0015 Epoch: 17 Global Step: 292500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:22,089-Speed 9161.88 samples/sec Loss 3.6398 LearningRate 0.0015 Epoch: 17 Global Step: 292510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:23,214-Speed 9102.24 samples/sec Loss 3.4980 LearningRate 0.0015 Epoch: 17 Global Step: 292520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:24,320-Speed 9267.83 samples/sec Loss 3.5453 LearningRate 0.0015 Epoch: 17 Global Step: 
292530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:25,470-Speed 8911.31 samples/sec Loss 3.5647 LearningRate 0.0015 Epoch: 17 Global Step: 292540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:26,598-Speed 9075.40 samples/sec Loss 3.6174 LearningRate 0.0015 Epoch: 17 Global Step: 292550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:27,717-Speed 9160.44 samples/sec Loss 3.5804 LearningRate 0.0015 Epoch: 17 Global Step: 292560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:28,843-Speed 9097.28 samples/sec Loss 3.6580 LearningRate 0.0015 Epoch: 17 Global Step: 292570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:29,974-Speed 9059.67 samples/sec Loss 3.6745 LearningRate 0.0015 Epoch: 17 Global Step: 292580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:31,121-Speed 8939.40 samples/sec Loss 3.6046 LearningRate 0.0015 Epoch: 17 Global Step: 292590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:32,304-Speed 8662.00 samples/sec Loss 3.6147 LearningRate 0.0015 Epoch: 17 Global Step: 292600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:33,428-Speed 9115.71 samples/sec Loss 3.6179 LearningRate 0.0015 Epoch: 17 Global Step: 292610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:34,552-Speed 9119.08 samples/sec Loss 3.6203 LearningRate 0.0015 Epoch: 17 Global Step: 292620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:35,680-Speed 9081.22 samples/sec Loss 3.7232 LearningRate 0.0015 Epoch: 17 Global Step: 292630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:36,819-Speed 8995.27 samples/sec Loss 3.4914 LearningRate 0.0015 Epoch: 17 Global Step: 292640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:37,923-Speed 9278.49 samples/sec Loss 3.6209 LearningRate 0.0015 Epoch: 17 Global Step: 292650 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 23:23:39,069-Speed 8944.61 samples/sec Loss 3.6362 LearningRate 0.0015 Epoch: 17 Global Step: 292660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:40,216-Speed 8929.48 samples/sec Loss 3.5735 LearningRate 0.0015 Epoch: 17 Global Step: 292670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:41,309-Speed 9372.85 samples/sec Loss 3.5742 LearningRate 0.0015 Epoch: 17 Global Step: 292680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:42,409-Speed 9320.29 samples/sec Loss 3.5978 LearningRate 0.0015 Epoch: 17 Global Step: 292690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:43,508-Speed 9319.66 samples/sec Loss 3.4981 LearningRate 0.0015 Epoch: 17 Global Step: 292700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:44,642-Speed 9033.93 samples/sec Loss 3.6553 LearningRate 0.0015 Epoch: 17 Global Step: 292710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:45,772-Speed 9063.92 samples/sec Loss 3.5302 LearningRate 0.0015 Epoch: 17 Global Step: 292720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:46,890-Speed 9170.16 samples/sec Loss 3.5972 LearningRate 0.0015 Epoch: 17 Global Step: 292730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:48,022-Speed 9046.59 samples/sec Loss 3.5812 LearningRate 0.0015 Epoch: 17 Global Step: 292740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:49,161-Speed 8999.83 samples/sec Loss 3.6006 LearningRate 0.0015 Epoch: 17 Global Step: 292750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:50,338-Speed 8706.24 samples/sec Loss 3.5781 LearningRate 0.0015 Epoch: 17 Global Step: 292760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:51,465-Speed 9093.94 samples/sec Loss 3.6201 LearningRate 0.0015 Epoch: 17 Global Step: 292770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
23:23:52,546-Speed 9472.05 samples/sec Loss 3.5717 LearningRate 0.0015 Epoch: 17 Global Step: 292780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:53,678-Speed 9053.01 samples/sec Loss 3.4791 LearningRate 0.0015 Epoch: 17 Global Step: 292790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:23:54,846-Speed 8768.18 samples/sec Loss 3.5628 LearningRate 0.0015 Epoch: 17 Global Step: 292800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:55,926-Speed 9489.82 samples/sec Loss 3.5987 LearningRate 0.0015 Epoch: 17 Global Step: 292810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:57,053-Speed 9088.70 samples/sec Loss 3.5986 LearningRate 0.0015 Epoch: 17 Global Step: 292820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:58,212-Speed 8849.04 samples/sec Loss 3.5768 LearningRate 0.0015 Epoch: 17 Global Step: 292830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:23:59,365-Speed 8884.02 samples/sec Loss 3.6610 LearningRate 0.0015 Epoch: 17 Global Step: 292840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:00,510-Speed 8948.79 samples/sec Loss 3.5925 LearningRate 0.0015 Epoch: 17 Global Step: 292850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:01,626-Speed 9177.66 samples/sec Loss 3.5231 LearningRate 0.0015 Epoch: 17 Global Step: 292860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:02,775-Speed 8916.65 samples/sec Loss 3.5606 LearningRate 0.0015 Epoch: 17 Global Step: 292870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:03,888-Speed 9206.00 samples/sec Loss 3.6073 LearningRate 0.0015 Epoch: 17 Global Step: 292880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:05,026-Speed 9006.52 samples/sec Loss 3.6125 LearningRate 0.0015 Epoch: 17 Global Step: 292890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:06,158-Speed 9053.87 samples/sec 
Loss 3.6022 LearningRate 0.0015 Epoch: 17 Global Step: 292900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:07,358-Speed 8538.52 samples/sec Loss 3.5835 LearningRate 0.0015 Epoch: 17 Global Step: 292910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:08,483-Speed 9109.32 samples/sec Loss 3.6719 LearningRate 0.0015 Epoch: 17 Global Step: 292920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:09,631-Speed 8928.25 samples/sec Loss 3.5948 LearningRate 0.0015 Epoch: 17 Global Step: 292930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:10,768-Speed 9010.68 samples/sec Loss 3.5401 LearningRate 0.0015 Epoch: 17 Global Step: 292940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:11,920-Speed 8894.68 samples/sec Loss 3.6335 LearningRate 0.0015 Epoch: 17 Global Step: 292950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:13,071-Speed 8896.64 samples/sec Loss 3.6842 LearningRate 0.0015 Epoch: 17 Global Step: 292960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:14,177-Speed 9268.00 samples/sec Loss 3.7259 LearningRate 0.0015 Epoch: 17 Global Step: 292970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:15,299-Speed 9129.18 samples/sec Loss 3.5491 LearningRate 0.0015 Epoch: 17 Global Step: 292980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:16,376-Speed 9515.19 samples/sec Loss 3.5628 LearningRate 0.0015 Epoch: 17 Global Step: 292990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:17,452-Speed 9526.00 samples/sec Loss 3.5818 LearningRate 0.0015 Epoch: 17 Global Step: 293000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:24:18,532-Speed 9477.78 samples/sec Loss 3.5717 LearningRate 0.0015 Epoch: 17 Global Step: 293010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:19,617-Speed 9447.37 samples/sec Loss 3.5374 LearningRate 0.0015 Epoch: 17 
Global Step: 293020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:20,779-Speed 8816.54 samples/sec Loss 3.6331 LearningRate 0.0015 Epoch: 17 Global Step: 293030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:21,891-Speed 9214.25 samples/sec Loss 3.6111 LearningRate 0.0015 Epoch: 17 Global Step: 293040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:22,982-Speed 9392.41 samples/sec Loss 3.6154 LearningRate 0.0015 Epoch: 17 Global Step: 293050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:24,096-Speed 9192.97 samples/sec Loss 3.6368 LearningRate 0.0015 Epoch: 17 Global Step: 293060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:25,239-Speed 8965.11 samples/sec Loss 3.6646 LearningRate 0.0015 Epoch: 17 Global Step: 293070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:26,358-Speed 9156.19 samples/sec Loss 3.5270 LearningRate 0.0015 Epoch: 17 Global Step: 293080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:27,452-Speed 9370.90 samples/sec Loss 3.5048 LearningRate 0.0015 Epoch: 17 Global Step: 293090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:28,606-Speed 8876.19 samples/sec Loss 3.6374 LearningRate 0.0015 Epoch: 17 Global Step: 293100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:29,717-Speed 9225.33 samples/sec Loss 3.6489 LearningRate 0.0015 Epoch: 17 Global Step: 293110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:24:30,853-Speed 9018.80 samples/sec Loss 3.5571 LearningRate 0.0015 Epoch: 17 Global Step: 293120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:24:31,950-Speed 9343.36 samples/sec Loss 3.5917 LearningRate 0.0015 Epoch: 17 Global Step: 293130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:33,086-Speed 9014.96 samples/sec Loss 3.5398 LearningRate 0.0015 Epoch: 17 Global Step: 293140 Fp16 Grad Scale: 
32768 Required: 2 hours Training: 2022-04-11 23:24:34,184-Speed 9334.36 samples/sec Loss 3.6733 LearningRate 0.0015 Epoch: 17 Global Step: 293150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:35,309-Speed 9104.78 samples/sec Loss 3.5507 LearningRate 0.0015 Epoch: 17 Global Step: 293160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:36,410-Speed 9313.22 samples/sec Loss 3.6248 LearningRate 0.0015 Epoch: 17 Global Step: 293170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:37,588-Speed 8690.68 samples/sec Loss 3.6390 LearningRate 0.0015 Epoch: 17 Global Step: 293180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:38,739-Speed 8902.01 samples/sec Loss 3.5644 LearningRate 0.0015 Epoch: 17 Global Step: 293190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:39,834-Speed 9359.53 samples/sec Loss 3.5886 LearningRate 0.0015 Epoch: 17 Global Step: 293200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:40,932-Speed 9328.51 samples/sec Loss 3.6822 LearningRate 0.0015 Epoch: 17 Global Step: 293210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:42,013-Speed 9480.34 samples/sec Loss 3.5772 LearningRate 0.0015 Epoch: 17 Global Step: 293220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:24:43,176-Speed 8807.33 samples/sec Loss 3.5020 LearningRate 0.0015 Epoch: 17 Global Step: 293230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:44,335-Speed 8843.23 samples/sec Loss 3.5604 LearningRate 0.0015 Epoch: 17 Global Step: 293240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:45,418-Speed 9456.78 samples/sec Loss 3.5601 LearningRate 0.0015 Epoch: 17 Global Step: 293250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:46,557-Speed 9000.43 samples/sec Loss 3.6361 LearningRate 0.0015 Epoch: 17 Global Step: 293260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 23:24:47,665-Speed 9248.22 samples/sec Loss 3.6230 LearningRate 0.0015 Epoch: 17 Global Step: 293270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:48,805-Speed 8989.77 samples/sec Loss 3.5789 LearningRate 0.0015 Epoch: 17 Global Step: 293280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:49,973-Speed 8769.27 samples/sec Loss 3.5829 LearningRate 0.0015 Epoch: 17 Global Step: 293290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:51,096-Speed 9126.42 samples/sec Loss 3.5653 LearningRate 0.0015 Epoch: 17 Global Step: 293300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:52,246-Speed 8912.24 samples/sec Loss 3.5614 LearningRate 0.0015 Epoch: 17 Global Step: 293310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:53,358-Speed 9211.02 samples/sec Loss 3.5948 LearningRate 0.0015 Epoch: 17 Global Step: 293320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:54,514-Speed 8864.42 samples/sec Loss 3.5544 LearningRate 0.0015 Epoch: 17 Global Step: 293330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:24:55,670-Speed 8863.40 samples/sec Loss 3.6229 LearningRate 0.0015 Epoch: 17 Global Step: 293340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:24:56,817-Speed 8927.28 samples/sec Loss 3.5770 LearningRate 0.0015 Epoch: 17 Global Step: 293350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:57,969-Speed 8900.28 samples/sec Loss 3.6541 LearningRate 0.0015 Epoch: 17 Global Step: 293360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:24:59,149-Speed 8677.78 samples/sec Loss 3.5694 LearningRate 0.0015 Epoch: 17 Global Step: 293370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:00,241-Speed 9387.91 samples/sec Loss 3.6173 LearningRate 0.0015 Epoch: 17 Global Step: 293380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:01,349-Speed 9245.86 
samples/sec Loss 3.5136 LearningRate 0.0015 Epoch: 17 Global Step: 293390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:02,468-Speed 9154.39 samples/sec Loss 3.6321 LearningRate 0.0015 Epoch: 17 Global Step: 293400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:03,608-Speed 8987.15 samples/sec Loss 3.6284 LearningRate 0.0015 Epoch: 17 Global Step: 293410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:04,732-Speed 9118.56 samples/sec Loss 3.6161 LearningRate 0.0015 Epoch: 17 Global Step: 293420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:05,928-Speed 8570.76 samples/sec Loss 3.6236 LearningRate 0.0015 Epoch: 17 Global Step: 293430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:07,074-Speed 8942.29 samples/sec Loss 3.6016 LearningRate 0.0015 Epoch: 17 Global Step: 293440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:08,189-Speed 9185.40 samples/sec Loss 3.5572 LearningRate 0.0015 Epoch: 17 Global Step: 293450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:25:09,300-Speed 9220.05 samples/sec Loss 3.6272 LearningRate 0.0015 Epoch: 17 Global Step: 293460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:10,408-Speed 9244.53 samples/sec Loss 3.5806 LearningRate 0.0015 Epoch: 17 Global Step: 293470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:11,535-Speed 9089.95 samples/sec Loss 3.6428 LearningRate 0.0015 Epoch: 17 Global Step: 293480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:12,673-Speed 9006.78 samples/sec Loss 3.5599 LearningRate 0.0015 Epoch: 17 Global Step: 293490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:13,815-Speed 8971.71 samples/sec Loss 3.5548 LearningRate 0.0015 Epoch: 17 Global Step: 293500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:14,926-Speed 9223.06 samples/sec Loss 3.6591 LearningRate 
0.0015 Epoch: 17 Global Step: 293510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:16,090-Speed 8799.83 samples/sec Loss 3.6485 LearningRate 0.0015 Epoch: 17 Global Step: 293520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:17,188-Speed 9338.58 samples/sec Loss 3.5937 LearningRate 0.0015 Epoch: 17 Global Step: 293530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:18,286-Speed 9330.10 samples/sec Loss 3.5876 LearningRate 0.0015 Epoch: 17 Global Step: 293540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:19,430-Speed 8955.70 samples/sec Loss 3.6794 LearningRate 0.0015 Epoch: 17 Global Step: 293550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:20,601-Speed 8751.43 samples/sec Loss 3.6140 LearningRate 0.0015 Epoch: 17 Global Step: 293560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:25:21,712-Speed 9213.49 samples/sec Loss 3.5449 LearningRate 0.0015 Epoch: 17 Global Step: 293570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:25:22,825-Speed 9214.23 samples/sec Loss 3.5668 LearningRate 0.0015 Epoch: 17 Global Step: 293580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:25:23,921-Speed 9351.92 samples/sec Loss 3.6162 LearningRate 0.0015 Epoch: 17 Global Step: 293590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:25,032-Speed 9222.47 samples/sec Loss 3.5455 LearningRate 0.0015 Epoch: 17 Global Step: 293600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:26,160-Speed 9076.36 samples/sec Loss 3.5743 LearningRate 0.0015 Epoch: 17 Global Step: 293610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:27,326-Speed 8787.47 samples/sec Loss 3.6672 LearningRate 0.0015 Epoch: 17 Global Step: 293620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:28,445-Speed 9164.55 samples/sec Loss 3.6550 LearningRate 0.0014 Epoch: 17 Global Step: 293630 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:29,579-Speed 9033.25 samples/sec Loss 3.6379 LearningRate 0.0014 Epoch: 17 Global Step: 293640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:30,654-Speed 9528.66 samples/sec Loss 3.6204 LearningRate 0.0014 Epoch: 17 Global Step: 293650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:31,797-Speed 8962.63 samples/sec Loss 3.5371 LearningRate 0.0014 Epoch: 17 Global Step: 293660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:32,986-Speed 8615.98 samples/sec Loss 3.5061 LearningRate 0.0014 Epoch: 17 Global Step: 293670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:34,141-Speed 8869.65 samples/sec Loss 3.5734 LearningRate 0.0014 Epoch: 17 Global Step: 293680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:35,303-Speed 8821.84 samples/sec Loss 3.5929 LearningRate 0.0014 Epoch: 17 Global Step: 293690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:25:36,440-Speed 9013.67 samples/sec Loss 3.5504 LearningRate 0.0014 Epoch: 17 Global Step: 293700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:37,596-Speed 8861.81 samples/sec Loss 3.6601 LearningRate 0.0014 Epoch: 17 Global Step: 293710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:38,751-Speed 8869.31 samples/sec Loss 3.6237 LearningRate 0.0014 Epoch: 17 Global Step: 293720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:39,885-Speed 9034.06 samples/sec Loss 3.5605 LearningRate 0.0014 Epoch: 17 Global Step: 293730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:41,047-Speed 8822.29 samples/sec Loss 3.5597 LearningRate 0.0014 Epoch: 17 Global Step: 293740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:42,125-Speed 9507.89 samples/sec Loss 3.5787 LearningRate 0.0014 Epoch: 17 Global Step: 293750 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 23:25:43,210-Speed 9443.58 samples/sec Loss 3.5943 LearningRate 0.0014 Epoch: 17 Global Step: 293760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:44,352-Speed 8971.48 samples/sec Loss 3.5715 LearningRate 0.0014 Epoch: 17 Global Step: 293770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:45,449-Speed 9335.08 samples/sec Loss 3.6062 LearningRate 0.0014 Epoch: 17 Global Step: 293780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:46,548-Speed 9320.91 samples/sec Loss 3.6753 LearningRate 0.0014 Epoch: 17 Global Step: 293790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:47,655-Speed 9260.06 samples/sec Loss 3.5315 LearningRate 0.0014 Epoch: 17 Global Step: 293800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:48,745-Speed 9397.40 samples/sec Loss 3.6209 LearningRate 0.0014 Epoch: 17 Global Step: 293810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:49,884-Speed 8998.01 samples/sec Loss 3.5581 LearningRate 0.0014 Epoch: 17 Global Step: 293820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:51,002-Speed 9166.64 samples/sec Loss 3.6496 LearningRate 0.0014 Epoch: 17 Global Step: 293830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:52,112-Speed 9231.75 samples/sec Loss 3.4629 LearningRate 0.0014 Epoch: 17 Global Step: 293840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:53,296-Speed 8648.80 samples/sec Loss 3.5201 LearningRate 0.0014 Epoch: 17 Global Step: 293850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:54,431-Speed 9027.97 samples/sec Loss 3.6269 LearningRate 0.0014 Epoch: 17 Global Step: 293860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:55,585-Speed 8880.22 samples/sec Loss 3.5986 LearningRate 0.0014 Epoch: 17 Global Step: 293870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:56,703-Speed 
9158.35 samples/sec Loss 3.6023 LearningRate 0.0014 Epoch: 17 Global Step: 293880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:57,853-Speed 8915.49 samples/sec Loss 3.6045 LearningRate 0.0014 Epoch: 17 Global Step: 293890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:25:58,980-Speed 9091.20 samples/sec Loss 3.6096 LearningRate 0.0014 Epoch: 17 Global Step: 293900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:26:00,090-Speed 9234.26 samples/sec Loss 3.6347 LearningRate 0.0014 Epoch: 17 Global Step: 293910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:26:01,207-Speed 9169.27 samples/sec Loss 3.6331 LearningRate 0.0014 Epoch: 17 Global Step: 293920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:02,353-Speed 8943.32 samples/sec Loss 3.5930 LearningRate 0.0014 Epoch: 17 Global Step: 293930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:03,504-Speed 8900.59 samples/sec Loss 3.5826 LearningRate 0.0014 Epoch: 17 Global Step: 293940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:04,643-Speed 9001.94 samples/sec Loss 3.5570 LearningRate 0.0014 Epoch: 17 Global Step: 293950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:05,767-Speed 9114.01 samples/sec Loss 3.5864 LearningRate 0.0014 Epoch: 17 Global Step: 293960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:06,901-Speed 9035.75 samples/sec Loss 3.6109 LearningRate 0.0014 Epoch: 17 Global Step: 293970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:08,045-Speed 8955.89 samples/sec Loss 3.6092 LearningRate 0.0014 Epoch: 17 Global Step: 293980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:09,159-Speed 9197.23 samples/sec Loss 3.5715 LearningRate 0.0014 Epoch: 17 Global Step: 293990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:10,299-Speed 8987.38 samples/sec Loss 3.5933 
LearningRate 0.0014 Epoch: 17 Global Step: 294000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:26:32,314-[lfw][294000]XNorm: 6.678325 Training: 2022-04-11 23:26:32,315-[lfw][294000]Accuracy-Flip: 0.99733+-0.00309 Training: 2022-04-11 23:26:32,315-[lfw][294000]Accuracy-Highest: 0.99733 Training: 2022-04-11 23:26:57,790-[cfp_fp][294000]XNorm: 5.814234 Training: 2022-04-11 23:26:57,791-[cfp_fp][294000]Accuracy-Flip: 0.97214+-0.00909 Training: 2022-04-11 23:26:57,791-[cfp_fp][294000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:27:19,777-[agedb_30][294000]XNorm: 6.492500 Training: 2022-04-11 23:27:19,778-[agedb_30][294000]Accuracy-Flip: 0.97017+-0.00831 Training: 2022-04-11 23:27:19,778-[agedb_30][294000]Accuracy-Highest: 0.97417 Training: 2022-04-11 23:27:20,912-Speed 145.02 samples/sec Loss 3.5274 LearningRate 0.0014 Epoch: 17 Global Step: 294010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:22,013-Speed 9307.29 samples/sec Loss 3.5308 LearningRate 0.0014 Epoch: 17 Global Step: 294020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:27:23,146-Speed 9037.81 samples/sec Loss 3.5980 LearningRate 0.0014 Epoch: 17 Global Step: 294030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:24,273-Speed 9093.16 samples/sec Loss 3.6103 LearningRate 0.0014 Epoch: 17 Global Step: 294040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:25,382-Speed 9240.43 samples/sec Loss 3.5567 LearningRate 0.0014 Epoch: 17 Global Step: 294050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:26,517-Speed 9029.62 samples/sec Loss 3.6077 LearningRate 0.0014 Epoch: 17 Global Step: 294060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:27,719-Speed 8523.44 samples/sec Loss 3.6317 LearningRate 0.0014 Epoch: 17 Global Step: 294070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:28,851-Speed 9050.65 samples/sec Loss 3.6269 LearningRate 0.0014 Epoch: 
17 Global Step: 294080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:29,979-Speed 9083.84 samples/sec Loss 3.5526 LearningRate 0.0014 Epoch: 17 Global Step: 294090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:31,080-Speed 9308.00 samples/sec Loss 3.6313 LearningRate 0.0014 Epoch: 17 Global Step: 294100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:32,258-Speed 8696.69 samples/sec Loss 3.5565 LearningRate 0.0014 Epoch: 17 Global Step: 294110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:33,370-Speed 9209.52 samples/sec Loss 3.6602 LearningRate 0.0014 Epoch: 17 Global Step: 294120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:34,494-Speed 9117.29 samples/sec Loss 3.6134 LearningRate 0.0014 Epoch: 17 Global Step: 294130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:27:35,578-Speed 9452.29 samples/sec Loss 3.5652 LearningRate 0.0014 Epoch: 17 Global Step: 294140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:36,719-Speed 8986.45 samples/sec Loss 3.5971 LearningRate 0.0014 Epoch: 17 Global Step: 294150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:37,891-Speed 8741.52 samples/sec Loss 3.6306 LearningRate 0.0014 Epoch: 17 Global Step: 294160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:38,990-Speed 9324.06 samples/sec Loss 3.6288 LearningRate 0.0014 Epoch: 17 Global Step: 294170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:40,136-Speed 8933.44 samples/sec Loss 3.6021 LearningRate 0.0014 Epoch: 17 Global Step: 294180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:41,307-Speed 8751.17 samples/sec Loss 3.6385 LearningRate 0.0014 Epoch: 17 Global Step: 294190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:42,428-Speed 9145.39 samples/sec Loss 3.5327 LearningRate 0.0014 Epoch: 17 Global Step: 294200 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 23:27:43,526-Speed 9331.41 samples/sec Loss 3.6508 LearningRate 0.0014 Epoch: 17 Global Step: 294210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:44,641-Speed 9184.82 samples/sec Loss 3.6010 LearningRate 0.0014 Epoch: 17 Global Step: 294220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:45,735-Speed 9367.58 samples/sec Loss 3.5564 LearningRate 0.0014 Epoch: 17 Global Step: 294230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:46,857-Speed 9129.35 samples/sec Loss 3.6773 LearningRate 0.0014 Epoch: 17 Global Step: 294240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:27:47,945-Speed 9422.63 samples/sec Loss 3.6087 LearningRate 0.0014 Epoch: 17 Global Step: 294250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:49,120-Speed 8714.36 samples/sec Loss 3.5738 LearningRate 0.0014 Epoch: 17 Global Step: 294260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:50,248-Speed 9085.14 samples/sec Loss 3.6756 LearningRate 0.0014 Epoch: 17 Global Step: 294270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:51,395-Speed 8935.14 samples/sec Loss 3.5886 LearningRate 0.0014 Epoch: 17 Global Step: 294280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:52,540-Speed 8952.00 samples/sec Loss 3.6313 LearningRate 0.0014 Epoch: 17 Global Step: 294290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:53,668-Speed 9083.69 samples/sec Loss 3.5446 LearningRate 0.0014 Epoch: 17 Global Step: 294300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:54,801-Speed 9040.01 samples/sec Loss 3.6463 LearningRate 0.0014 Epoch: 17 Global Step: 294310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:55,938-Speed 9011.17 samples/sec Loss 3.5638 LearningRate 0.0014 Epoch: 17 Global Step: 294320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 23:27:57,054-Speed 9184.36 samples/sec Loss 3.5769 LearningRate 0.0014 Epoch: 17 Global Step: 294330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:58,163-Speed 9233.63 samples/sec Loss 3.5966 LearningRate 0.0014 Epoch: 17 Global Step: 294340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:27:59,253-Speed 9404.22 samples/sec Loss 3.5546 LearningRate 0.0014 Epoch: 17 Global Step: 294350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:28:00,313-Speed 9664.41 samples/sec Loss 3.6372 LearningRate 0.0014 Epoch: 17 Global Step: 294360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:01,449-Speed 9023.21 samples/sec Loss 3.6250 LearningRate 0.0014 Epoch: 17 Global Step: 294370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:02,606-Speed 8853.51 samples/sec Loss 3.6581 LearningRate 0.0014 Epoch: 17 Global Step: 294380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:03,743-Speed 9012.32 samples/sec Loss 3.6573 LearningRate 0.0014 Epoch: 17 Global Step: 294390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:04,895-Speed 8889.16 samples/sec Loss 3.6810 LearningRate 0.0014 Epoch: 17 Global Step: 294400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:05,996-Speed 9309.24 samples/sec Loss 3.5590 LearningRate 0.0014 Epoch: 17 Global Step: 294410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:07,168-Speed 8746.45 samples/sec Loss 3.5725 LearningRate 0.0014 Epoch: 17 Global Step: 294420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:08,293-Speed 9110.36 samples/sec Loss 3.6037 LearningRate 0.0014 Epoch: 17 Global Step: 294430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:09,420-Speed 9087.52 samples/sec Loss 3.5652 LearningRate 0.0014 Epoch: 17 Global Step: 294440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:10,566-Speed 8942.80 
samples/sec Loss 3.5632 LearningRate 0.0014 Epoch: 17 Global Step: 294450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:11,735-Speed 8769.82 samples/sec Loss 3.6496 LearningRate 0.0014 Epoch: 17 Global Step: 294460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:12,898-Speed 8804.82 samples/sec Loss 3.5595 LearningRate 0.0014 Epoch: 17 Global Step: 294470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:14,011-Speed 9214.01 samples/sec Loss 3.6022 LearningRate 0.0014 Epoch: 17 Global Step: 294480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:15,102-Speed 9390.36 samples/sec Loss 3.6473 LearningRate 0.0014 Epoch: 17 Global Step: 294490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:16,221-Speed 9156.40 samples/sec Loss 3.5587 LearningRate 0.0014 Epoch: 17 Global Step: 294500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:17,345-Speed 9114.06 samples/sec Loss 3.6590 LearningRate 0.0014 Epoch: 17 Global Step: 294510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:18,532-Speed 8626.01 samples/sec Loss 3.5838 LearningRate 0.0014 Epoch: 17 Global Step: 294520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:19,648-Speed 9182.01 samples/sec Loss 3.6833 LearningRate 0.0014 Epoch: 17 Global Step: 294530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:20,763-Speed 9190.68 samples/sec Loss 3.5269 LearningRate 0.0014 Epoch: 17 Global Step: 294540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:21,920-Speed 8853.53 samples/sec Loss 3.6227 LearningRate 0.0014 Epoch: 17 Global Step: 294550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:23,059-Speed 8995.55 samples/sec Loss 3.5314 LearningRate 0.0014 Epoch: 17 Global Step: 294560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:24,192-Speed 9044.56 samples/sec Loss 3.5841 LearningRate 0.0014 
Epoch: 17 Global Step: 294570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:25,359-Speed 8781.18 samples/sec Loss 3.6103 LearningRate 0.0014 Epoch: 17 Global Step: 294580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:26,496-Speed 9007.83 samples/sec Loss 3.5919 LearningRate 0.0014 Epoch: 17 Global Step: 294590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:27,640-Speed 8961.96 samples/sec Loss 3.5976 LearningRate 0.0014 Epoch: 17 Global Step: 294600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:28,758-Speed 9165.19 samples/sec Loss 3.6235 LearningRate 0.0014 Epoch: 17 Global Step: 294610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:29,960-Speed 8523.93 samples/sec Loss 3.6385 LearningRate 0.0014 Epoch: 17 Global Step: 294620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:31,076-Speed 9173.67 samples/sec Loss 3.5226 LearningRate 0.0014 Epoch: 17 Global Step: 294630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 23:28:32,215-Speed 9001.75 samples/sec Loss 3.5473 LearningRate 0.0014 Epoch: 17 Global Step: 294640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:33,417-Speed 8521.07 samples/sec Loss 3.5414 LearningRate 0.0014 Epoch: 17 Global Step: 294650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:34,578-Speed 8824.05 samples/sec Loss 3.5388 LearningRate 0.0014 Epoch: 17 Global Step: 294660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:35,700-Speed 9135.83 samples/sec Loss 3.5858 LearningRate 0.0014 Epoch: 17 Global Step: 294670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:36,798-Speed 9332.96 samples/sec Loss 3.6382 LearningRate 0.0014 Epoch: 17 Global Step: 294680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:37,949-Speed 8904.18 samples/sec Loss 3.5816 LearningRate 0.0014 Epoch: 17 Global Step: 294690 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:39,087-Speed 8999.22 samples/sec Loss 3.5869 LearningRate 0.0014 Epoch: 17 Global Step: 294700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:40,244-Speed 8860.10 samples/sec Loss 3.6215 LearningRate 0.0014 Epoch: 17 Global Step: 294710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:41,403-Speed 8832.64 samples/sec Loss 3.5505 LearningRate 0.0014 Epoch: 17 Global Step: 294720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:42,535-Speed 9057.06 samples/sec Loss 3.5691 LearningRate 0.0014 Epoch: 17 Global Step: 294730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:43,700-Speed 8793.98 samples/sec Loss 3.6085 LearningRate 0.0014 Epoch: 17 Global Step: 294740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:44,806-Speed 9260.77 samples/sec Loss 3.6831 LearningRate 0.0014 Epoch: 17 Global Step: 294750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:45,954-Speed 8928.59 samples/sec Loss 3.6073 LearningRate 0.0014 Epoch: 17 Global Step: 294760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:47,087-Speed 9042.55 samples/sec Loss 3.6399 LearningRate 0.0014 Epoch: 17 Global Step: 294770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:48,261-Speed 8727.01 samples/sec Loss 3.5288 LearningRate 0.0014 Epoch: 17 Global Step: 294780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:49,424-Speed 8807.74 samples/sec Loss 3.6417 LearningRate 0.0014 Epoch: 17 Global Step: 294790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:50,539-Speed 9191.30 samples/sec Loss 3.6330 LearningRate 0.0014 Epoch: 17 Global Step: 294800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:51,657-Speed 9163.54 samples/sec Loss 3.5747 LearningRate 0.0014 Epoch: 17 Global Step: 294810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 23:28:52,810-Speed 8887.12 samples/sec Loss 3.5730 LearningRate 0.0014 Epoch: 17 Global Step: 294820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:53,946-Speed 9020.52 samples/sec Loss 3.5967 LearningRate 0.0014 Epoch: 17 Global Step: 294830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:55,080-Speed 9031.50 samples/sec Loss 3.5557 LearningRate 0.0014 Epoch: 17 Global Step: 294840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:56,237-Speed 8860.51 samples/sec Loss 3.6018 LearningRate 0.0014 Epoch: 17 Global Step: 294850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:57,340-Speed 9287.44 samples/sec Loss 3.5835 LearningRate 0.0014 Epoch: 17 Global Step: 294860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:58,486-Speed 8939.82 samples/sec Loss 3.6206 LearningRate 0.0014 Epoch: 17 Global Step: 294870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:28:59,618-Speed 9051.25 samples/sec Loss 3.6465 LearningRate 0.0014 Epoch: 17 Global Step: 294880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:00,719-Speed 9309.08 samples/sec Loss 3.6117 LearningRate 0.0014 Epoch: 17 Global Step: 294890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:01,846-Speed 9086.77 samples/sec Loss 3.5224 LearningRate 0.0014 Epoch: 17 Global Step: 294900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:03,007-Speed 8827.62 samples/sec Loss 3.5911 LearningRate 0.0014 Epoch: 17 Global Step: 294910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:04,163-Speed 8866.80 samples/sec Loss 3.6101 LearningRate 0.0014 Epoch: 17 Global Step: 294920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:05,332-Speed 8770.00 samples/sec Loss 3.5767 LearningRate 0.0014 Epoch: 17 Global Step: 294930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:06,438-Speed 9258.99 
samples/sec Loss 3.6099 LearningRate 0.0014 Epoch: 17 Global Step: 294940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:29:07,580-Speed 8976.12 samples/sec Loss 3.6488 LearningRate 0.0014 Epoch: 17 Global Step: 294950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:08,708-Speed 9084.80 samples/sec Loss 3.6146 LearningRate 0.0014 Epoch: 17 Global Step: 294960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:09,815-Speed 9255.62 samples/sec Loss 3.5794 LearningRate 0.0014 Epoch: 17 Global Step: 294970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:10,935-Speed 9149.74 samples/sec Loss 3.5947 LearningRate 0.0014 Epoch: 17 Global Step: 294980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:12,026-Speed 9392.33 samples/sec Loss 3.5757 LearningRate 0.0014 Epoch: 17 Global Step: 294990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:13,135-Speed 9236.80 samples/sec Loss 3.5609 LearningRate 0.0014 Epoch: 17 Global Step: 295000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:14,303-Speed 8771.98 samples/sec Loss 3.6121 LearningRate 0.0014 Epoch: 17 Global Step: 295010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:15,414-Speed 9225.18 samples/sec Loss 3.6371 LearningRate 0.0014 Epoch: 17 Global Step: 295020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:16,568-Speed 8879.54 samples/sec Loss 3.5202 LearningRate 0.0014 Epoch: 17 Global Step: 295030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:17,683-Speed 9185.67 samples/sec Loss 3.6147 LearningRate 0.0013 Epoch: 17 Global Step: 295040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:18,805-Speed 9128.41 samples/sec Loss 3.6675 LearningRate 0.0013 Epoch: 17 Global Step: 295050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:29:19,937-Speed 9053.02 samples/sec Loss 3.5913 LearningRate 
0.0013 Epoch: 17 Global Step: 295060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:21,020-Speed 9462.86 samples/sec Loss 3.5780 LearningRate 0.0013 Epoch: 17 Global Step: 295070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:22,092-Speed 9558.69 samples/sec Loss 3.5357 LearningRate 0.0013 Epoch: 17 Global Step: 295080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:23,236-Speed 8957.71 samples/sec Loss 3.6238 LearningRate 0.0013 Epoch: 17 Global Step: 295090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:24,355-Speed 9156.70 samples/sec Loss 3.6507 LearningRate 0.0013 Epoch: 17 Global Step: 295100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:25,438-Speed 9459.13 samples/sec Loss 3.5715 LearningRate 0.0013 Epoch: 17 Global Step: 295110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:26,554-Speed 9181.23 samples/sec Loss 3.5130 LearningRate 0.0013 Epoch: 17 Global Step: 295120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:27,701-Speed 8932.52 samples/sec Loss 3.5731 LearningRate 0.0013 Epoch: 17 Global Step: 295130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:28,835-Speed 9036.46 samples/sec Loss 3.5409 LearningRate 0.0013 Epoch: 17 Global Step: 295140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:29,947-Speed 9213.48 samples/sec Loss 3.5538 LearningRate 0.0013 Epoch: 17 Global Step: 295150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:31,110-Speed 8807.00 samples/sec Loss 3.5522 LearningRate 0.0013 Epoch: 17 Global Step: 295160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:29:32,212-Speed 9300.40 samples/sec Loss 3.5863 LearningRate 0.0013 Epoch: 17 Global Step: 295170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:33,299-Speed 9423.32 samples/sec Loss 3.6772 LearningRate 0.0013 Epoch: 17 Global Step: 295180 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:34,427-Speed 9081.72 samples/sec Loss 3.6300 LearningRate 0.0013 Epoch: 17 Global Step: 295190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:35,557-Speed 9066.36 samples/sec Loss 3.6294 LearningRate 0.0013 Epoch: 17 Global Step: 295200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:36,667-Speed 9234.28 samples/sec Loss 3.6567 LearningRate 0.0013 Epoch: 17 Global Step: 295210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:37,800-Speed 9046.16 samples/sec Loss 3.5487 LearningRate 0.0013 Epoch: 17 Global Step: 295220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:38,934-Speed 9028.34 samples/sec Loss 3.5918 LearningRate 0.0013 Epoch: 17 Global Step: 295230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:40,042-Speed 9248.06 samples/sec Loss 3.6137 LearningRate 0.0013 Epoch: 17 Global Step: 295240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:41,166-Speed 9118.88 samples/sec Loss 3.6678 LearningRate 0.0013 Epoch: 17 Global Step: 295250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:42,332-Speed 8788.88 samples/sec Loss 3.6006 LearningRate 0.0013 Epoch: 17 Global Step: 295260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:43,471-Speed 8999.11 samples/sec Loss 3.6180 LearningRate 0.0013 Epoch: 17 Global Step: 295270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:29:44,572-Speed 9310.52 samples/sec Loss 3.6352 LearningRate 0.0013 Epoch: 17 Global Step: 295280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:45,660-Speed 9413.47 samples/sec Loss 3.5744 LearningRate 0.0013 Epoch: 17 Global Step: 295290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:46,776-Speed 9177.64 samples/sec Loss 3.5821 LearningRate 0.0013 Epoch: 17 Global Step: 295300 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 23:29:47,908-Speed 9051.89 samples/sec Loss 3.6365 LearningRate 0.0013 Epoch: 17 Global Step: 295310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:49,028-Speed 9150.42 samples/sec Loss 3.5480 LearningRate 0.0013 Epoch: 17 Global Step: 295320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:50,119-Speed 9390.35 samples/sec Loss 3.5417 LearningRate 0.0013 Epoch: 17 Global Step: 295330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:51,256-Speed 9006.57 samples/sec Loss 3.6423 LearningRate 0.0013 Epoch: 17 Global Step: 295340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:52,388-Speed 9054.78 samples/sec Loss 3.5156 LearningRate 0.0013 Epoch: 17 Global Step: 295350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:53,608-Speed 8400.03 samples/sec Loss 3.6115 LearningRate 0.0013 Epoch: 17 Global Step: 295360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:54,728-Speed 9142.45 samples/sec Loss 3.6755 LearningRate 0.0013 Epoch: 17 Global Step: 295370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:55,861-Speed 9047.39 samples/sec Loss 3.5555 LearningRate 0.0013 Epoch: 17 Global Step: 295380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:29:56,940-Speed 9496.05 samples/sec Loss 3.5793 LearningRate 0.0013 Epoch: 17 Global Step: 295390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:58,106-Speed 8787.79 samples/sec Loss 3.5440 LearningRate 0.0013 Epoch: 17 Global Step: 295400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:29:59,269-Speed 8813.23 samples/sec Loss 3.6943 LearningRate 0.0013 Epoch: 17 Global Step: 295410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:00,423-Speed 8880.93 samples/sec Loss 3.7340 LearningRate 0.0013 Epoch: 17 Global Step: 295420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:01,559-Speed 
9019.04 samples/sec Loss 3.6071 LearningRate 0.0013 Epoch: 17 Global Step: 295430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:02,699-Speed 8987.66 samples/sec Loss 3.5516 LearningRate 0.0013 Epoch: 17 Global Step: 295440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:03,840-Speed 8975.41 samples/sec Loss 3.5230 LearningRate 0.0013 Epoch: 17 Global Step: 295450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:04,983-Speed 8962.47 samples/sec Loss 3.5653 LearningRate 0.0013 Epoch: 17 Global Step: 295460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:06,120-Speed 9013.80 samples/sec Loss 3.5906 LearningRate 0.0013 Epoch: 17 Global Step: 295470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:07,209-Speed 9408.09 samples/sec Loss 3.5369 LearningRate 0.0013 Epoch: 17 Global Step: 295480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:08,318-Speed 9241.10 samples/sec Loss 3.6112 LearningRate 0.0013 Epoch: 17 Global Step: 295490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:30:09,442-Speed 9114.43 samples/sec Loss 3.5953 LearningRate 0.0013 Epoch: 17 Global Step: 295500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:10,589-Speed 8932.73 samples/sec Loss 3.5720 LearningRate 0.0013 Epoch: 17 Global Step: 295510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:11,725-Speed 9017.47 samples/sec Loss 3.4987 LearningRate 0.0013 Epoch: 17 Global Step: 295520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:12,860-Speed 9028.87 samples/sec Loss 3.6179 LearningRate 0.0013 Epoch: 17 Global Step: 295530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:14,003-Speed 8967.08 samples/sec Loss 3.6237 LearningRate 0.0013 Epoch: 17 Global Step: 295540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:15,127-Speed 9116.22 samples/sec Loss 3.5854 
LearningRate 0.0013 Epoch: 17 Global Step: 295550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:16,264-Speed 9009.86 samples/sec Loss 3.6127 LearningRate 0.0013 Epoch: 17 Global Step: 295560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:17,347-Speed 9462.01 samples/sec Loss 3.5886 LearningRate 0.0013 Epoch: 17 Global Step: 295570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:18,467-Speed 9141.33 samples/sec Loss 3.6143 LearningRate 0.0013 Epoch: 17 Global Step: 295580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:19,579-Speed 9225.64 samples/sec Loss 3.5688 LearningRate 0.0013 Epoch: 17 Global Step: 295590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:20,697-Speed 9159.86 samples/sec Loss 3.6099 LearningRate 0.0013 Epoch: 17 Global Step: 295600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:30:21,823-Speed 9099.18 samples/sec Loss 3.5240 LearningRate 0.0013 Epoch: 17 Global Step: 295610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:22,944-Speed 9139.03 samples/sec Loss 3.6153 LearningRate 0.0013 Epoch: 17 Global Step: 295620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:24,090-Speed 8946.05 samples/sec Loss 3.6349 LearningRate 0.0013 Epoch: 17 Global Step: 295630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:25,194-Speed 9279.20 samples/sec Loss 3.6551 LearningRate 0.0013 Epoch: 17 Global Step: 295640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:26,301-Speed 9257.84 samples/sec Loss 3.5387 LearningRate 0.0013 Epoch: 17 Global Step: 295650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:27,430-Speed 9074.52 samples/sec Loss 3.5649 LearningRate 0.0013 Epoch: 17 Global Step: 295660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:28,598-Speed 8771.08 samples/sec Loss 3.5759 LearningRate 0.0013 Epoch: 17 Global Step: 
295670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:29,783-Speed 8646.28 samples/sec Loss 3.6522 LearningRate 0.0013 Epoch: 17 Global Step: 295680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:30,899-Speed 9178.25 samples/sec Loss 3.7082 LearningRate 0.0013 Epoch: 17 Global Step: 295690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:32,064-Speed 8793.64 samples/sec Loss 3.5680 LearningRate 0.0013 Epoch: 17 Global Step: 295700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:33,210-Speed 8940.45 samples/sec Loss 3.6275 LearningRate 0.0013 Epoch: 17 Global Step: 295710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:30:34,358-Speed 8923.00 samples/sec Loss 3.5876 LearningRate 0.0013 Epoch: 17 Global Step: 295720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:30:35,478-Speed 9150.20 samples/sec Loss 3.6184 LearningRate 0.0013 Epoch: 17 Global Step: 295730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:30:36,601-Speed 9124.90 samples/sec Loss 3.6254 LearningRate 0.0013 Epoch: 17 Global Step: 295740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:37,735-Speed 9039.57 samples/sec Loss 3.5337 LearningRate 0.0013 Epoch: 17 Global Step: 295750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:38,829-Speed 9370.14 samples/sec Loss 3.5802 LearningRate 0.0013 Epoch: 17 Global Step: 295760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:39,962-Speed 9040.12 samples/sec Loss 3.6332 LearningRate 0.0013 Epoch: 17 Global Step: 295770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:41,068-Speed 9269.02 samples/sec Loss 3.5997 LearningRate 0.0013 Epoch: 17 Global Step: 295780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:42,217-Speed 8910.88 samples/sec Loss 3.5502 LearningRate 0.0013 Epoch: 17 Global Step: 295790 Fp16 Grad Scale: 65536 Required: 
2 hours Training: 2022-04-11 23:30:43,353-Speed 9024.21 samples/sec Loss 3.6000 LearningRate 0.0013 Epoch: 17 Global Step: 295800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:44,495-Speed 8971.80 samples/sec Loss 3.5280 LearningRate 0.0013 Epoch: 17 Global Step: 295810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:45,599-Speed 9280.70 samples/sec Loss 3.6284 LearningRate 0.0013 Epoch: 17 Global Step: 295820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:46,744-Speed 8945.64 samples/sec Loss 3.5420 LearningRate 0.0013 Epoch: 17 Global Step: 295830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:47,875-Speed 9060.68 samples/sec Loss 3.6885 LearningRate 0.0013 Epoch: 17 Global Step: 295840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:30:49,023-Speed 8921.83 samples/sec Loss 3.5668 LearningRate 0.0013 Epoch: 17 Global Step: 295850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:50,157-Speed 9038.89 samples/sec Loss 3.5716 LearningRate 0.0013 Epoch: 17 Global Step: 295860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:51,263-Speed 9261.02 samples/sec Loss 3.5538 LearningRate 0.0013 Epoch: 17 Global Step: 295870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:52,399-Speed 9021.14 samples/sec Loss 3.5721 LearningRate 0.0013 Epoch: 17 Global Step: 295880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:53,511-Speed 9212.81 samples/sec Loss 3.6324 LearningRate 0.0013 Epoch: 17 Global Step: 295890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:54,662-Speed 8900.05 samples/sec Loss 3.6579 LearningRate 0.0013 Epoch: 17 Global Step: 295900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:55,794-Speed 9050.73 samples/sec Loss 3.5035 LearningRate 0.0013 Epoch: 17 Global Step: 295910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
23:30:56,933-Speed 9002.53 samples/sec Loss 3.6017 LearningRate 0.0013 Epoch: 17 Global Step: 295920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:58,054-Speed 9138.77 samples/sec Loss 3.5387 LearningRate 0.0013 Epoch: 17 Global Step: 295930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:30:59,198-Speed 8958.19 samples/sec Loss 3.5976 LearningRate 0.0013 Epoch: 17 Global Step: 295940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:31:00,318-Speed 9149.13 samples/sec Loss 3.6602 LearningRate 0.0013 Epoch: 17 Global Step: 295950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:31:01,475-Speed 8857.54 samples/sec Loss 3.6195 LearningRate 0.0013 Epoch: 17 Global Step: 295960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:31:02,617-Speed 8970.39 samples/sec Loss 3.5902 LearningRate 0.0013 Epoch: 17 Global Step: 295970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:31:03,727-Speed 9225.69 samples/sec Loss 3.6035 LearningRate 0.0013 Epoch: 17 Global Step: 295980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:31:04,872-Speed 8947.03 samples/sec Loss 3.5529 LearningRate 0.0013 Epoch: 17 Global Step: 295990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:31:05,998-Speed 9103.32 samples/sec Loss 3.5897 LearningRate 0.0013 Epoch: 17 Global Step: 296000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:31:28,083-[lfw][296000]XNorm: 6.651927 Training: 2022-04-11 23:31:28,084-[lfw][296000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-04-11 23:31:28,085-[lfw][296000]Accuracy-Highest: 0.99733 Training: 2022-04-11 23:31:53,721-[cfp_fp][296000]XNorm: 5.814749 Training: 2022-04-11 23:31:53,722-[cfp_fp][296000]Accuracy-Flip: 0.97371+-0.00811 Training: 2022-04-11 23:31:53,722-[cfp_fp][296000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:32:15,864-[agedb_30][296000]XNorm: 6.480942 Training: 2022-04-11 
23:32:15,865-[agedb_30][296000]Accuracy-Flip: 0.97133+-0.00859 Training: 2022-04-11 23:32:15,866-[agedb_30][296000]Accuracy-Highest: 0.97417 Training: 2022-04-11 23:32:16,945-Speed 144.33 samples/sec Loss 3.6813 LearningRate 0.0013 Epoch: 17 Global Step: 296010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:18,050-Speed 9277.27 samples/sec Loss 3.6317 LearningRate 0.0013 Epoch: 17 Global Step: 296020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:19,197-Speed 8932.36 samples/sec Loss 3.5965 LearningRate 0.0013 Epoch: 17 Global Step: 296030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:20,347-Speed 8910.21 samples/sec Loss 3.6347 LearningRate 0.0013 Epoch: 17 Global Step: 296040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:21,477-Speed 9065.76 samples/sec Loss 3.6276 LearningRate 0.0013 Epoch: 17 Global Step: 296050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:22,599-Speed 9128.41 samples/sec Loss 3.6345 LearningRate 0.0013 Epoch: 17 Global Step: 296060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:23,714-Speed 9197.63 samples/sec Loss 3.5498 LearningRate 0.0013 Epoch: 17 Global Step: 296070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:24,843-Speed 9076.58 samples/sec Loss 3.5390 LearningRate 0.0013 Epoch: 17 Global Step: 296080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:25,915-Speed 9553.72 samples/sec Loss 3.5350 LearningRate 0.0013 Epoch: 17 Global Step: 296090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:27,041-Speed 9101.01 samples/sec Loss 3.6764 LearningRate 0.0013 Epoch: 17 Global Step: 296100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:28,146-Speed 9267.13 samples/sec Loss 3.6137 LearningRate 0.0013 Epoch: 17 Global Step: 296110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:29,315-Speed 8767.79 samples/sec Loss 
3.6829 LearningRate 0.0013 Epoch: 17 Global Step: 296120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:30,441-Speed 9100.22 samples/sec Loss 3.5883 LearningRate 0.0013 Epoch: 17 Global Step: 296130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:31,565-Speed 9113.34 samples/sec Loss 3.5655 LearningRate 0.0013 Epoch: 17 Global Step: 296140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:32,686-Speed 9142.09 samples/sec Loss 3.7027 LearningRate 0.0013 Epoch: 17 Global Step: 296150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:33,794-Speed 9243.30 samples/sec Loss 3.5477 LearningRate 0.0013 Epoch: 17 Global Step: 296160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:34,925-Speed 9062.32 samples/sec Loss 3.6734 LearningRate 0.0013 Epoch: 17 Global Step: 296170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:36,073-Speed 8923.54 samples/sec Loss 3.4974 LearningRate 0.0013 Epoch: 17 Global Step: 296180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:37,203-Speed 9075.17 samples/sec Loss 3.5208 LearningRate 0.0013 Epoch: 17 Global Step: 296190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:38,354-Speed 8896.87 samples/sec Loss 3.5677 LearningRate 0.0013 Epoch: 17 Global Step: 296200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:39,461-Speed 9254.70 samples/sec Loss 3.5822 LearningRate 0.0013 Epoch: 17 Global Step: 296210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:40,572-Speed 9222.33 samples/sec Loss 3.5760 LearningRate 0.0013 Epoch: 17 Global Step: 296220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:41,645-Speed 9549.88 samples/sec Loss 3.6269 LearningRate 0.0013 Epoch: 17 Global Step: 296230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:42,765-Speed 9147.60 samples/sec Loss 3.5422 LearningRate 0.0013 Epoch: 17 
Global Step: 296240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:43,876-Speed 9226.95 samples/sec Loss 3.6400 LearningRate 0.0013 Epoch: 17 Global Step: 296250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:45,017-Speed 8978.80 samples/sec Loss 3.6111 LearningRate 0.0013 Epoch: 17 Global Step: 296260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:46,100-Speed 9462.81 samples/sec Loss 3.6508 LearningRate 0.0013 Epoch: 17 Global Step: 296270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:47,201-Speed 9304.72 samples/sec Loss 3.6179 LearningRate 0.0013 Epoch: 17 Global Step: 296280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:48,322-Speed 9138.39 samples/sec Loss 3.6135 LearningRate 0.0013 Epoch: 17 Global Step: 296290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:49,499-Speed 8706.37 samples/sec Loss 3.6188 LearningRate 0.0013 Epoch: 17 Global Step: 296300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:50,630-Speed 9056.65 samples/sec Loss 3.6055 LearningRate 0.0013 Epoch: 17 Global Step: 296310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:51,739-Speed 9236.07 samples/sec Loss 3.6478 LearningRate 0.0013 Epoch: 17 Global Step: 296320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:32:52,847-Speed 9253.53 samples/sec Loss 3.5524 LearningRate 0.0013 Epoch: 17 Global Step: 296330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:53,985-Speed 8997.16 samples/sec Loss 3.6223 LearningRate 0.0013 Epoch: 17 Global Step: 296340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:55,087-Speed 9302.34 samples/sec Loss 3.5798 LearningRate 0.0013 Epoch: 17 Global Step: 296350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:56,208-Speed 9144.90 samples/sec Loss 3.5408 LearningRate 0.0013 Epoch: 17 Global Step: 296360 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 23:32:57,345-Speed 9005.38 samples/sec Loss 3.6043 LearningRate 0.0013 Epoch: 17 Global Step: 296370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:58,455-Speed 9232.82 samples/sec Loss 3.5697 LearningRate 0.0013 Epoch: 17 Global Step: 296380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:32:59,546-Speed 9392.27 samples/sec Loss 3.6732 LearningRate 0.0013 Epoch: 17 Global Step: 296390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:00,697-Speed 8896.12 samples/sec Loss 3.5968 LearningRate 0.0013 Epoch: 17 Global Step: 296400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:01,871-Speed 8728.43 samples/sec Loss 3.6073 LearningRate 0.0013 Epoch: 17 Global Step: 296410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:03,007-Speed 9024.78 samples/sec Loss 3.5750 LearningRate 0.0013 Epoch: 17 Global Step: 296420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:04,132-Speed 9105.47 samples/sec Loss 3.5312 LearningRate 0.0013 Epoch: 17 Global Step: 296430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:33:05,278-Speed 8939.06 samples/sec Loss 3.5621 LearningRate 0.0013 Epoch: 17 Global Step: 296440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:33:06,439-Speed 8822.60 samples/sec Loss 3.6411 LearningRate 0.0013 Epoch: 17 Global Step: 296450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:33:07,554-Speed 9198.82 samples/sec Loss 3.5810 LearningRate 0.0013 Epoch: 17 Global Step: 296460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:08,693-Speed 8993.83 samples/sec Loss 3.5863 LearningRate 0.0013 Epoch: 17 Global Step: 296470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:09,846-Speed 8883.77 samples/sec Loss 3.6000 LearningRate 0.0013 Epoch: 17 Global Step: 296480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 23:33:10,953-Speed 9255.89 samples/sec Loss 3.6906 LearningRate 0.0013 Epoch: 17 Global Step: 296490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:12,059-Speed 9263.12 samples/sec Loss 3.5641 LearningRate 0.0012 Epoch: 17 Global Step: 296500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:13,237-Speed 8692.85 samples/sec Loss 3.6846 LearningRate 0.0012 Epoch: 17 Global Step: 296510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:14,385-Speed 8934.50 samples/sec Loss 3.5222 LearningRate 0.0012 Epoch: 17 Global Step: 296520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:15,521-Speed 9017.64 samples/sec Loss 3.6580 LearningRate 0.0012 Epoch: 17 Global Step: 296530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:16,681-Speed 8833.38 samples/sec Loss 3.6004 LearningRate 0.0012 Epoch: 17 Global Step: 296540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:17,797-Speed 9174.73 samples/sec Loss 3.6254 LearningRate 0.0012 Epoch: 17 Global Step: 296550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:18,942-Speed 8951.54 samples/sec Loss 3.5220 LearningRate 0.0012 Epoch: 17 Global Step: 296560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:33:20,091-Speed 8917.00 samples/sec Loss 3.5661 LearningRate 0.0012 Epoch: 17 Global Step: 296570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:33:21,187-Speed 9353.30 samples/sec Loss 3.6093 LearningRate 0.0012 Epoch: 17 Global Step: 296580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:22,280-Speed 9377.12 samples/sec Loss 3.6791 LearningRate 0.0012 Epoch: 17 Global Step: 296590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:23,362-Speed 9464.56 samples/sec Loss 3.5236 LearningRate 0.0012 Epoch: 17 Global Step: 296600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:24,492-Speed 9065.84 
samples/sec Loss 3.5418 LearningRate 0.0012 Epoch: 17 Global Step: 296610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:25,652-Speed 8834.88 samples/sec Loss 3.6615 LearningRate 0.0012 Epoch: 17 Global Step: 296620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:26,790-Speed 9008.19 samples/sec Loss 3.5972 LearningRate 0.0012 Epoch: 17 Global Step: 296630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:27,882-Speed 9377.39 samples/sec Loss 3.6773 LearningRate 0.0012 Epoch: 17 Global Step: 296640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:28,981-Speed 9319.34 samples/sec Loss 3.6426 LearningRate 0.0012 Epoch: 17 Global Step: 296650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:30,096-Speed 9190.23 samples/sec Loss 3.6110 LearningRate 0.0012 Epoch: 17 Global Step: 296660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:31,195-Speed 9326.48 samples/sec Loss 3.5924 LearningRate 0.0012 Epoch: 17 Global Step: 296670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:32,308-Speed 9200.14 samples/sec Loss 3.6330 LearningRate 0.0012 Epoch: 17 Global Step: 296680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 23:33:33,396-Speed 9422.30 samples/sec Loss 3.6084 LearningRate 0.0012 Epoch: 17 Global Step: 296690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:34,537-Speed 8977.26 samples/sec Loss 3.6880 LearningRate 0.0012 Epoch: 17 Global Step: 296700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:35,678-Speed 8978.64 samples/sec Loss 3.6650 LearningRate 0.0012 Epoch: 17 Global Step: 296710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:36,803-Speed 9105.94 samples/sec Loss 3.6112 LearningRate 0.0012 Epoch: 17 Global Step: 296720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:37,918-Speed 9190.84 samples/sec Loss 3.6385 LearningRate 
0.0012 Epoch: 17 Global Step: 296730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:39,065-Speed 8933.47 samples/sec Loss 3.6256 LearningRate 0.0012 Epoch: 17 Global Step: 296740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:40,199-Speed 9040.37 samples/sec Loss 3.6770 LearningRate 0.0012 Epoch: 17 Global Step: 296750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:41,357-Speed 8848.04 samples/sec Loss 3.5698 LearningRate 0.0012 Epoch: 17 Global Step: 296760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 23:33:42,516-Speed 8840.42 samples/sec Loss 3.5906 LearningRate 0.0012 Epoch: 17 Global Step: 296770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:43,651-Speed 9021.81 samples/sec Loss 3.5983 LearningRate 0.0012 Epoch: 17 Global Step: 296780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:44,778-Speed 9097.13 samples/sec Loss 3.5755 LearningRate 0.0012 Epoch: 17 Global Step: 296790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:33:45,912-Speed 9036.66 samples/sec Loss 3.6753 LearningRate 0.0012 Epoch: 17 Global Step: 296800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:47,014-Speed 9296.51 samples/sec Loss 3.6647 LearningRate 0.0012 Epoch: 17 Global Step: 296810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:48,116-Speed 9297.68 samples/sec Loss 3.6629 LearningRate 0.0012 Epoch: 17 Global Step: 296820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:49,248-Speed 9055.03 samples/sec Loss 3.6316 LearningRate 0.0012 Epoch: 17 Global Step: 296830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:50,406-Speed 8840.37 samples/sec Loss 3.5790 LearningRate 0.0012 Epoch: 17 Global Step: 296840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:51,530-Speed 9116.34 samples/sec Loss 3.5315 LearningRate 0.0012 Epoch: 17 Global Step: 296850 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:52,678-Speed 8927.85 samples/sec Loss 3.6228 LearningRate 0.0012 Epoch: 17 Global Step: 296860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:53,831-Speed 8886.14 samples/sec Loss 3.5551 LearningRate 0.0012 Epoch: 17 Global Step: 296870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:55,017-Speed 8636.49 samples/sec Loss 3.6934 LearningRate 0.0012 Epoch: 17 Global Step: 296880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:56,151-Speed 9035.12 samples/sec Loss 3.5980 LearningRate 0.0012 Epoch: 17 Global Step: 296890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:33:57,303-Speed 8908.59 samples/sec Loss 3.5899 LearningRate 0.0012 Epoch: 17 Global Step: 296900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:33:58,445-Speed 8971.59 samples/sec Loss 3.6579 LearningRate 0.0012 Epoch: 17 Global Step: 296910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:33:59,585-Speed 8991.96 samples/sec Loss 3.6331 LearningRate 0.0012 Epoch: 17 Global Step: 296920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:00,702-Speed 9166.61 samples/sec Loss 3.5378 LearningRate 0.0012 Epoch: 17 Global Step: 296930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:01,853-Speed 8902.29 samples/sec Loss 3.6231 LearningRate 0.0012 Epoch: 17 Global Step: 296940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:02,986-Speed 9042.81 samples/sec Loss 3.5920 LearningRate 0.0012 Epoch: 17 Global Step: 296950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:04,096-Speed 9227.50 samples/sec Loss 3.6274 LearningRate 0.0012 Epoch: 17 Global Step: 296960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:05,189-Speed 9375.43 samples/sec Loss 3.6384 LearningRate 0.0012 Epoch: 17 Global Step: 296970 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 23:34:06,291-Speed 9303.16 samples/sec Loss 3.5958 LearningRate 0.0012 Epoch: 17 Global Step: 296980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:07,372-Speed 9482.20 samples/sec Loss 3.6401 LearningRate 0.0012 Epoch: 17 Global Step: 296990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:08,494-Speed 9125.25 samples/sec Loss 3.5928 LearningRate 0.0012 Epoch: 17 Global Step: 297000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:09,626-Speed 9049.06 samples/sec Loss 3.5656 LearningRate 0.0012 Epoch: 17 Global Step: 297010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:10,764-Speed 9005.17 samples/sec Loss 3.5841 LearningRate 0.0012 Epoch: 17 Global Step: 297020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:34:11,857-Speed 9372.46 samples/sec Loss 3.6606 LearningRate 0.0012 Epoch: 17 Global Step: 297030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:12,992-Speed 9028.97 samples/sec Loss 3.6414 LearningRate 0.0012 Epoch: 17 Global Step: 297040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:14,104-Speed 9216.08 samples/sec Loss 3.5915 LearningRate 0.0012 Epoch: 17 Global Step: 297050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:15,218-Speed 9198.76 samples/sec Loss 3.5826 LearningRate 0.0012 Epoch: 17 Global Step: 297060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:16,352-Speed 9036.43 samples/sec Loss 3.6290 LearningRate 0.0012 Epoch: 17 Global Step: 297070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:17,473-Speed 9136.10 samples/sec Loss 3.6439 LearningRate 0.0012 Epoch: 17 Global Step: 297080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:18,628-Speed 8877.69 samples/sec Loss 3.5647 LearningRate 0.0012 Epoch: 17 Global Step: 297090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:19,736-Speed 
9241.21 samples/sec Loss 3.5970 LearningRate 0.0012 Epoch: 17 Global Step: 297100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:20,826-Speed 9404.15 samples/sec Loss 3.5964 LearningRate 0.0012 Epoch: 17 Global Step: 297110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:21,955-Speed 9069.81 samples/sec Loss 3.6113 LearningRate 0.0012 Epoch: 17 Global Step: 297120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:23,034-Speed 9496.52 samples/sec Loss 3.6120 LearningRate 0.0012 Epoch: 17 Global Step: 297130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:24,188-Speed 8878.52 samples/sec Loss 3.5406 LearningRate 0.0012 Epoch: 17 Global Step: 297140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:25,331-Speed 8969.63 samples/sec Loss 3.6609 LearningRate 0.0012 Epoch: 17 Global Step: 297150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:26,490-Speed 8840.70 samples/sec Loss 3.5242 LearningRate 0.0012 Epoch: 17 Global Step: 297160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:27,625-Speed 9026.60 samples/sec Loss 3.6853 LearningRate 0.0012 Epoch: 17 Global Step: 297170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:28,757-Speed 9046.27 samples/sec Loss 3.6395 LearningRate 0.0012 Epoch: 17 Global Step: 297180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:29,895-Speed 9001.85 samples/sec Loss 3.6446 LearningRate 0.0012 Epoch: 17 Global Step: 297190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:31,037-Speed 8975.90 samples/sec Loss 3.5823 LearningRate 0.0012 Epoch: 17 Global Step: 297200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:32,119-Speed 9469.61 samples/sec Loss 3.5882 LearningRate 0.0012 Epoch: 17 Global Step: 297210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:33,201-Speed 9465.27 samples/sec Loss 3.6606 
LearningRate 0.0012 Epoch: 17 Global Step: 297220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:34,339-Speed 9006.82 samples/sec Loss 3.6229 LearningRate 0.0012 Epoch: 17 Global Step: 297230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:34:35,430-Speed 9394.32 samples/sec Loss 3.6169 LearningRate 0.0012 Epoch: 17 Global Step: 297240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:36,603-Speed 8733.15 samples/sec Loss 3.5700 LearningRate 0.0012 Epoch: 17 Global Step: 297250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:37,737-Speed 9036.94 samples/sec Loss 3.6781 LearningRate 0.0012 Epoch: 17 Global Step: 297260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:38,874-Speed 9011.32 samples/sec Loss 3.5980 LearningRate 0.0012 Epoch: 17 Global Step: 297270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:39,976-Speed 9302.71 samples/sec Loss 3.5692 LearningRate 0.0012 Epoch: 17 Global Step: 297280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:41,091-Speed 9187.28 samples/sec Loss 3.5307 LearningRate 0.0012 Epoch: 17 Global Step: 297290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:42,238-Speed 8930.19 samples/sec Loss 3.6916 LearningRate 0.0012 Epoch: 17 Global Step: 297300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:43,318-Speed 9495.20 samples/sec Loss 3.5118 LearningRate 0.0012 Epoch: 17 Global Step: 297310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:44,519-Speed 8526.93 samples/sec Loss 3.5715 LearningRate 0.0012 Epoch: 17 Global Step: 297320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:45,655-Speed 9019.01 samples/sec Loss 3.6291 LearningRate 0.0012 Epoch: 17 Global Step: 297330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:46,790-Speed 9031.26 samples/sec Loss 3.6822 LearningRate 0.0012 Epoch: 17 Global Step: 
297340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:47,925-Speed 9022.81 samples/sec Loss 3.6399 LearningRate 0.0012 Epoch: 17 Global Step: 297350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:49,012-Speed 9429.69 samples/sec Loss 3.6782 LearningRate 0.0012 Epoch: 17 Global Step: 297360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:50,170-Speed 8844.54 samples/sec Loss 3.6189 LearningRate 0.0012 Epoch: 17 Global Step: 297370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:34:51,303-Speed 9044.49 samples/sec Loss 3.5732 LearningRate 0.0012 Epoch: 17 Global Step: 297380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:52,406-Speed 9292.42 samples/sec Loss 3.6113 LearningRate 0.0012 Epoch: 17 Global Step: 297390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:53,539-Speed 9039.43 samples/sec Loss 3.5960 LearningRate 0.0012 Epoch: 17 Global Step: 297400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:54,669-Speed 9069.30 samples/sec Loss 3.5532 LearningRate 0.0012 Epoch: 17 Global Step: 297410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:55,748-Speed 9498.71 samples/sec Loss 3.6109 LearningRate 0.0012 Epoch: 17 Global Step: 297420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:56,859-Speed 9220.76 samples/sec Loss 3.6645 LearningRate 0.0012 Epoch: 17 Global Step: 297430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:58,020-Speed 8823.18 samples/sec Loss 3.5909 LearningRate 0.0012 Epoch: 17 Global Step: 297440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:34:59,152-Speed 9051.34 samples/sec Loss 3.5877 LearningRate 0.0012 Epoch: 17 Global Step: 297450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:00,317-Speed 8790.97 samples/sec Loss 3.6305 LearningRate 0.0012 Epoch: 17 Global Step: 297460 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-11 23:35:01,415-Speed 9332.70 samples/sec Loss 3.5550 LearningRate 0.0012 Epoch: 17 Global Step: 297470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:02,586-Speed 8755.99 samples/sec Loss 3.5921 LearningRate 0.0012 Epoch: 17 Global Step: 297480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:35:03,721-Speed 9026.92 samples/sec Loss 3.6651 LearningRate 0.0012 Epoch: 17 Global Step: 297490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:35:04,827-Speed 9258.50 samples/sec Loss 3.5351 LearningRate 0.0012 Epoch: 17 Global Step: 297500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:05,984-Speed 8857.62 samples/sec Loss 3.6114 LearningRate 0.0012 Epoch: 17 Global Step: 297510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:07,104-Speed 9149.40 samples/sec Loss 3.4914 LearningRate 0.0012 Epoch: 17 Global Step: 297520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:08,228-Speed 9108.88 samples/sec Loss 3.5753 LearningRate 0.0012 Epoch: 17 Global Step: 297530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:09,357-Speed 9075.09 samples/sec Loss 3.6490 LearningRate 0.0012 Epoch: 17 Global Step: 297540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:10,491-Speed 9041.34 samples/sec Loss 3.6078 LearningRate 0.0012 Epoch: 17 Global Step: 297550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:11,625-Speed 9027.43 samples/sec Loss 3.5389 LearningRate 0.0012 Epoch: 17 Global Step: 297560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:12,786-Speed 8834.15 samples/sec Loss 3.5531 LearningRate 0.0012 Epoch: 17 Global Step: 297570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:13,907-Speed 9137.48 samples/sec Loss 3.5443 LearningRate 0.0012 Epoch: 17 Global Step: 297580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:35:15,051-Speed 8960.74 samples/sec Loss 3.6095 LearningRate 0.0012 Epoch: 17 Global Step: 297590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:16,170-Speed 9154.57 samples/sec Loss 3.5415 LearningRate 0.0012 Epoch: 17 Global Step: 297600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:35:17,304-Speed 9034.03 samples/sec Loss 3.6615 LearningRate 0.0012 Epoch: 17 Global Step: 297610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:18,428-Speed 9112.08 samples/sec Loss 3.5897 LearningRate 0.0012 Epoch: 17 Global Step: 297620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:19,545-Speed 9177.62 samples/sec Loss 3.6232 LearningRate 0.0012 Epoch: 17 Global Step: 297630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:20,661-Speed 9178.84 samples/sec Loss 3.6919 LearningRate 0.0012 Epoch: 17 Global Step: 297640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:21,779-Speed 9163.51 samples/sec Loss 3.6215 LearningRate 0.0012 Epoch: 17 Global Step: 297650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:22,943-Speed 8800.63 samples/sec Loss 3.6527 LearningRate 0.0012 Epoch: 17 Global Step: 297660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:24,096-Speed 8888.55 samples/sec Loss 3.6493 LearningRate 0.0012 Epoch: 17 Global Step: 297670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:25,227-Speed 9061.48 samples/sec Loss 3.5445 LearningRate 0.0012 Epoch: 17 Global Step: 297680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:26,348-Speed 9135.94 samples/sec Loss 3.6734 LearningRate 0.0012 Epoch: 17 Global Step: 297690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:27,459-Speed 9223.26 samples/sec Loss 3.6014 LearningRate 0.0012 Epoch: 17 Global Step: 297700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:28,569-Speed 9234.43 samples/sec 
Loss 3.5615 LearningRate 0.0012 Epoch: 17 Global Step: 297710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:35:29,658-Speed 9401.39 samples/sec Loss 3.5016 LearningRate 0.0012 Epoch: 17 Global Step: 297720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:30,757-Speed 9323.25 samples/sec Loss 3.4956 LearningRate 0.0012 Epoch: 17 Global Step: 297730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:31,900-Speed 8968.42 samples/sec Loss 3.5895 LearningRate 0.0012 Epoch: 17 Global Step: 297740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:33,039-Speed 8997.21 samples/sec Loss 3.6489 LearningRate 0.0012 Epoch: 17 Global Step: 297750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:34,229-Speed 8613.60 samples/sec Loss 3.5858 LearningRate 0.0012 Epoch: 17 Global Step: 297760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:35,364-Speed 9025.07 samples/sec Loss 3.6519 LearningRate 0.0012 Epoch: 17 Global Step: 297770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:36,464-Speed 9311.81 samples/sec Loss 3.6891 LearningRate 0.0012 Epoch: 17 Global Step: 297780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:37,603-Speed 8994.69 samples/sec Loss 3.5909 LearningRate 0.0012 Epoch: 17 Global Step: 297790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:38,692-Speed 9407.33 samples/sec Loss 3.6613 LearningRate 0.0012 Epoch: 17 Global Step: 297800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:39,843-Speed 8901.26 samples/sec Loss 3.5782 LearningRate 0.0012 Epoch: 17 Global Step: 297810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:41,014-Speed 8753.44 samples/sec Loss 3.6613 LearningRate 0.0012 Epoch: 17 Global Step: 297820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:42,125-Speed 9221.86 samples/sec Loss 3.6099 LearningRate 0.0012 Epoch: 17 
Global Step: 297830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:43,253-Speed 9080.03 samples/sec Loss 3.5786 LearningRate 0.0012 Epoch: 17 Global Step: 297840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:44,389-Speed 9023.13 samples/sec Loss 3.5783 LearningRate 0.0012 Epoch: 17 Global Step: 297850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:45,523-Speed 9031.07 samples/sec Loss 3.6406 LearningRate 0.0012 Epoch: 17 Global Step: 297860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:46,658-Speed 9032.26 samples/sec Loss 3.5616 LearningRate 0.0012 Epoch: 17 Global Step: 297870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:47,805-Speed 8927.62 samples/sec Loss 3.6403 LearningRate 0.0012 Epoch: 17 Global Step: 297880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:48,950-Speed 8949.08 samples/sec Loss 3.6061 LearningRate 0.0012 Epoch: 17 Global Step: 297890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:50,053-Speed 9294.68 samples/sec Loss 3.6803 LearningRate 0.0012 Epoch: 17 Global Step: 297900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:51,153-Speed 9320.86 samples/sec Loss 3.5675 LearningRate 0.0012 Epoch: 17 Global Step: 297910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:35:52,332-Speed 8688.40 samples/sec Loss 3.5804 LearningRate 0.0012 Epoch: 17 Global Step: 297920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:35:53,465-Speed 9040.69 samples/sec Loss 3.6144 LearningRate 0.0012 Epoch: 17 Global Step: 297930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:35:54,608-Speed 8961.18 samples/sec Loss 3.6543 LearningRate 0.0012 Epoch: 17 Global Step: 297940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:35:55,765-Speed 8855.81 samples/sec Loss 3.6131 LearningRate 0.0012 Epoch: 17 Global Step: 297950 Fp16 Grad Scale: 32768 
Required: 1 hours
Training: 2022-04-11 23:35:56,898-Speed 9048.96 samples/sec Loss 3.6470 LearningRate 0.0012 Epoch: 17 Global Step: 297960 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:35:58,019-Speed 9136.35 samples/sec Loss 3.6963 LearningRate 0.0012 Epoch: 17 Global Step: 297970 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:35:59,129-Speed 9227.82 samples/sec Loss 3.6786 LearningRate 0.0012 Epoch: 17 Global Step: 297980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:36:00,312-Speed 8664.91 samples/sec Loss 3.6096 LearningRate 0.0012 Epoch: 17 Global Step: 297990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:36:01,425-Speed 9207.15 samples/sec Loss 3.7219 LearningRate 0.0012 Epoch: 17 Global Step: 298000 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:36:23,429-[lfw][298000]XNorm: 6.631757
Training: 2022-04-11 23:36:23,429-[lfw][298000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-11 23:36:23,430-[lfw][298000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:36:48,935-[cfp_fp][298000]XNorm: 5.795037
Training: 2022-04-11 23:36:48,936-[cfp_fp][298000]Accuracy-Flip: 0.97186+-0.00888
Training: 2022-04-11 23:36:48,936-[cfp_fp][298000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:37:10,975-[agedb_30][298000]XNorm: 6.459667
Training: 2022-04-11 23:37:10,975-[agedb_30][298000]Accuracy-Flip: 0.97383+-0.00823
Training: 2022-04-11 23:37:10,976-[agedb_30][298000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:37:12,120-Speed 144.85 samples/sec Loss 3.6005 LearningRate 0.0012 Epoch: 17 Global Step: 298010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:37:13,252-Speed 9048.65 samples/sec Loss 3.6458 LearningRate 0.0012 Epoch: 17 Global Step: 298020 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 23:37:14,346-Speed 9367.40 samples/sec Loss 3.6031 LearningRate 0.0011 Epoch: 17 Global Step: 298030 Fp16 Grad Scale: 65536 Required: 1 hours
Training:
2022-04-11 23:37:15,478-Speed 9051.44 samples/sec Loss 3.5254 LearningRate 0.0011 Epoch: 17 Global Step: 298040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:16,629-Speed 8902.83 samples/sec Loss 3.6564 LearningRate 0.0011 Epoch: 17 Global Step: 298050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:17,783-Speed 8875.55 samples/sec Loss 3.6315 LearningRate 0.0011 Epoch: 17 Global Step: 298060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:18,893-Speed 9231.98 samples/sec Loss 3.5010 LearningRate 0.0011 Epoch: 17 Global Step: 298070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:20,022-Speed 9081.58 samples/sec Loss 3.5984 LearningRate 0.0011 Epoch: 17 Global Step: 298080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:21,126-Speed 9278.28 samples/sec Loss 3.6468 LearningRate 0.0011 Epoch: 17 Global Step: 298090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:22,317-Speed 8607.53 samples/sec Loss 3.6147 LearningRate 0.0011 Epoch: 17 Global Step: 298100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:23,424-Speed 9254.26 samples/sec Loss 3.5280 LearningRate 0.0011 Epoch: 17 Global Step: 298110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:24,596-Speed 8739.16 samples/sec Loss 3.6503 LearningRate 0.0011 Epoch: 17 Global Step: 298120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:37:25,732-Speed 9016.02 samples/sec Loss 3.6179 LearningRate 0.0011 Epoch: 17 Global Step: 298130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:26,883-Speed 8903.38 samples/sec Loss 3.6171 LearningRate 0.0011 Epoch: 17 Global Step: 298140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:28,080-Speed 8558.26 samples/sec Loss 3.5133 LearningRate 0.0011 Epoch: 17 Global Step: 298150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:29,203-Speed 9125.73 
samples/sec Loss 3.6155 LearningRate 0.0011 Epoch: 17 Global Step: 298160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:30,326-Speed 9124.75 samples/sec Loss 3.6151 LearningRate 0.0011 Epoch: 17 Global Step: 298170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:31,418-Speed 9379.52 samples/sec Loss 3.6133 LearningRate 0.0011 Epoch: 17 Global Step: 298180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:32,521-Speed 9291.68 samples/sec Loss 3.6657 LearningRate 0.0011 Epoch: 17 Global Step: 298190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:33,649-Speed 9088.71 samples/sec Loss 3.5520 LearningRate 0.0011 Epoch: 17 Global Step: 298200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:34,782-Speed 9041.03 samples/sec Loss 3.5887 LearningRate 0.0011 Epoch: 17 Global Step: 298210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:35,861-Speed 9492.25 samples/sec Loss 3.5993 LearningRate 0.0011 Epoch: 17 Global Step: 298220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:36,992-Speed 9059.28 samples/sec Loss 3.6122 LearningRate 0.0011 Epoch: 17 Global Step: 298230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:37:38,078-Speed 9440.09 samples/sec Loss 3.6033 LearningRate 0.0011 Epoch: 17 Global Step: 298240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:39,215-Speed 9011.17 samples/sec Loss 3.6208 LearningRate 0.0011 Epoch: 17 Global Step: 298250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:40,361-Speed 8938.52 samples/sec Loss 3.5617 LearningRate 0.0011 Epoch: 17 Global Step: 298260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:41,463-Speed 9296.27 samples/sec Loss 3.6070 LearningRate 0.0011 Epoch: 17 Global Step: 298270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:42,626-Speed 8815.47 samples/sec Loss 3.6850 LearningRate 
0.0011 Epoch: 17 Global Step: 298280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:43,801-Speed 8718.50 samples/sec Loss 3.5222 LearningRate 0.0011 Epoch: 17 Global Step: 298290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:44,926-Speed 9106.75 samples/sec Loss 3.6210 LearningRate 0.0011 Epoch: 17 Global Step: 298300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:46,074-Speed 8925.35 samples/sec Loss 3.5667 LearningRate 0.0011 Epoch: 17 Global Step: 298310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:47,219-Speed 8948.69 samples/sec Loss 3.6047 LearningRate 0.0011 Epoch: 17 Global Step: 298320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:48,362-Speed 8963.08 samples/sec Loss 3.6740 LearningRate 0.0011 Epoch: 17 Global Step: 298330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:49,489-Speed 9096.48 samples/sec Loss 3.6973 LearningRate 0.0011 Epoch: 17 Global Step: 298340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:37:50,589-Speed 9318.23 samples/sec Loss 3.6582 LearningRate 0.0011 Epoch: 17 Global Step: 298350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:51,674-Speed 9438.19 samples/sec Loss 3.6321 LearningRate 0.0011 Epoch: 17 Global Step: 298360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:52,784-Speed 9230.03 samples/sec Loss 3.6651 LearningRate 0.0011 Epoch: 17 Global Step: 298370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:53,930-Speed 8939.69 samples/sec Loss 3.5855 LearningRate 0.0011 Epoch: 17 Global Step: 298380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:55,014-Speed 9450.72 samples/sec Loss 3.5640 LearningRate 0.0011 Epoch: 17 Global Step: 298390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:56,151-Speed 9012.69 samples/sec Loss 3.5742 LearningRate 0.0011 Epoch: 17 Global Step: 298400 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:57,258-Speed 9258.98 samples/sec Loss 3.5237 LearningRate 0.0011 Epoch: 17 Global Step: 298410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:58,369-Speed 9221.58 samples/sec Loss 3.5318 LearningRate 0.0011 Epoch: 17 Global Step: 298420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:37:59,512-Speed 8970.98 samples/sec Loss 3.6357 LearningRate 0.0011 Epoch: 17 Global Step: 298430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:00,641-Speed 9071.20 samples/sec Loss 3.6371 LearningRate 0.0011 Epoch: 17 Global Step: 298440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:01,754-Speed 9205.64 samples/sec Loss 3.6480 LearningRate 0.0011 Epoch: 17 Global Step: 298450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:02,829-Speed 9537.17 samples/sec Loss 3.5977 LearningRate 0.0011 Epoch: 17 Global Step: 298460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:03,950-Speed 9134.12 samples/sec Loss 3.5088 LearningRate 0.0011 Epoch: 17 Global Step: 298470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:05,108-Speed 8848.80 samples/sec Loss 3.6095 LearningRate 0.0011 Epoch: 17 Global Step: 298480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:06,208-Speed 9315.11 samples/sec Loss 3.5673 LearningRate 0.0011 Epoch: 17 Global Step: 298490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:07,304-Speed 9350.89 samples/sec Loss 3.5691 LearningRate 0.0011 Epoch: 17 Global Step: 298500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:08,435-Speed 9058.65 samples/sec Loss 3.6123 LearningRate 0.0011 Epoch: 17 Global Step: 298510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:09,524-Speed 9400.91 samples/sec Loss 3.6509 LearningRate 0.0011 Epoch: 17 Global Step: 298520 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 23:38:10,649-Speed 9114.75 samples/sec Loss 3.6200 LearningRate 0.0011 Epoch: 17 Global Step: 298530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:11,747-Speed 9326.46 samples/sec Loss 3.6104 LearningRate 0.0011 Epoch: 17 Global Step: 298540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:12,895-Speed 8925.94 samples/sec Loss 3.5617 LearningRate 0.0011 Epoch: 17 Global Step: 298550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:14,008-Speed 9205.44 samples/sec Loss 3.6376 LearningRate 0.0011 Epoch: 17 Global Step: 298560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:15,183-Speed 8720.69 samples/sec Loss 3.5723 LearningRate 0.0011 Epoch: 17 Global Step: 298570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:16,313-Speed 9070.86 samples/sec Loss 3.6008 LearningRate 0.0011 Epoch: 17 Global Step: 298580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:17,438-Speed 9103.17 samples/sec Loss 3.5855 LearningRate 0.0011 Epoch: 17 Global Step: 298590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:18,561-Speed 9124.49 samples/sec Loss 3.6068 LearningRate 0.0011 Epoch: 17 Global Step: 298600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:19,686-Speed 9114.77 samples/sec Loss 3.6531 LearningRate 0.0011 Epoch: 17 Global Step: 298610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:20,800-Speed 9197.22 samples/sec Loss 3.7123 LearningRate 0.0011 Epoch: 17 Global Step: 298620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:21,923-Speed 9118.45 samples/sec Loss 3.5947 LearningRate 0.0011 Epoch: 17 Global Step: 298630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:23,028-Speed 9276.76 samples/sec Loss 3.5781 LearningRate 0.0011 Epoch: 17 Global Step: 298640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:24,160-Speed 
9052.67 samples/sec Loss 3.5407 LearningRate 0.0011 Epoch: 17 Global Step: 298650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:25,317-Speed 8855.00 samples/sec Loss 3.5586 LearningRate 0.0011 Epoch: 17 Global Step: 298660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:26,432-Speed 9188.77 samples/sec Loss 3.6674 LearningRate 0.0011 Epoch: 17 Global Step: 298670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:27,598-Speed 8782.17 samples/sec Loss 3.5519 LearningRate 0.0011 Epoch: 17 Global Step: 298680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:28,762-Speed 8799.73 samples/sec Loss 3.6431 LearningRate 0.0011 Epoch: 17 Global Step: 298690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:29,900-Speed 9007.36 samples/sec Loss 3.5889 LearningRate 0.0011 Epoch: 17 Global Step: 298700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:31,009-Speed 9240.60 samples/sec Loss 3.5888 LearningRate 0.0011 Epoch: 17 Global Step: 298710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:32,135-Speed 9102.59 samples/sec Loss 3.5049 LearningRate 0.0011 Epoch: 17 Global Step: 298720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:33,295-Speed 8830.60 samples/sec Loss 3.6426 LearningRate 0.0011 Epoch: 17 Global Step: 298730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:34,454-Speed 8842.23 samples/sec Loss 3.5422 LearningRate 0.0011 Epoch: 17 Global Step: 298740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:35,606-Speed 8892.19 samples/sec Loss 3.6232 LearningRate 0.0011 Epoch: 17 Global Step: 298750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:36,747-Speed 8979.80 samples/sec Loss 3.6166 LearningRate 0.0011 Epoch: 17 Global Step: 298760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:37,817-Speed 9573.47 samples/sec Loss 3.7409 
LearningRate 0.0011 Epoch: 17 Global Step: 298770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:38,936-Speed 9159.20 samples/sec Loss 3.5975 LearningRate 0.0011 Epoch: 17 Global Step: 298780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:40,069-Speed 9041.97 samples/sec Loss 3.6352 LearningRate 0.0011 Epoch: 17 Global Step: 298790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:41,213-Speed 8956.26 samples/sec Loss 3.6117 LearningRate 0.0011 Epoch: 17 Global Step: 298800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:42,358-Speed 8949.54 samples/sec Loss 3.6179 LearningRate 0.0011 Epoch: 17 Global Step: 298810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:43,517-Speed 8837.67 samples/sec Loss 3.6308 LearningRate 0.0011 Epoch: 17 Global Step: 298820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:38:44,634-Speed 9177.17 samples/sec Loss 3.6592 LearningRate 0.0011 Epoch: 17 Global Step: 298830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:45,783-Speed 8916.03 samples/sec Loss 3.5287 LearningRate 0.0011 Epoch: 17 Global Step: 298840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:46,925-Speed 8974.24 samples/sec Loss 3.5864 LearningRate 0.0011 Epoch: 17 Global Step: 298850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:48,122-Speed 8556.49 samples/sec Loss 3.5826 LearningRate 0.0011 Epoch: 17 Global Step: 298860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:49,247-Speed 9109.03 samples/sec Loss 3.6007 LearningRate 0.0011 Epoch: 17 Global Step: 298870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:50,341-Speed 9362.46 samples/sec Loss 3.6316 LearningRate 0.0011 Epoch: 17 Global Step: 298880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:51,452-Speed 9224.72 samples/sec Loss 3.5399 LearningRate 0.0011 Epoch: 17 Global 
Step: 298890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:52,579-Speed 9098.10 samples/sec Loss 3.5962 LearningRate 0.0011 Epoch: 17 Global Step: 298900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:53,706-Speed 9091.06 samples/sec Loss 3.6100 LearningRate 0.0011 Epoch: 17 Global Step: 298910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:54,805-Speed 9321.75 samples/sec Loss 3.5987 LearningRate 0.0011 Epoch: 17 Global Step: 298920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:38:55,877-Speed 9551.91 samples/sec Loss 3.6798 LearningRate 0.0011 Epoch: 17 Global Step: 298930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:38:57,011-Speed 9036.95 samples/sec Loss 3.5596 LearningRate 0.0011 Epoch: 17 Global Step: 298940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:38:58,169-Speed 8850.37 samples/sec Loss 3.6420 LearningRate 0.0011 Epoch: 17 Global Step: 298950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:38:59,340-Speed 8749.01 samples/sec Loss 3.6034 LearningRate 0.0011 Epoch: 17 Global Step: 298960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:00,458-Speed 9162.18 samples/sec Loss 3.5973 LearningRate 0.0011 Epoch: 17 Global Step: 298970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:01,601-Speed 8961.89 samples/sec Loss 3.6261 LearningRate 0.0011 Epoch: 17 Global Step: 298980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:02,707-Speed 9269.53 samples/sec Loss 3.6116 LearningRate 0.0011 Epoch: 17 Global Step: 298990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:03,858-Speed 8908.44 samples/sec Loss 3.6040 LearningRate 0.0011 Epoch: 17 Global Step: 299000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:04,947-Speed 9408.14 samples/sec Loss 3.5902 LearningRate 0.0011 Epoch: 17 Global Step: 299010 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 23:39:06,112-Speed 8795.36 samples/sec Loss 3.6160 LearningRate 0.0011 Epoch: 17 Global Step: 299020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:07,284-Speed 8740.82 samples/sec Loss 3.6576 LearningRate 0.0011 Epoch: 17 Global Step: 299030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:08,457-Speed 8731.27 samples/sec Loss 3.5185 LearningRate 0.0011 Epoch: 17 Global Step: 299040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:09,598-Speed 8977.64 samples/sec Loss 3.6508 LearningRate 0.0011 Epoch: 17 Global Step: 299050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:10,726-Speed 9089.81 samples/sec Loss 3.5522 LearningRate 0.0011 Epoch: 17 Global Step: 299060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:11,907-Speed 8678.29 samples/sec Loss 3.5425 LearningRate 0.0011 Epoch: 17 Global Step: 299070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:13,035-Speed 9084.49 samples/sec Loss 3.5453 LearningRate 0.0011 Epoch: 17 Global Step: 299080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:14,206-Speed 8750.02 samples/sec Loss 3.5313 LearningRate 0.0011 Epoch: 17 Global Step: 299090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:15,344-Speed 8999.81 samples/sec Loss 3.5900 LearningRate 0.0011 Epoch: 17 Global Step: 299100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:16,532-Speed 8625.37 samples/sec Loss 3.6129 LearningRate 0.0011 Epoch: 17 Global Step: 299110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:17,648-Speed 9181.84 samples/sec Loss 3.5653 LearningRate 0.0011 Epoch: 17 Global Step: 299120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:18,840-Speed 8594.86 samples/sec Loss 3.5693 LearningRate 0.0011 Epoch: 17 Global Step: 299130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 
23:39:19,945-Speed 9279.27 samples/sec Loss 3.6565 LearningRate 0.0011 Epoch: 17 Global Step: 299140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:21,062-Speed 9171.05 samples/sec Loss 3.6664 LearningRate 0.0011 Epoch: 17 Global Step: 299150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:22,218-Speed 8857.03 samples/sec Loss 3.6756 LearningRate 0.0011 Epoch: 17 Global Step: 299160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:23,388-Speed 8761.50 samples/sec Loss 3.6634 LearningRate 0.0011 Epoch: 17 Global Step: 299170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:24,521-Speed 9043.54 samples/sec Loss 3.5535 LearningRate 0.0011 Epoch: 17 Global Step: 299180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:25,616-Speed 9351.23 samples/sec Loss 3.5770 LearningRate 0.0011 Epoch: 17 Global Step: 299190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:26,698-Speed 9474.25 samples/sec Loss 3.6221 LearningRate 0.0011 Epoch: 17 Global Step: 299200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:27,793-Speed 9349.78 samples/sec Loss 3.6226 LearningRate 0.0011 Epoch: 17 Global Step: 299210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:28,874-Speed 9481.33 samples/sec Loss 3.5792 LearningRate 0.0011 Epoch: 17 Global Step: 299220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:30,028-Speed 8882.48 samples/sec Loss 3.6921 LearningRate 0.0011 Epoch: 17 Global Step: 299230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:31,155-Speed 9090.94 samples/sec Loss 3.6231 LearningRate 0.0011 Epoch: 17 Global Step: 299240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:39:32,249-Speed 9368.23 samples/sec Loss 3.6282 LearningRate 0.0011 Epoch: 17 Global Step: 299250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:33,350-Speed 9307.68 samples/sec 
Loss 3.5401 LearningRate 0.0011 Epoch: 17 Global Step: 299260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:34,469-Speed 9156.57 samples/sec Loss 3.6108 LearningRate 0.0011 Epoch: 17 Global Step: 299270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:35,530-Speed 9652.13 samples/sec Loss 3.6378 LearningRate 0.0011 Epoch: 17 Global Step: 299280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:36,649-Speed 9156.43 samples/sec Loss 3.5635 LearningRate 0.0011 Epoch: 17 Global Step: 299290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:37,764-Speed 9189.76 samples/sec Loss 3.5516 LearningRate 0.0011 Epoch: 17 Global Step: 299300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:38,902-Speed 9004.63 samples/sec Loss 3.6261 LearningRate 0.0011 Epoch: 17 Global Step: 299310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:40,005-Speed 9284.80 samples/sec Loss 3.6522 LearningRate 0.0011 Epoch: 17 Global Step: 299320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:41,121-Speed 9183.90 samples/sec Loss 3.5994 LearningRate 0.0011 Epoch: 17 Global Step: 299330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:42,250-Speed 9074.27 samples/sec Loss 3.6523 LearningRate 0.0011 Epoch: 17 Global Step: 299340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:43,410-Speed 8837.06 samples/sec Loss 3.5429 LearningRate 0.0011 Epoch: 17 Global Step: 299350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:44,536-Speed 9095.51 samples/sec Loss 3.5439 LearningRate 0.0011 Epoch: 17 Global Step: 299360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:45,627-Speed 9394.84 samples/sec Loss 3.7330 LearningRate 0.0011 Epoch: 17 Global Step: 299370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:46,775-Speed 8920.62 samples/sec Loss 3.5339 LearningRate 0.0011 Epoch: 17 
Global Step: 299380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:39:47,880-Speed 9281.34 samples/sec Loss 3.6576 LearningRate 0.0011 Epoch: 17 Global Step: 299390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:48,961-Speed 9475.25 samples/sec Loss 3.5864 LearningRate 0.0011 Epoch: 17 Global Step: 299400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:50,122-Speed 8826.57 samples/sec Loss 3.6921 LearningRate 0.0011 Epoch: 17 Global Step: 299410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:51,253-Speed 9063.46 samples/sec Loss 3.6371 LearningRate 0.0011 Epoch: 17 Global Step: 299420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:52,367-Speed 9195.73 samples/sec Loss 3.6203 LearningRate 0.0011 Epoch: 17 Global Step: 299430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:53,460-Speed 9374.89 samples/sec Loss 3.6608 LearningRate 0.0011 Epoch: 17 Global Step: 299440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:54,570-Speed 9224.90 samples/sec Loss 3.5611 LearningRate 0.0011 Epoch: 17 Global Step: 299450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:55,722-Speed 8896.64 samples/sec Loss 3.5512 LearningRate 0.0011 Epoch: 17 Global Step: 299460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:56,863-Speed 8980.16 samples/sec Loss 3.7280 LearningRate 0.0011 Epoch: 17 Global Step: 299470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:57,989-Speed 9100.24 samples/sec Loss 3.6364 LearningRate 0.0011 Epoch: 17 Global Step: 299480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:39:59,113-Speed 9113.40 samples/sec Loss 3.5930 LearningRate 0.0011 Epoch: 17 Global Step: 299490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:40:00,241-Speed 9081.44 samples/sec Loss 3.6128 LearningRate 0.0011 Epoch: 17 Global Step: 299500 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 23:40:01,336-Speed 9358.28 samples/sec Loss 3.6399 LearningRate 0.0011 Epoch: 17 Global Step: 299510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:02,490-Speed 8880.27 samples/sec Loss 3.7003 LearningRate 0.0011 Epoch: 17 Global Step: 299520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:03,606-Speed 9181.32 samples/sec Loss 3.6184 LearningRate 0.0011 Epoch: 17 Global Step: 299530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:04,748-Speed 8976.55 samples/sec Loss 3.6087 LearningRate 0.0011 Epoch: 17 Global Step: 299540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:05,886-Speed 9001.35 samples/sec Loss 3.6049 LearningRate 0.0011 Epoch: 17 Global Step: 299550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:06,998-Speed 9217.01 samples/sec Loss 3.5763 LearningRate 0.0011 Epoch: 17 Global Step: 299560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:08,202-Speed 8506.58 samples/sec Loss 3.6215 LearningRate 0.0011 Epoch: 17 Global Step: 299570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:09,341-Speed 8999.25 samples/sec Loss 3.5845 LearningRate 0.0011 Epoch: 17 Global Step: 299580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:10,486-Speed 8951.42 samples/sec Loss 3.5567 LearningRate 0.0011 Epoch: 17 Global Step: 299590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:11,632-Speed 8936.39 samples/sec Loss 3.4977 LearningRate 0.0011 Epoch: 17 Global Step: 299600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:12,748-Speed 9186.54 samples/sec Loss 3.5731 LearningRate 0.0011 Epoch: 17 Global Step: 299610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:13,968-Speed 8399.71 samples/sec Loss 3.6239 LearningRate 0.0010 Epoch: 17 Global Step: 299620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
23:40:15,070-Speed 9293.85 samples/sec Loss 3.6547 LearningRate 0.0010 Epoch: 17 Global Step: 299630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:40:16,212-Speed 8976.93 samples/sec Loss 3.5759 LearningRate 0.0010 Epoch: 17 Global Step: 299640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:17,296-Speed 9449.21 samples/sec Loss 3.6793 LearningRate 0.0010 Epoch: 17 Global Step: 299650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:18,398-Speed 9292.59 samples/sec Loss 3.6368 LearningRate 0.0010 Epoch: 17 Global Step: 299660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:19,564-Speed 8792.52 samples/sec Loss 3.5945 LearningRate 0.0010 Epoch: 17 Global Step: 299670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:20,696-Speed 9055.58 samples/sec Loss 3.7155 LearningRate 0.0010 Epoch: 17 Global Step: 299680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:21,861-Speed 8788.84 samples/sec Loss 3.6179 LearningRate 0.0010 Epoch: 17 Global Step: 299690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:22,994-Speed 9048.76 samples/sec Loss 3.5238 LearningRate 0.0010 Epoch: 17 Global Step: 299700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:24,150-Speed 8857.41 samples/sec Loss 3.6400 LearningRate 0.0010 Epoch: 17 Global Step: 299710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:25,298-Speed 8930.73 samples/sec Loss 3.5985 LearningRate 0.0010 Epoch: 17 Global Step: 299720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:26,448-Speed 8907.67 samples/sec Loss 3.6178 LearningRate 0.0010 Epoch: 17 Global Step: 299730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:27,534-Speed 9433.43 samples/sec Loss 3.6059 LearningRate 0.0010 Epoch: 17 Global Step: 299740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:40:28,680-Speed 8941.29 samples/sec 
Loss 3.6095 LearningRate 0.0010 Epoch: 17 Global Step: 299750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:29,829-Speed 8919.59 samples/sec Loss 3.5727 LearningRate 0.0010 Epoch: 17 Global Step: 299760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:30,968-Speed 8995.25 samples/sec Loss 3.5922 LearningRate 0.0010 Epoch: 17 Global Step: 299770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:32,097-Speed 9079.28 samples/sec Loss 3.6348 LearningRate 0.0010 Epoch: 17 Global Step: 299780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:33,227-Speed 9065.22 samples/sec Loss 3.6242 LearningRate 0.0010 Epoch: 17 Global Step: 299790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:34,337-Speed 9230.66 samples/sec Loss 3.5566 LearningRate 0.0010 Epoch: 17 Global Step: 299800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:35,478-Speed 8981.79 samples/sec Loss 3.6526 LearningRate 0.0010 Epoch: 17 Global Step: 299810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:36,599-Speed 9141.45 samples/sec Loss 3.5958 LearningRate 0.0010 Epoch: 17 Global Step: 299820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:37,704-Speed 9265.46 samples/sec Loss 3.5292 LearningRate 0.0010 Epoch: 17 Global Step: 299830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:38,848-Speed 8962.54 samples/sec Loss 3.6860 LearningRate 0.0010 Epoch: 17 Global Step: 299840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:39,962-Speed 9190.04 samples/sec Loss 3.5658 LearningRate 0.0010 Epoch: 17 Global Step: 299850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:41,133-Speed 8754.09 samples/sec Loss 3.6207 LearningRate 0.0010 Epoch: 17 Global Step: 299860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:42,244-Speed 9217.47 samples/sec Loss 3.5566 LearningRate 0.0010 Epoch: 17 
Global Step: 299870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:43,376-Speed 9056.51 samples/sec Loss 3.6609 LearningRate 0.0010 Epoch: 17 Global Step: 299880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:44,570-Speed 8583.96 samples/sec Loss 3.6316 LearningRate 0.0010 Epoch: 17 Global Step: 299890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:45,666-Speed 9345.93 samples/sec Loss 3.6014 LearningRate 0.0010 Epoch: 17 Global Step: 299900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:46,788-Speed 9130.39 samples/sec Loss 3.6667 LearningRate 0.0010 Epoch: 17 Global Step: 299910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:47,942-Speed 8886.49 samples/sec Loss 3.6703 LearningRate 0.0010 Epoch: 17 Global Step: 299920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:49,075-Speed 9038.31 samples/sec Loss 3.5757 LearningRate 0.0010 Epoch: 17 Global Step: 299930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:50,199-Speed 9115.55 samples/sec Loss 3.6126 LearningRate 0.0010 Epoch: 17 Global Step: 299940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:51,327-Speed 9084.87 samples/sec Loss 3.5398 LearningRate 0.0010 Epoch: 17 Global Step: 299950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:40:52,447-Speed 9146.51 samples/sec Loss 3.5903 LearningRate 0.0010 Epoch: 17 Global Step: 299960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:40:53,549-Speed 9296.98 samples/sec Loss 3.5906 LearningRate 0.0010 Epoch: 17 Global Step: 299970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:54,642-Speed 9378.46 samples/sec Loss 3.5982 LearningRate 0.0010 Epoch: 17 Global Step: 299980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:40:55,769-Speed 9093.01 samples/sec Loss 3.5253 LearningRate 0.0010 Epoch: 17 Global Step: 299990 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-11 23:40:56,892-Speed 9121.10 samples/sec Loss 3.5668 LearningRate 0.0010 Epoch: 17 Global Step: 300000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:41:18,848-[lfw][300000]XNorm: 6.630975 Training: 2022-04-11 23:41:18,849-[lfw][300000]Accuracy-Flip: 0.99700+-0.00287 Training: 2022-04-11 23:41:18,850-[lfw][300000]Accuracy-Highest: 0.99733 Training: 2022-04-11 23:41:44,291-[cfp_fp][300000]XNorm: 5.798590 Training: 2022-04-11 23:41:44,291-[cfp_fp][300000]Accuracy-Flip: 0.97286+-0.00908 Training: 2022-04-11 23:41:44,292-[cfp_fp][300000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:42:06,178-[agedb_30][300000]XNorm: 6.468586 Training: 2022-04-11 23:42:06,179-[agedb_30][300000]Accuracy-Flip: 0.97083+-0.00814 Training: 2022-04-11 23:42:06,180-[agedb_30][300000]Accuracy-Highest: 0.97417 Training: 2022-04-11 23:42:07,287-Speed 145.47 samples/sec Loss 3.6515 LearningRate 0.0010 Epoch: 17 Global Step: 300010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:08,418-Speed 9057.17 samples/sec Loss 3.6476 LearningRate 0.0010 Epoch: 17 Global Step: 300020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:09,534-Speed 9180.64 samples/sec Loss 3.6389 LearningRate 0.0010 Epoch: 17 Global Step: 300030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:10,620-Speed 9433.08 samples/sec Loss 3.5735 LearningRate 0.0010 Epoch: 17 Global Step: 300040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:11,730-Speed 9229.80 samples/sec Loss 3.6222 LearningRate 0.0010 Epoch: 17 Global Step: 300050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:12,851-Speed 9138.48 samples/sec Loss 3.5939 LearningRate 0.0010 Epoch: 17 Global Step: 300060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:13,994-Speed 8966.22 samples/sec Loss 3.6879 LearningRate 0.0010 Epoch: 17 Global Step: 300070 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 23:42:15,167-Speed 8733.54 samples/sec Loss 3.5766 LearningRate 0.0010 Epoch: 17 Global Step: 300080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:16,305-Speed 9000.65 samples/sec Loss 3.6813 LearningRate 0.0010 Epoch: 17 Global Step: 300090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:17,413-Speed 9246.09 samples/sec Loss 3.7307 LearningRate 0.0010 Epoch: 17 Global Step: 300100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:18,504-Speed 9397.35 samples/sec Loss 3.6408 LearningRate 0.0010 Epoch: 17 Global Step: 300110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:19,608-Speed 9281.78 samples/sec Loss 3.6052 LearningRate 0.0010 Epoch: 17 Global Step: 300120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:20,701-Speed 9377.10 samples/sec Loss 3.5985 LearningRate 0.0010 Epoch: 17 Global Step: 300130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:21,827-Speed 9097.10 samples/sec Loss 3.5886 LearningRate 0.0010 Epoch: 17 Global Step: 300140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:22,927-Speed 9315.69 samples/sec Loss 3.6677 LearningRate 0.0010 Epoch: 17 Global Step: 300150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:24,109-Speed 8665.14 samples/sec Loss 3.5269 LearningRate 0.0010 Epoch: 17 Global Step: 300160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:25,194-Speed 9449.27 samples/sec Loss 3.5638 LearningRate 0.0010 Epoch: 17 Global Step: 300170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:26,346-Speed 8890.98 samples/sec Loss 3.6593 LearningRate 0.0010 Epoch: 17 Global Step: 300180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:27,479-Speed 9039.98 samples/sec Loss 3.7380 LearningRate 0.0010 Epoch: 17 Global Step: 300190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:28,642-Speed 
8816.59 samples/sec Loss 3.5506 LearningRate 0.0010 Epoch: 17 Global Step: 300200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:29,762-Speed 9147.63 samples/sec Loss 3.6100 LearningRate 0.0010 Epoch: 17 Global Step: 300210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:30,885-Speed 9124.30 samples/sec Loss 3.6183 LearningRate 0.0010 Epoch: 17 Global Step: 300220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:32,051-Speed 8782.38 samples/sec Loss 3.6770 LearningRate 0.0010 Epoch: 17 Global Step: 300230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:33,163-Speed 9216.53 samples/sec Loss 3.6136 LearningRate 0.0010 Epoch: 17 Global Step: 300240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:34,266-Speed 9289.37 samples/sec Loss 3.5962 LearningRate 0.0010 Epoch: 17 Global Step: 300250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:35,358-Speed 9383.94 samples/sec Loss 3.6556 LearningRate 0.0010 Epoch: 17 Global Step: 300260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:36,461-Speed 9290.59 samples/sec Loss 3.6988 LearningRate 0.0010 Epoch: 17 Global Step: 300270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:42:37,598-Speed 9012.39 samples/sec Loss 3.5403 LearningRate 0.0010 Epoch: 17 Global Step: 300280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:38,715-Speed 9171.52 samples/sec Loss 3.6185 LearningRate 0.0010 Epoch: 17 Global Step: 300290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:39,817-Speed 9298.75 samples/sec Loss 3.6248 LearningRate 0.0010 Epoch: 17 Global Step: 300300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:40,945-Speed 9078.16 samples/sec Loss 3.6627 LearningRate 0.0010 Epoch: 17 Global Step: 300310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:42,095-Speed 8913.48 samples/sec Loss 3.6659 
LearningRate 0.0010 Epoch: 17 Global Step: 300320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:43,241-Speed 8941.49 samples/sec Loss 3.5771 LearningRate 0.0010 Epoch: 17 Global Step: 300330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:44,448-Speed 8484.39 samples/sec Loss 3.5368 LearningRate 0.0010 Epoch: 17 Global Step: 300340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:45,517-Speed 9585.74 samples/sec Loss 3.6665 LearningRate 0.0010 Epoch: 17 Global Step: 300350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:46,641-Speed 9114.76 samples/sec Loss 3.5153 LearningRate 0.0010 Epoch: 17 Global Step: 300360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:47,766-Speed 9109.82 samples/sec Loss 3.5904 LearningRate 0.0010 Epoch: 17 Global Step: 300370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:48,933-Speed 8781.19 samples/sec Loss 3.5545 LearningRate 0.0010 Epoch: 17 Global Step: 300380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:50,070-Speed 9012.78 samples/sec Loss 3.5860 LearningRate 0.0010 Epoch: 17 Global Step: 300390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:51,154-Speed 9450.15 samples/sec Loss 3.5096 LearningRate 0.0010 Epoch: 17 Global Step: 300400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:52,267-Speed 9208.72 samples/sec Loss 3.6048 LearningRate 0.0010 Epoch: 17 Global Step: 300410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:53,404-Speed 9013.31 samples/sec Loss 3.6340 LearningRate 0.0010 Epoch: 17 Global Step: 300420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:42:54,949-Speed 6627.55 samples/sec Loss 3.6367 LearningRate 0.0010 Epoch: 17 Global Step: 300430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:25,541-Speed 334.75 samples/sec Loss 3.6035 LearningRate 0.0010 Epoch: 18 Global Step: 
300440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:26,713-Speed 8741.42 samples/sec Loss 3.2657 LearningRate 0.0010 Epoch: 18 Global Step: 300450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:27,904-Speed 8605.68 samples/sec Loss 3.3355 LearningRate 0.0010 Epoch: 18 Global Step: 300460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:29,443-Speed 6655.46 samples/sec Loss 3.3000 LearningRate 0.0010 Epoch: 18 Global Step: 300470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:31,200-Speed 5829.15 samples/sec Loss 3.3441 LearningRate 0.0010 Epoch: 18 Global Step: 300480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:32,513-Speed 7806.00 samples/sec Loss 3.4213 LearningRate 0.0010 Epoch: 18 Global Step: 300490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:33,871-Speed 7549.70 samples/sec Loss 3.3405 LearningRate 0.0010 Epoch: 18 Global Step: 300500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:35,045-Speed 8722.42 samples/sec Loss 3.3965 LearningRate 0.0010 Epoch: 18 Global Step: 300510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:36,192-Speed 8934.70 samples/sec Loss 3.3640 LearningRate 0.0010 Epoch: 18 Global Step: 300520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:37,317-Speed 9107.25 samples/sec Loss 3.3693 LearningRate 0.0010 Epoch: 18 Global Step: 300530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:38,450-Speed 9042.20 samples/sec Loss 3.3966 LearningRate 0.0010 Epoch: 18 Global Step: 300540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:39,595-Speed 8946.83 samples/sec Loss 3.3274 LearningRate 0.0010 Epoch: 18 Global Step: 300550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:43:40,796-Speed 8532.55 samples/sec Loss 3.3346 LearningRate 0.0010 Epoch: 18 Global Step: 300560 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-11 23:43:42,032-Speed 8288.16 samples/sec Loss 3.4010 LearningRate 0.0010 Epoch: 18 Global Step: 300570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:43,192-Speed 8832.69 samples/sec Loss 3.3158 LearningRate 0.0010 Epoch: 18 Global Step: 300580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:44,385-Speed 8586.26 samples/sec Loss 3.3223 LearningRate 0.0010 Epoch: 18 Global Step: 300590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:45,510-Speed 9108.28 samples/sec Loss 3.3848 LearningRate 0.0010 Epoch: 18 Global Step: 300600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:46,658-Speed 8924.85 samples/sec Loss 3.3825 LearningRate 0.0010 Epoch: 18 Global Step: 300610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:47,779-Speed 9142.31 samples/sec Loss 3.3659 LearningRate 0.0010 Epoch: 18 Global Step: 300620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:48,920-Speed 8975.25 samples/sec Loss 3.3726 LearningRate 0.0010 Epoch: 18 Global Step: 300630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:50,022-Speed 9299.66 samples/sec Loss 3.3768 LearningRate 0.0010 Epoch: 18 Global Step: 300640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:51,193-Speed 8750.27 samples/sec Loss 3.3941 LearningRate 0.0010 Epoch: 18 Global Step: 300650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:52,320-Speed 9090.47 samples/sec Loss 3.3591 LearningRate 0.0010 Epoch: 18 Global Step: 300660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:53,551-Speed 8327.13 samples/sec Loss 3.2969 LearningRate 0.0010 Epoch: 18 Global Step: 300670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:43:54,652-Speed 9303.35 samples/sec Loss 3.3674 LearningRate 0.0010 Epoch: 18 Global Step: 300680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:43:55,812-Speed 8829.92 samples/sec Loss 3.3075 LearningRate 0.0010 Epoch: 18 Global Step: 300690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:56,961-Speed 8920.10 samples/sec Loss 3.3306 LearningRate 0.0010 Epoch: 18 Global Step: 300700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:58,108-Speed 8929.14 samples/sec Loss 3.3264 LearningRate 0.0010 Epoch: 18 Global Step: 300710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:43:59,255-Speed 8929.96 samples/sec Loss 3.3798 LearningRate 0.0010 Epoch: 18 Global Step: 300720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:00,440-Speed 8645.32 samples/sec Loss 3.3467 LearningRate 0.0010 Epoch: 18 Global Step: 300730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:01,569-Speed 9076.76 samples/sec Loss 3.3417 LearningRate 0.0010 Epoch: 18 Global Step: 300740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:02,719-Speed 8910.82 samples/sec Loss 3.3414 LearningRate 0.0010 Epoch: 18 Global Step: 300750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:03,837-Speed 9170.16 samples/sec Loss 3.3057 LearningRate 0.0010 Epoch: 18 Global Step: 300760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:04,989-Speed 8893.77 samples/sec Loss 3.3293 LearningRate 0.0010 Epoch: 18 Global Step: 300770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:06,151-Speed 8817.03 samples/sec Loss 3.4128 LearningRate 0.0010 Epoch: 18 Global Step: 300780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:07,334-Speed 8659.63 samples/sec Loss 3.4016 LearningRate 0.0010 Epoch: 18 Global Step: 300790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:08,537-Speed 8516.62 samples/sec Loss 3.3433 LearningRate 0.0010 Epoch: 18 Global Step: 300800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:09,812-Speed 8038.30 samples/sec Loss 
3.3275 LearningRate 0.0010 Epoch: 18 Global Step: 300810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:10,935-Speed 9124.14 samples/sec Loss 3.3211 LearningRate 0.0010 Epoch: 18 Global Step: 300820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:12,070-Speed 9032.49 samples/sec Loss 3.3781 LearningRate 0.0010 Epoch: 18 Global Step: 300830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:13,196-Speed 9094.92 samples/sec Loss 3.4035 LearningRate 0.0010 Epoch: 18 Global Step: 300840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:14,332-Speed 9016.34 samples/sec Loss 3.3670 LearningRate 0.0010 Epoch: 18 Global Step: 300850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:15,469-Speed 9012.67 samples/sec Loss 3.3672 LearningRate 0.0010 Epoch: 18 Global Step: 300860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:16,580-Speed 9221.30 samples/sec Loss 3.4176 LearningRate 0.0010 Epoch: 18 Global Step: 300870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:17,707-Speed 9088.58 samples/sec Loss 3.3276 LearningRate 0.0010 Epoch: 18 Global Step: 300880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:18,842-Speed 9033.32 samples/sec Loss 3.3801 LearningRate 0.0010 Epoch: 18 Global Step: 300890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:19,993-Speed 8897.63 samples/sec Loss 3.3534 LearningRate 0.0010 Epoch: 18 Global Step: 300900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:21,139-Speed 8943.58 samples/sec Loss 3.3669 LearningRate 0.0010 Epoch: 18 Global Step: 300910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:22,266-Speed 9087.84 samples/sec Loss 3.4464 LearningRate 0.0010 Epoch: 18 Global Step: 300920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:23,407-Speed 8985.89 samples/sec Loss 3.3136 LearningRate 0.0010 Epoch: 18 Global 
Step: 300930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:24,545-Speed 9002.29 samples/sec Loss 3.3065 LearningRate 0.0010 Epoch: 18 Global Step: 300940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:25,667-Speed 9131.14 samples/sec Loss 3.3397 LearningRate 0.0010 Epoch: 18 Global Step: 300950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:26,812-Speed 8945.31 samples/sec Loss 3.3679 LearningRate 0.0010 Epoch: 18 Global Step: 300960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:27,945-Speed 9043.97 samples/sec Loss 3.3413 LearningRate 0.0010 Epoch: 18 Global Step: 300970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:29,058-Speed 9209.58 samples/sec Loss 3.3143 LearningRate 0.0010 Epoch: 18 Global Step: 300980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:30,172-Speed 9191.43 samples/sec Loss 3.3198 LearningRate 0.0010 Epoch: 18 Global Step: 300990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:31,321-Speed 8919.23 samples/sec Loss 3.2829 LearningRate 0.0010 Epoch: 18 Global Step: 301000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:32,489-Speed 8773.58 samples/sec Loss 3.4486 LearningRate 0.0010 Epoch: 18 Global Step: 301010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:33,651-Speed 8815.49 samples/sec Loss 3.3067 LearningRate 0.0010 Epoch: 18 Global Step: 301020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:34,765-Speed 9200.35 samples/sec Loss 3.2889 LearningRate 0.0010 Epoch: 18 Global Step: 301030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:35,884-Speed 9159.20 samples/sec Loss 3.2962 LearningRate 0.0010 Epoch: 18 Global Step: 301040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:37,026-Speed 8966.12 samples/sec Loss 3.3424 LearningRate 0.0010 Epoch: 18 Global Step: 301050 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 23:44:38,169-Speed 8969.03 samples/sec Loss 3.3606 LearningRate 0.0010 Epoch: 18 Global Step: 301060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:39,309-Speed 8986.49 samples/sec Loss 3.3814 LearningRate 0.0010 Epoch: 18 Global Step: 301070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:40,470-Speed 8826.64 samples/sec Loss 3.3861 LearningRate 0.0010 Epoch: 18 Global Step: 301080 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:44:41,599-Speed 9076.22 samples/sec Loss 3.3470 LearningRate 0.0010 Epoch: 18 Global Step: 301090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:42,736-Speed 9006.26 samples/sec Loss 3.3765 LearningRate 0.0010 Epoch: 18 Global Step: 301100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:43,866-Speed 9071.46 samples/sec Loss 3.4373 LearningRate 0.0010 Epoch: 18 Global Step: 301110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:44,975-Speed 9238.74 samples/sec Loss 3.3831 LearningRate 0.0010 Epoch: 18 Global Step: 301120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:46,106-Speed 9057.80 samples/sec Loss 3.3215 LearningRate 0.0010 Epoch: 18 Global Step: 301130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:47,232-Speed 9102.57 samples/sec Loss 3.2248 LearningRate 0.0010 Epoch: 18 Global Step: 301140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:48,522-Speed 7939.82 samples/sec Loss 3.3714 LearningRate 0.0010 Epoch: 18 Global Step: 301150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:49,614-Speed 9386.26 samples/sec Loss 3.3030 LearningRate 0.0010 Epoch: 18 Global Step: 301160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:50,915-Speed 7875.62 samples/sec Loss 3.3846 LearningRate 0.0010 Epoch: 18 Global Step: 301170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:44:52,359-Speed 7093.43 samples/sec Loss 3.2810 LearningRate 0.0010 Epoch: 18 Global Step: 301180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:53,639-Speed 8004.41 samples/sec Loss 3.3859 LearningRate 0.0010 Epoch: 18 Global Step: 301190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:54,967-Speed 7714.59 samples/sec Loss 3.3231 LearningRate 0.0010 Epoch: 18 Global Step: 301200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:56,104-Speed 9006.36 samples/sec Loss 3.4226 LearningRate 0.0010 Epoch: 18 Global Step: 301210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:57,233-Speed 9078.04 samples/sec Loss 3.3780 LearningRate 0.0010 Epoch: 18 Global Step: 301220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:58,333-Speed 9316.64 samples/sec Loss 3.3492 LearningRate 0.0010 Epoch: 18 Global Step: 301230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:44:59,456-Speed 9123.25 samples/sec Loss 3.3328 LearningRate 0.0010 Epoch: 18 Global Step: 301240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:00,609-Speed 8890.08 samples/sec Loss 3.3411 LearningRate 0.0010 Epoch: 18 Global Step: 301250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:01,672-Speed 9641.67 samples/sec Loss 3.3169 LearningRate 0.0010 Epoch: 18 Global Step: 301260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:02,778-Speed 9261.28 samples/sec Loss 3.3871 LearningRate 0.0010 Epoch: 18 Global Step: 301270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:03,905-Speed 9095.61 samples/sec Loss 3.3934 LearningRate 0.0010 Epoch: 18 Global Step: 301280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:05,020-Speed 9187.43 samples/sec Loss 3.3578 LearningRate 0.0009 Epoch: 18 Global Step: 301290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:45:06,131-Speed 9221.13 samples/sec 
Loss 3.3896 LearningRate 0.0009 Epoch: 18 Global Step: 301300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:45:07,278-Speed 8930.17 samples/sec Loss 3.3706 LearningRate 0.0009 Epoch: 18 Global Step: 301310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:45:08,393-Speed 9193.02 samples/sec Loss 3.4040 LearningRate 0.0009 Epoch: 18 Global Step: 301320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:09,530-Speed 9013.06 samples/sec Loss 3.3405 LearningRate 0.0009 Epoch: 18 Global Step: 301330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:10,650-Speed 9147.44 samples/sec Loss 3.3691 LearningRate 0.0009 Epoch: 18 Global Step: 301340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:11,796-Speed 8943.46 samples/sec Loss 3.3614 LearningRate 0.0009 Epoch: 18 Global Step: 301350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:12,908-Speed 9212.84 samples/sec Loss 3.2757 LearningRate 0.0009 Epoch: 18 Global Step: 301360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:14,082-Speed 8725.69 samples/sec Loss 3.3323 LearningRate 0.0009 Epoch: 18 Global Step: 301370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:15,213-Speed 9063.90 samples/sec Loss 3.3567 LearningRate 0.0009 Epoch: 18 Global Step: 301380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:16,325-Speed 9215.07 samples/sec Loss 3.3453 LearningRate 0.0009 Epoch: 18 Global Step: 301390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:17,429-Speed 9279.69 samples/sec Loss 3.3471 LearningRate 0.0009 Epoch: 18 Global Step: 301400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:18,557-Speed 9084.13 samples/sec Loss 3.3294 LearningRate 0.0009 Epoch: 18 Global Step: 301410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:19,646-Speed 9400.30 samples/sec Loss 3.3106 LearningRate 0.0009 Epoch: 18 
Global Step: 301420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:20,772-Speed 9105.37 samples/sec Loss 3.4118 LearningRate 0.0009 Epoch: 18 Global Step: 301430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:21,942-Speed 8754.90 samples/sec Loss 3.4259 LearningRate 0.0009 Epoch: 18 Global Step: 301440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:23,097-Speed 8876.34 samples/sec Loss 3.4084 LearningRate 0.0009 Epoch: 18 Global Step: 301450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:24,193-Speed 9346.72 samples/sec Loss 3.3560 LearningRate 0.0009 Epoch: 18 Global Step: 301460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:25,273-Speed 9488.55 samples/sec Loss 3.4110 LearningRate 0.0009 Epoch: 18 Global Step: 301470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:26,440-Speed 8779.93 samples/sec Loss 3.3648 LearningRate 0.0009 Epoch: 18 Global Step: 301480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:27,613-Speed 8730.90 samples/sec Loss 3.3157 LearningRate 0.0009 Epoch: 18 Global Step: 301490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:28,763-Speed 8916.23 samples/sec Loss 3.2067 LearningRate 0.0009 Epoch: 18 Global Step: 301500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:29,893-Speed 9064.29 samples/sec Loss 3.4352 LearningRate 0.0009 Epoch: 18 Global Step: 301510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:31,027-Speed 9033.47 samples/sec Loss 3.3757 LearningRate 0.0009 Epoch: 18 Global Step: 301520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:45:32,137-Speed 9231.64 samples/sec Loss 3.2625 LearningRate 0.0009 Epoch: 18 Global Step: 301530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:45:33,280-Speed 8971.87 samples/sec Loss 3.3882 LearningRate 0.0009 Epoch: 18 Global Step: 301540 Fp16 Grad Scale: 
131072 Required: 1 hours Training: 2022-04-11 23:45:34,446-Speed 8782.84 samples/sec Loss 3.3341 LearningRate 0.0009 Epoch: 18 Global Step: 301550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:35,578-Speed 9052.90 samples/sec Loss 3.3535 LearningRate 0.0009 Epoch: 18 Global Step: 301560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:36,713-Speed 9025.75 samples/sec Loss 3.3673 LearningRate 0.0009 Epoch: 18 Global Step: 301570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:37,841-Speed 9086.43 samples/sec Loss 3.3566 LearningRate 0.0009 Epoch: 18 Global Step: 301580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:38,923-Speed 9472.40 samples/sec Loss 3.3747 LearningRate 0.0009 Epoch: 18 Global Step: 301590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:40,081-Speed 8845.38 samples/sec Loss 3.3730 LearningRate 0.0009 Epoch: 18 Global Step: 301600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:41,192-Speed 9223.75 samples/sec Loss 3.4211 LearningRate 0.0009 Epoch: 18 Global Step: 301610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:42,320-Speed 9084.70 samples/sec Loss 3.4473 LearningRate 0.0009 Epoch: 18 Global Step: 301620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:43,401-Speed 9476.41 samples/sec Loss 3.3146 LearningRate 0.0009 Epoch: 18 Global Step: 301630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:44,522-Speed 9134.84 samples/sec Loss 3.4192 LearningRate 0.0009 Epoch: 18 Global Step: 301640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:45,664-Speed 8976.98 samples/sec Loss 3.3220 LearningRate 0.0009 Epoch: 18 Global Step: 301650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:45:46,826-Speed 8811.64 samples/sec Loss 3.3567 LearningRate 0.0009 Epoch: 18 Global Step: 301660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 23:45:47,967-Speed 8981.74 samples/sec Loss 3.3654 LearningRate 0.0009 Epoch: 18 Global Step: 301670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:49,106-Speed 8992.68 samples/sec Loss 3.3265 LearningRate 0.0009 Epoch: 18 Global Step: 301680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:50,230-Speed 9116.41 samples/sec Loss 3.3447 LearningRate 0.0009 Epoch: 18 Global Step: 301690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:51,388-Speed 8848.14 samples/sec Loss 3.3719 LearningRate 0.0009 Epoch: 18 Global Step: 301700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:52,542-Speed 8883.92 samples/sec Loss 3.3059 LearningRate 0.0009 Epoch: 18 Global Step: 301710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:53,697-Speed 8879.86 samples/sec Loss 3.3613 LearningRate 0.0009 Epoch: 18 Global Step: 301720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:54,840-Speed 8962.27 samples/sec Loss 3.3322 LearningRate 0.0009 Epoch: 18 Global Step: 301730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:55,970-Speed 9066.25 samples/sec Loss 3.3443 LearningRate 0.0009 Epoch: 18 Global Step: 301740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:57,113-Speed 8960.86 samples/sec Loss 3.4072 LearningRate 0.0009 Epoch: 18 Global Step: 301750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:45:58,229-Speed 9186.46 samples/sec Loss 3.3774 LearningRate 0.0009 Epoch: 18 Global Step: 301760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:45:59,325-Speed 9345.37 samples/sec Loss 3.3849 LearningRate 0.0009 Epoch: 18 Global Step: 301770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:00,416-Speed 9391.73 samples/sec Loss 3.3092 LearningRate 0.0009 Epoch: 18 Global Step: 301780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:01,500-Speed 9453.39 
samples/sec Loss 3.3880 LearningRate 0.0009 Epoch: 18 Global Step: 301790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:02,606-Speed 9260.49 samples/sec Loss 3.4493 LearningRate 0.0009 Epoch: 18 Global Step: 301800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:03,730-Speed 9121.05 samples/sec Loss 3.3064 LearningRate 0.0009 Epoch: 18 Global Step: 301810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:04,813-Speed 9465.37 samples/sec Loss 3.3934 LearningRate 0.0009 Epoch: 18 Global Step: 301820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:05,900-Speed 9425.96 samples/sec Loss 3.3279 LearningRate 0.0009 Epoch: 18 Global Step: 301830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:07,065-Speed 8789.14 samples/sec Loss 3.3286 LearningRate 0.0009 Epoch: 18 Global Step: 301840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:08,215-Speed 8913.23 samples/sec Loss 3.5060 LearningRate 0.0009 Epoch: 18 Global Step: 301850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:09,347-Speed 9051.06 samples/sec Loss 3.4183 LearningRate 0.0009 Epoch: 18 Global Step: 301860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:10,478-Speed 9061.98 samples/sec Loss 3.3778 LearningRate 0.0009 Epoch: 18 Global Step: 301870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:46:11,669-Speed 8604.64 samples/sec Loss 3.3779 LearningRate 0.0009 Epoch: 18 Global Step: 301880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:12,817-Speed 8926.57 samples/sec Loss 3.3371 LearningRate 0.0009 Epoch: 18 Global Step: 301890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:13,967-Speed 8906.04 samples/sec Loss 3.3112 LearningRate 0.0009 Epoch: 18 Global Step: 301900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:15,092-Speed 9111.16 samples/sec Loss 3.4066 LearningRate 
0.0009 Epoch: 18 Global Step: 301910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:16,237-Speed 8942.24 samples/sec Loss 3.3732 LearningRate 0.0009 Epoch: 18 Global Step: 301920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:17,329-Speed 9382.00 samples/sec Loss 3.3435 LearningRate 0.0009 Epoch: 18 Global Step: 301930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:18,476-Speed 8932.53 samples/sec Loss 3.3513 LearningRate 0.0009 Epoch: 18 Global Step: 301940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:19,636-Speed 8832.27 samples/sec Loss 3.3934 LearningRate 0.0009 Epoch: 18 Global Step: 301950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:20,756-Speed 9151.23 samples/sec Loss 3.4058 LearningRate 0.0009 Epoch: 18 Global Step: 301960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:21,852-Speed 9343.36 samples/sec Loss 3.3881 LearningRate 0.0009 Epoch: 18 Global Step: 301970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:22,951-Speed 9326.64 samples/sec Loss 3.3528 LearningRate 0.0009 Epoch: 18 Global Step: 301980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:46:24,151-Speed 8542.10 samples/sec Loss 3.2640 LearningRate 0.0009 Epoch: 18 Global Step: 301990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:25,290-Speed 8994.98 samples/sec Loss 3.3380 LearningRate 0.0009 Epoch: 18 Global Step: 302000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:46:47,231-[lfw][302000]XNorm: 6.591494 Training: 2022-04-11 23:46:47,232-[lfw][302000]Accuracy-Flip: 0.99750+-0.00300 Training: 2022-04-11 23:46:47,232-[lfw][302000]Accuracy-Highest: 0.99750 Training: 2022-04-11 23:47:12,505-[cfp_fp][302000]XNorm: 5.771600 Training: 2022-04-11 23:47:12,506-[cfp_fp][302000]Accuracy-Flip: 0.97257+-0.00837 Training: 2022-04-11 23:47:12,506-[cfp_fp][302000]Accuracy-Highest: 0.97386 Training: 
2022-04-11 23:47:34,353-[agedb_30][302000]XNorm: 6.428232 Training: 2022-04-11 23:47:34,354-[agedb_30][302000]Accuracy-Flip: 0.97250+-0.00814 Training: 2022-04-11 23:47:34,354-[agedb_30][302000]Accuracy-Highest: 0.97417 Training: 2022-04-11 23:47:35,499-Speed 145.85 samples/sec Loss 3.4101 LearningRate 0.0009 Epoch: 18 Global Step: 302010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:36,618-Speed 9152.70 samples/sec Loss 3.2880 LearningRate 0.0009 Epoch: 18 Global Step: 302020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:37,721-Speed 9292.88 samples/sec Loss 3.3480 LearningRate 0.0009 Epoch: 18 Global Step: 302030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:38,816-Speed 9358.95 samples/sec Loss 3.4467 LearningRate 0.0009 Epoch: 18 Global Step: 302040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:39,960-Speed 8949.83 samples/sec Loss 3.3116 LearningRate 0.0009 Epoch: 18 Global Step: 302050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:41,095-Speed 9025.92 samples/sec Loss 3.4011 LearningRate 0.0009 Epoch: 18 Global Step: 302060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:42,262-Speed 8784.01 samples/sec Loss 3.3799 LearningRate 0.0009 Epoch: 18 Global Step: 302070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:43,432-Speed 8758.63 samples/sec Loss 3.3595 LearningRate 0.0009 Epoch: 18 Global Step: 302080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:44,579-Speed 8935.39 samples/sec Loss 3.3321 LearningRate 0.0009 Epoch: 18 Global Step: 302090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:47:45,696-Speed 9173.80 samples/sec Loss 3.2945 LearningRate 0.0009 Epoch: 18 Global Step: 302100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:47:46,817-Speed 9139.49 samples/sec Loss 3.2612 LearningRate 0.0009 Epoch: 18 Global Step: 302110 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 23:47:47,935-Speed 9165.04 samples/sec Loss 3.3206 LearningRate 0.0009 Epoch: 18 Global Step: 302120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:49,074-Speed 8991.27 samples/sec Loss 3.3491 LearningRate 0.0009 Epoch: 18 Global Step: 302130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:50,167-Speed 9376.92 samples/sec Loss 3.3424 LearningRate 0.0009 Epoch: 18 Global Step: 302140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:51,355-Speed 8624.09 samples/sec Loss 3.4001 LearningRate 0.0009 Epoch: 18 Global Step: 302150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:52,468-Speed 9204.42 samples/sec Loss 3.3703 LearningRate 0.0009 Epoch: 18 Global Step: 302160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:53,566-Speed 9334.47 samples/sec Loss 3.3987 LearningRate 0.0009 Epoch: 18 Global Step: 302170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:54,665-Speed 9323.62 samples/sec Loss 3.3240 LearningRate 0.0009 Epoch: 18 Global Step: 302180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:55,758-Speed 9369.68 samples/sec Loss 3.3952 LearningRate 0.0009 Epoch: 18 Global Step: 302190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:56,920-Speed 8822.64 samples/sec Loss 3.4083 LearningRate 0.0009 Epoch: 18 Global Step: 302200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:58,023-Speed 9288.38 samples/sec Loss 3.3923 LearningRate 0.0009 Epoch: 18 Global Step: 302210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:47:59,170-Speed 8935.57 samples/sec Loss 3.4028 LearningRate 0.0009 Epoch: 18 Global Step: 302220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:00,297-Speed 9092.99 samples/sec Loss 3.3390 LearningRate 0.0009 Epoch: 18 Global Step: 302230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:48:01,370-Speed 9544.47 samples/sec Loss 3.3349 LearningRate 0.0009 Epoch: 18 Global Step: 302240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:02,489-Speed 9155.61 samples/sec Loss 3.2685 LearningRate 0.0009 Epoch: 18 Global Step: 302250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:03,588-Speed 9326.64 samples/sec Loss 3.3305 LearningRate 0.0009 Epoch: 18 Global Step: 302260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:04,705-Speed 9170.14 samples/sec Loss 3.3810 LearningRate 0.0009 Epoch: 18 Global Step: 302270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:05,848-Speed 8968.08 samples/sec Loss 3.3594 LearningRate 0.0009 Epoch: 18 Global Step: 302280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:06,954-Speed 9264.98 samples/sec Loss 3.4072 LearningRate 0.0009 Epoch: 18 Global Step: 302290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:08,036-Speed 9465.48 samples/sec Loss 3.3332 LearningRate 0.0009 Epoch: 18 Global Step: 302300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:09,180-Speed 8955.55 samples/sec Loss 3.3298 LearningRate 0.0009 Epoch: 18 Global Step: 302310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:48:10,285-Speed 9280.79 samples/sec Loss 3.3769 LearningRate 0.0009 Epoch: 18 Global Step: 302320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:48:11,415-Speed 9068.36 samples/sec Loss 3.3592 LearningRate 0.0009 Epoch: 18 Global Step: 302330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:48:12,546-Speed 9055.23 samples/sec Loss 3.3443 LearningRate 0.0009 Epoch: 18 Global Step: 302340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:48:13,637-Speed 9389.71 samples/sec Loss 3.3492 LearningRate 0.0009 Epoch: 18 Global Step: 302350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:14,729-Speed 9387.64 samples/sec 
Loss 3.4190 LearningRate 0.0009 Epoch: 18 Global Step: 302360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:15,851-Speed 9129.76 samples/sec Loss 3.3238 LearningRate 0.0009 Epoch: 18 Global Step: 302370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:16,995-Speed 8961.21 samples/sec Loss 3.2824 LearningRate 0.0009 Epoch: 18 Global Step: 302380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:18,103-Speed 9242.49 samples/sec Loss 3.3172 LearningRate 0.0009 Epoch: 18 Global Step: 302390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:19,241-Speed 9004.73 samples/sec Loss 3.4601 LearningRate 0.0009 Epoch: 18 Global Step: 302400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:20,367-Speed 9098.76 samples/sec Loss 3.3714 LearningRate 0.0009 Epoch: 18 Global Step: 302410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:21,513-Speed 8941.72 samples/sec Loss 3.4187 LearningRate 0.0009 Epoch: 18 Global Step: 302420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:22,707-Speed 8584.97 samples/sec Loss 3.4100 LearningRate 0.0009 Epoch: 18 Global Step: 302430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:23,823-Speed 9172.21 samples/sec Loss 3.4027 LearningRate 0.0009 Epoch: 18 Global Step: 302440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:24,904-Speed 9492.32 samples/sec Loss 3.3383 LearningRate 0.0009 Epoch: 18 Global Step: 302450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:26,019-Speed 9195.63 samples/sec Loss 3.4154 LearningRate 0.0009 Epoch: 18 Global Step: 302460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:27,137-Speed 9158.04 samples/sec Loss 3.3957 LearningRate 0.0009 Epoch: 18 Global Step: 302470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:48:28,233-Speed 9352.41 samples/sec Loss 3.4137 LearningRate 0.0009 Epoch: 18 
Global Step: 302480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:29,340-Speed 9253.06 samples/sec Loss 3.3548 LearningRate 0.0009 Epoch: 18 Global Step: 302490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:30,437-Speed 9341.83 samples/sec Loss 3.4146 LearningRate 0.0009 Epoch: 18 Global Step: 302500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:31,599-Speed 8817.30 samples/sec Loss 3.3201 LearningRate 0.0009 Epoch: 18 Global Step: 302510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:32,805-Speed 8492.93 samples/sec Loss 3.4354 LearningRate 0.0009 Epoch: 18 Global Step: 302520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:33,950-Speed 8952.91 samples/sec Loss 3.3926 LearningRate 0.0009 Epoch: 18 Global Step: 302530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:35,062-Speed 9222.21 samples/sec Loss 3.3439 LearningRate 0.0009 Epoch: 18 Global Step: 302540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:36,230-Speed 8767.03 samples/sec Loss 3.3257 LearningRate 0.0009 Epoch: 18 Global Step: 302550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:37,325-Speed 9356.08 samples/sec Loss 3.3577 LearningRate 0.0009 Epoch: 18 Global Step: 302560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:38,415-Speed 9401.99 samples/sec Loss 3.3429 LearningRate 0.0009 Epoch: 18 Global Step: 302570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:39,512-Speed 9338.11 samples/sec Loss 3.3912 LearningRate 0.0009 Epoch: 18 Global Step: 302580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:48:40,589-Speed 9517.99 samples/sec Loss 3.3837 LearningRate 0.0009 Epoch: 18 Global Step: 302590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:48:41,675-Speed 9435.58 samples/sec Loss 3.3529 LearningRate 0.0009 Epoch: 18 Global Step: 302600 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-11 23:48:42,785-Speed 9235.80 samples/sec Loss 3.3191 LearningRate 0.0009 Epoch: 18 Global Step: 302610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:43,870-Speed 9437.65 samples/sec Loss 3.3154 LearningRate 0.0009 Epoch: 18 Global Step: 302620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:44,942-Speed 9555.85 samples/sec Loss 3.3955 LearningRate 0.0009 Epoch: 18 Global Step: 302630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:46,115-Speed 8736.80 samples/sec Loss 3.3513 LearningRate 0.0009 Epoch: 18 Global Step: 302640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:47,239-Speed 9117.66 samples/sec Loss 3.3965 LearningRate 0.0009 Epoch: 18 Global Step: 302650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:48,306-Speed 9602.20 samples/sec Loss 3.3909 LearningRate 0.0009 Epoch: 18 Global Step: 302660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:49,451-Speed 8942.75 samples/sec Loss 3.4113 LearningRate 0.0009 Epoch: 18 Global Step: 302670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:50,630-Speed 8694.57 samples/sec Loss 3.3123 LearningRate 0.0009 Epoch: 18 Global Step: 302680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:51,776-Speed 8936.93 samples/sec Loss 3.3306 LearningRate 0.0009 Epoch: 18 Global Step: 302690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:52,907-Speed 9065.77 samples/sec Loss 3.3712 LearningRate 0.0009 Epoch: 18 Global Step: 302700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:54,036-Speed 9079.17 samples/sec Loss 3.3897 LearningRate 0.0009 Epoch: 18 Global Step: 302710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:55,166-Speed 9067.02 samples/sec Loss 3.3944 LearningRate 0.0009 Epoch: 18 Global Step: 302720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 23:48:56,267-Speed 9308.12 samples/sec Loss 3.3545 LearningRate 0.0009 Epoch: 18 Global Step: 302730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:57,406-Speed 8995.01 samples/sec Loss 3.3473 LearningRate 0.0009 Epoch: 18 Global Step: 302740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:58,508-Speed 9289.95 samples/sec Loss 3.4069 LearningRate 0.0009 Epoch: 18 Global Step: 302750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:48:59,593-Speed 9450.15 samples/sec Loss 3.3376 LearningRate 0.0009 Epoch: 18 Global Step: 302760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:00,748-Speed 8866.64 samples/sec Loss 3.3466 LearningRate 0.0009 Epoch: 18 Global Step: 302770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:01,878-Speed 9067.83 samples/sec Loss 3.3863 LearningRate 0.0009 Epoch: 18 Global Step: 302780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:02,983-Speed 9273.68 samples/sec Loss 3.3600 LearningRate 0.0009 Epoch: 18 Global Step: 302790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:04,088-Speed 9278.55 samples/sec Loss 3.3859 LearningRate 0.0009 Epoch: 18 Global Step: 302800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:49:05,175-Speed 9424.48 samples/sec Loss 3.3789 LearningRate 0.0009 Epoch: 18 Global Step: 302810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:06,341-Speed 8781.44 samples/sec Loss 3.3289 LearningRate 0.0009 Epoch: 18 Global Step: 302820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:07,504-Speed 8812.83 samples/sec Loss 3.3749 LearningRate 0.0009 Epoch: 18 Global Step: 302830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:08,642-Speed 9006.06 samples/sec Loss 3.3996 LearningRate 0.0009 Epoch: 18 Global Step: 302840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:09,812-Speed 8752.16 
samples/sec Loss 3.3743 LearningRate 0.0009 Epoch: 18 Global Step: 302850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:10,977-Speed 8802.30 samples/sec Loss 3.3896 LearningRate 0.0009 Epoch: 18 Global Step: 302860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:12,072-Speed 9353.95 samples/sec Loss 3.3687 LearningRate 0.0009 Epoch: 18 Global Step: 302870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:13,213-Speed 8984.95 samples/sec Loss 3.3897 LearningRate 0.0009 Epoch: 18 Global Step: 302880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:14,282-Speed 9576.61 samples/sec Loss 3.4062 LearningRate 0.0009 Epoch: 18 Global Step: 302890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:15,396-Speed 9198.09 samples/sec Loss 3.3350 LearningRate 0.0009 Epoch: 18 Global Step: 302900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:16,488-Speed 9382.71 samples/sec Loss 3.3813 LearningRate 0.0009 Epoch: 18 Global Step: 302910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:17,607-Speed 9165.43 samples/sec Loss 3.3909 LearningRate 0.0009 Epoch: 18 Global Step: 302920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:18,724-Speed 9170.33 samples/sec Loss 3.3508 LearningRate 0.0009 Epoch: 18 Global Step: 302930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:19,844-Speed 9144.71 samples/sec Loss 3.3344 LearningRate 0.0009 Epoch: 18 Global Step: 302940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:20,931-Speed 9426.29 samples/sec Loss 3.3382 LearningRate 0.0009 Epoch: 18 Global Step: 302950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:22,032-Speed 9309.39 samples/sec Loss 3.2762 LearningRate 0.0009 Epoch: 18 Global Step: 302960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:23,167-Speed 9028.44 samples/sec Loss 3.2396 LearningRate 0.0009 
Epoch: 18 Global Step: 302970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:24,275-Speed 9248.68 samples/sec Loss 3.3607 LearningRate 0.0009 Epoch: 18 Global Step: 302980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:25,429-Speed 8877.33 samples/sec Loss 3.3362 LearningRate 0.0009 Epoch: 18 Global Step: 302990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:26,548-Speed 9160.91 samples/sec Loss 3.3788 LearningRate 0.0009 Epoch: 18 Global Step: 303000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:27,714-Speed 8786.91 samples/sec Loss 3.3588 LearningRate 0.0009 Epoch: 18 Global Step: 303010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:28,794-Speed 9483.69 samples/sec Loss 3.3105 LearningRate 0.0009 Epoch: 18 Global Step: 303020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:29,883-Speed 9409.47 samples/sec Loss 3.4189 LearningRate 0.0009 Epoch: 18 Global Step: 303030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:30,985-Speed 9302.84 samples/sec Loss 3.3720 LearningRate 0.0009 Epoch: 18 Global Step: 303040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:32,095-Speed 9226.51 samples/sec Loss 3.3584 LearningRate 0.0008 Epoch: 18 Global Step: 303050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:33,221-Speed 9104.73 samples/sec Loss 3.3314 LearningRate 0.0008 Epoch: 18 Global Step: 303060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:34,355-Speed 9038.07 samples/sec Loss 3.3783 LearningRate 0.0008 Epoch: 18 Global Step: 303070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:49:35,512-Speed 8855.65 samples/sec Loss 3.4129 LearningRate 0.0008 Epoch: 18 Global Step: 303080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:36,620-Speed 9245.50 samples/sec Loss 3.3784 LearningRate 0.0008 Epoch: 18 Global Step: 303090 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:37,747-Speed 9094.80 samples/sec Loss 3.3756 LearningRate 0.0008 Epoch: 18 Global Step: 303100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:38,874-Speed 9085.19 samples/sec Loss 3.3805 LearningRate 0.0008 Epoch: 18 Global Step: 303110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:39,986-Speed 9216.87 samples/sec Loss 3.4169 LearningRate 0.0008 Epoch: 18 Global Step: 303120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:41,083-Speed 9341.02 samples/sec Loss 3.2689 LearningRate 0.0008 Epoch: 18 Global Step: 303130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:42,229-Speed 8939.90 samples/sec Loss 3.3381 LearningRate 0.0008 Epoch: 18 Global Step: 303140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:43,365-Speed 9019.89 samples/sec Loss 3.3345 LearningRate 0.0008 Epoch: 18 Global Step: 303150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:44,513-Speed 8924.31 samples/sec Loss 3.3386 LearningRate 0.0008 Epoch: 18 Global Step: 303160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:45,674-Speed 8823.06 samples/sec Loss 3.3747 LearningRate 0.0008 Epoch: 18 Global Step: 303170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:46,825-Speed 8901.95 samples/sec Loss 3.3227 LearningRate 0.0008 Epoch: 18 Global Step: 303180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:47,984-Speed 8840.20 samples/sec Loss 3.3297 LearningRate 0.0008 Epoch: 18 Global Step: 303190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:49,126-Speed 8971.10 samples/sec Loss 3.3921 LearningRate 0.0008 Epoch: 18 Global Step: 303200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:50,183-Speed 9697.28 samples/sec Loss 3.3627 LearningRate 0.0008 Epoch: 18 Global Step: 303210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 23:49:51,281-Speed 9330.13 samples/sec Loss 3.3575 LearningRate 0.0008 Epoch: 18 Global Step: 303220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:52,431-Speed 8914.23 samples/sec Loss 3.3317 LearningRate 0.0008 Epoch: 18 Global Step: 303230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:53,627-Speed 8562.70 samples/sec Loss 3.3066 LearningRate 0.0008 Epoch: 18 Global Step: 303240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:54,721-Speed 9372.53 samples/sec Loss 3.3832 LearningRate 0.0008 Epoch: 18 Global Step: 303250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:55,779-Speed 9678.37 samples/sec Loss 3.3399 LearningRate 0.0008 Epoch: 18 Global Step: 303260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:56,869-Speed 9401.58 samples/sec Loss 3.3811 LearningRate 0.0008 Epoch: 18 Global Step: 303270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:49:57,980-Speed 9221.16 samples/sec Loss 3.4247 LearningRate 0.0008 Epoch: 18 Global Step: 303280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:49:59,126-Speed 8937.69 samples/sec Loss 3.4118 LearningRate 0.0008 Epoch: 18 Global Step: 303290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:00,219-Speed 9376.68 samples/sec Loss 3.4570 LearningRate 0.0008 Epoch: 18 Global Step: 303300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:01,341-Speed 9128.46 samples/sec Loss 3.2830 LearningRate 0.0008 Epoch: 18 Global Step: 303310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:02,482-Speed 8978.20 samples/sec Loss 3.3313 LearningRate 0.0008 Epoch: 18 Global Step: 303320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:03,619-Speed 9012.69 samples/sec Loss 3.3961 LearningRate 0.0008 Epoch: 18 Global Step: 303330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:04,790-Speed 8757.06 
samples/sec Loss 3.3847 LearningRate 0.0008 Epoch: 18 Global Step: 303340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:05,911-Speed 9139.29 samples/sec Loss 3.3523 LearningRate 0.0008 Epoch: 18 Global Step: 303350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:07,015-Speed 9282.09 samples/sec Loss 3.3924 LearningRate 0.0008 Epoch: 18 Global Step: 303360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:08,166-Speed 8900.42 samples/sec Loss 3.2671 LearningRate 0.0008 Epoch: 18 Global Step: 303370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:09,317-Speed 8906.10 samples/sec Loss 3.3736 LearningRate 0.0008 Epoch: 18 Global Step: 303380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:10,460-Speed 8957.46 samples/sec Loss 3.3688 LearningRate 0.0008 Epoch: 18 Global Step: 303390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:11,518-Speed 9687.61 samples/sec Loss 3.3407 LearningRate 0.0008 Epoch: 18 Global Step: 303400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:12,613-Speed 9360.48 samples/sec Loss 3.3618 LearningRate 0.0008 Epoch: 18 Global Step: 303410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:13,765-Speed 8893.57 samples/sec Loss 3.3897 LearningRate 0.0008 Epoch: 18 Global Step: 303420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:14,903-Speed 9006.44 samples/sec Loss 3.3454 LearningRate 0.0008 Epoch: 18 Global Step: 303430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:15,977-Speed 9537.62 samples/sec Loss 3.2302 LearningRate 0.0008 Epoch: 18 Global Step: 303440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:17,091-Speed 9197.14 samples/sec Loss 3.4742 LearningRate 0.0008 Epoch: 18 Global Step: 303450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:18,250-Speed 8840.30 samples/sec Loss 3.3175 LearningRate 0.0008 
Epoch: 18 Global Step: 303460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:19,370-Speed 9148.96 samples/sec Loss 3.3968 LearningRate 0.0008 Epoch: 18 Global Step: 303470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:20,527-Speed 8851.17 samples/sec Loss 3.3853 LearningRate 0.0008 Epoch: 18 Global Step: 303480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:21,605-Speed 9506.00 samples/sec Loss 3.3152 LearningRate 0.0008 Epoch: 18 Global Step: 303490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:22,788-Speed 8659.13 samples/sec Loss 3.3829 LearningRate 0.0008 Epoch: 18 Global Step: 303500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:23,999-Speed 8464.06 samples/sec Loss 3.3481 LearningRate 0.0008 Epoch: 18 Global Step: 303510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:25,131-Speed 9056.39 samples/sec Loss 3.4337 LearningRate 0.0008 Epoch: 18 Global Step: 303520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:26,279-Speed 8921.31 samples/sec Loss 3.3560 LearningRate 0.0008 Epoch: 18 Global Step: 303530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:27,409-Speed 9067.20 samples/sec Loss 3.3902 LearningRate 0.0008 Epoch: 18 Global Step: 303540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:28,526-Speed 9177.38 samples/sec Loss 3.3745 LearningRate 0.0008 Epoch: 18 Global Step: 303550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:29,661-Speed 9028.73 samples/sec Loss 3.4316 LearningRate 0.0008 Epoch: 18 Global Step: 303560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:30,789-Speed 9080.45 samples/sec Loss 3.3306 LearningRate 0.0008 Epoch: 18 Global Step: 303570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:31,882-Speed 9370.68 samples/sec Loss 3.3667 LearningRate 0.0008 Epoch: 18 Global Step: 303580 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:33,000-Speed 9170.19 samples/sec Loss 3.3537 LearningRate 0.0008 Epoch: 18 Global Step: 303590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:50:34,068-Speed 9598.52 samples/sec Loss 3.4062 LearningRate 0.0008 Epoch: 18 Global Step: 303600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:35,214-Speed 8934.07 samples/sec Loss 3.2476 LearningRate 0.0008 Epoch: 18 Global Step: 303610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:36,351-Speed 9015.82 samples/sec Loss 3.3527 LearningRate 0.0008 Epoch: 18 Global Step: 303620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:37,486-Speed 9023.53 samples/sec Loss 3.3899 LearningRate 0.0008 Epoch: 18 Global Step: 303630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:38,658-Speed 8739.64 samples/sec Loss 3.3024 LearningRate 0.0008 Epoch: 18 Global Step: 303640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:39,747-Speed 9408.35 samples/sec Loss 3.4272 LearningRate 0.0008 Epoch: 18 Global Step: 303650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:40,848-Speed 9314.49 samples/sec Loss 3.3422 LearningRate 0.0008 Epoch: 18 Global Step: 303660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:41,948-Speed 9316.07 samples/sec Loss 3.4043 LearningRate 0.0008 Epoch: 18 Global Step: 303670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:43,102-Speed 8876.30 samples/sec Loss 3.3749 LearningRate 0.0008 Epoch: 18 Global Step: 303680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:44,252-Speed 8906.23 samples/sec Loss 3.4492 LearningRate 0.0008 Epoch: 18 Global Step: 303690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:45,360-Speed 9249.15 samples/sec Loss 3.3933 LearningRate 0.0008 Epoch: 18 Global Step: 303700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 
2022-04-11 23:50:46,500-Speed 8990.79 samples/sec Loss 3.3845 LearningRate 0.0008 Epoch: 18 Global Step: 303710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:47,633-Speed 9039.97 samples/sec Loss 3.4423 LearningRate 0.0008 Epoch: 18 Global Step: 303720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:48,742-Speed 9235.29 samples/sec Loss 3.3819 LearningRate 0.0008 Epoch: 18 Global Step: 303730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:49,877-Speed 9028.97 samples/sec Loss 3.4440 LearningRate 0.0008 Epoch: 18 Global Step: 303740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:50,992-Speed 9190.47 samples/sec Loss 3.4074 LearningRate 0.0008 Epoch: 18 Global Step: 303750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:52,090-Speed 9331.45 samples/sec Loss 3.3157 LearningRate 0.0008 Epoch: 18 Global Step: 303760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:53,240-Speed 8911.47 samples/sec Loss 3.4301 LearningRate 0.0008 Epoch: 18 Global Step: 303770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:54,394-Speed 8879.75 samples/sec Loss 3.3797 LearningRate 0.0008 Epoch: 18 Global Step: 303780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:55,486-Speed 9379.75 samples/sec Loss 3.3627 LearningRate 0.0008 Epoch: 18 Global Step: 303790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:56,573-Speed 9421.70 samples/sec Loss 3.4119 LearningRate 0.0008 Epoch: 18 Global Step: 303800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:57,684-Speed 9227.70 samples/sec Loss 3.4188 LearningRate 0.0008 Epoch: 18 Global Step: 303810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:58,860-Speed 8708.01 samples/sec Loss 3.4041 LearningRate 0.0008 Epoch: 18 Global Step: 303820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:50:59,992-Speed 9053.12 
samples/sec Loss 3.3638 LearningRate 0.0008 Epoch: 18 Global Step: 303830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:51:01,073-Speed 9474.91 samples/sec Loss 3.4045 LearningRate 0.0008 Epoch: 18 Global Step: 303840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:51:02,215-Speed 8976.81 samples/sec Loss 3.3358 LearningRate 0.0008 Epoch: 18 Global Step: 303850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:51:03,327-Speed 9212.80 samples/sec Loss 3.4394 LearningRate 0.0008 Epoch: 18 Global Step: 303860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:51:04,424-Speed 9341.01 samples/sec Loss 3.3247 LearningRate 0.0008 Epoch: 18 Global Step: 303870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:51:05,530-Speed 9262.69 samples/sec Loss 3.4227 LearningRate 0.0008 Epoch: 18 Global Step: 303880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:51:06,607-Speed 9533.12 samples/sec Loss 3.3181 LearningRate 0.0008 Epoch: 18 Global Step: 303890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:51:07,745-Speed 9005.23 samples/sec Loss 3.4997 LearningRate 0.0008 Epoch: 18 Global Step: 303900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:51:08,856-Speed 9224.54 samples/sec Loss 3.3588 LearningRate 0.0008 Epoch: 18 Global Step: 303910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:51:09,957-Speed 9300.48 samples/sec Loss 3.3529 LearningRate 0.0008 Epoch: 18 Global Step: 303920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:51:11,071-Speed 9201.38 samples/sec Loss 3.4347 LearningRate 0.0008 Epoch: 18 Global Step: 303930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:51:12,180-Speed 9244.63 samples/sec Loss 3.3582 LearningRate 0.0008 Epoch: 18 Global Step: 303940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:51:13,293-Speed 9202.91 samples/sec Loss 3.3761 LearningRate 0.0008 
Epoch: 18 Global Step: 303950 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:51:14,446-Speed 8883.48 samples/sec Loss 3.3308 LearningRate 0.0008 Epoch: 18 Global Step: 303960 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:51:15,580-Speed 9035.80 samples/sec Loss 3.3258 LearningRate 0.0008 Epoch: 18 Global Step: 303970 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:51:16,666-Speed 9437.38 samples/sec Loss 3.4147 LearningRate 0.0008 Epoch: 18 Global Step: 303980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:51:17,800-Speed 9036.00 samples/sec Loss 3.3320 LearningRate 0.0008 Epoch: 18 Global Step: 303990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 23:51:18,959-Speed 8835.61 samples/sec Loss 3.3686 LearningRate 0.0008 Epoch: 18 Global Step: 304000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 23:51:41,136-[lfw][304000]XNorm: 6.605141
Training: 2022-04-11 23:51:41,137-[lfw][304000]Accuracy-Flip: 0.99633+-0.00233
Training: 2022-04-11 23:51:41,138-[lfw][304000]Accuracy-Highest: 0.99750
Training: 2022-04-11 23:52:06,653-[cfp_fp][304000]XNorm: 5.764245
Training: 2022-04-11 23:52:06,654-[cfp_fp][304000]Accuracy-Flip: 0.97286+-0.00777
Training: 2022-04-11 23:52:06,654-[cfp_fp][304000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:52:28,635-[agedb_30][304000]XNorm: 6.438392
Training: 2022-04-11 23:52:28,636-[agedb_30][304000]Accuracy-Flip: 0.97383+-0.00823
Training: 2022-04-11 23:52:28,637-[agedb_30][304000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:52:29,797-Speed 144.56 samples/sec Loss 3.3829 LearningRate 0.0008 Epoch: 18 Global Step: 304010 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 23:52:30,903-Speed 9261.35 samples/sec Loss 3.2926 LearningRate 0.0008 Epoch: 18 Global Step: 304020 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-04-11 23:52:32,010-Speed 9250.46 samples/sec Loss 3.3262 LearningRate 0.0008 Epoch: 18 Global Step:
304030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:33,132-Speed 9131.21 samples/sec Loss 3.3630 LearningRate 0.0008 Epoch: 18 Global Step: 304040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:34,255-Speed 9130.70 samples/sec Loss 3.4174 LearningRate 0.0008 Epoch: 18 Global Step: 304050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:35,361-Speed 9262.17 samples/sec Loss 3.3732 LearningRate 0.0008 Epoch: 18 Global Step: 304060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:36,518-Speed 8855.57 samples/sec Loss 3.3586 LearningRate 0.0008 Epoch: 18 Global Step: 304070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:37,661-Speed 8968.36 samples/sec Loss 3.4312 LearningRate 0.0008 Epoch: 18 Global Step: 304080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:38,824-Speed 8802.86 samples/sec Loss 3.4036 LearningRate 0.0008 Epoch: 18 Global Step: 304090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:39,939-Speed 9193.92 samples/sec Loss 3.3391 LearningRate 0.0008 Epoch: 18 Global Step: 304100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:52:41,031-Speed 9387.11 samples/sec Loss 3.3185 LearningRate 0.0008 Epoch: 18 Global Step: 304110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:42,164-Speed 9039.44 samples/sec Loss 3.4142 LearningRate 0.0008 Epoch: 18 Global Step: 304120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:43,390-Speed 8356.95 samples/sec Loss 3.4670 LearningRate 0.0008 Epoch: 18 Global Step: 304130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:44,558-Speed 8770.46 samples/sec Loss 3.3813 LearningRate 0.0008 Epoch: 18 Global Step: 304140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:45,705-Speed 8929.94 samples/sec Loss 3.4032 LearningRate 0.0008 Epoch: 18 Global Step: 304150 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-11 23:52:46,820-Speed 9191.27 samples/sec Loss 3.3257 LearningRate 0.0008 Epoch: 18 Global Step: 304160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:47,952-Speed 9049.44 samples/sec Loss 3.3945 LearningRate 0.0008 Epoch: 18 Global Step: 304170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:49,134-Speed 8670.61 samples/sec Loss 3.4079 LearningRate 0.0008 Epoch: 18 Global Step: 304180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:50,225-Speed 9391.53 samples/sec Loss 3.3503 LearningRate 0.0008 Epoch: 18 Global Step: 304190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:51,304-Speed 9492.36 samples/sec Loss 3.3429 LearningRate 0.0008 Epoch: 18 Global Step: 304200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:52,419-Speed 9190.63 samples/sec Loss 3.3861 LearningRate 0.0008 Epoch: 18 Global Step: 304210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:52:53,507-Speed 9422.88 samples/sec Loss 3.3223 LearningRate 0.0008 Epoch: 18 Global Step: 304220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:54,705-Speed 8548.85 samples/sec Loss 3.3115 LearningRate 0.0008 Epoch: 18 Global Step: 304230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:55,799-Speed 9371.29 samples/sec Loss 3.3299 LearningRate 0.0008 Epoch: 18 Global Step: 304240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:56,898-Speed 9316.63 samples/sec Loss 3.3680 LearningRate 0.0008 Epoch: 18 Global Step: 304250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:57,993-Speed 9356.04 samples/sec Loss 3.4706 LearningRate 0.0008 Epoch: 18 Global Step: 304260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:52:59,108-Speed 9190.36 samples/sec Loss 3.4124 LearningRate 0.0008 Epoch: 18 Global Step: 304270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:53:00,223-Speed 9191.88 samples/sec Loss 3.3842 LearningRate 0.0008 Epoch: 18 Global Step: 304280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:01,332-Speed 9237.37 samples/sec Loss 3.3574 LearningRate 0.0008 Epoch: 18 Global Step: 304290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:02,458-Speed 9094.87 samples/sec Loss 3.3969 LearningRate 0.0008 Epoch: 18 Global Step: 304300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:03,603-Speed 8952.17 samples/sec Loss 3.3364 LearningRate 0.0008 Epoch: 18 Global Step: 304310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:04,710-Speed 9261.39 samples/sec Loss 3.4508 LearningRate 0.0008 Epoch: 18 Global Step: 304320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:53:05,851-Speed 8979.02 samples/sec Loss 3.3331 LearningRate 0.0008 Epoch: 18 Global Step: 304330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:06,967-Speed 9179.76 samples/sec Loss 3.3533 LearningRate 0.0008 Epoch: 18 Global Step: 304340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:08,088-Speed 9135.65 samples/sec Loss 3.3571 LearningRate 0.0008 Epoch: 18 Global Step: 304350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:09,238-Speed 8912.89 samples/sec Loss 3.4003 LearningRate 0.0008 Epoch: 18 Global Step: 304360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:10,379-Speed 8973.88 samples/sec Loss 3.3159 LearningRate 0.0008 Epoch: 18 Global Step: 304370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:11,507-Speed 9087.52 samples/sec Loss 3.4169 LearningRate 0.0008 Epoch: 18 Global Step: 304380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:12,606-Speed 9325.90 samples/sec Loss 3.3533 LearningRate 0.0008 Epoch: 18 Global Step: 304390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:13,736-Speed 9068.06 samples/sec 
Loss 3.3298 LearningRate 0.0008 Epoch: 18 Global Step: 304400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:14,852-Speed 9177.72 samples/sec Loss 3.4409 LearningRate 0.0008 Epoch: 18 Global Step: 304410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:15,957-Speed 9274.53 samples/sec Loss 3.3986 LearningRate 0.0008 Epoch: 18 Global Step: 304420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:17,076-Speed 9157.76 samples/sec Loss 3.4269 LearningRate 0.0008 Epoch: 18 Global Step: 304430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:53:18,177-Speed 9307.54 samples/sec Loss 3.3026 LearningRate 0.0008 Epoch: 18 Global Step: 304440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:19,257-Speed 9484.15 samples/sec Loss 3.3026 LearningRate 0.0008 Epoch: 18 Global Step: 304450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:20,311-Speed 9717.76 samples/sec Loss 3.4755 LearningRate 0.0008 Epoch: 18 Global Step: 304460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:21,438-Speed 9087.75 samples/sec Loss 3.3242 LearningRate 0.0008 Epoch: 18 Global Step: 304470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:22,616-Speed 8703.52 samples/sec Loss 3.2760 LearningRate 0.0008 Epoch: 18 Global Step: 304480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:23,719-Speed 9287.34 samples/sec Loss 3.3798 LearningRate 0.0008 Epoch: 18 Global Step: 304490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:24,807-Speed 9414.80 samples/sec Loss 3.4141 LearningRate 0.0008 Epoch: 18 Global Step: 304500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:25,883-Speed 9522.35 samples/sec Loss 3.3466 LearningRate 0.0008 Epoch: 18 Global Step: 304510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:27,033-Speed 8907.96 samples/sec Loss 3.4566 LearningRate 0.0008 Epoch: 18 
Global Step: 304520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:28,211-Speed 8699.20 samples/sec Loss 3.3436 LearningRate 0.0008 Epoch: 18 Global Step: 304530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:29,356-Speed 8955.50 samples/sec Loss 3.4065 LearningRate 0.0008 Epoch: 18 Global Step: 304540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:53:30,445-Speed 9406.60 samples/sec Loss 3.3910 LearningRate 0.0008 Epoch: 18 Global Step: 304550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:31,568-Speed 9124.03 samples/sec Loss 3.3159 LearningRate 0.0008 Epoch: 18 Global Step: 304560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:32,714-Speed 8941.94 samples/sec Loss 3.4276 LearningRate 0.0008 Epoch: 18 Global Step: 304570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:33,850-Speed 9018.32 samples/sec Loss 3.4420 LearningRate 0.0008 Epoch: 18 Global Step: 304580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:34,994-Speed 8961.78 samples/sec Loss 3.3665 LearningRate 0.0008 Epoch: 18 Global Step: 304590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:36,079-Speed 9442.10 samples/sec Loss 3.3980 LearningRate 0.0008 Epoch: 18 Global Step: 304600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:37,191-Speed 9211.67 samples/sec Loss 3.3553 LearningRate 0.0008 Epoch: 18 Global Step: 304610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:38,285-Speed 9370.87 samples/sec Loss 3.4477 LearningRate 0.0008 Epoch: 18 Global Step: 304620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:39,442-Speed 8850.66 samples/sec Loss 3.3942 LearningRate 0.0008 Epoch: 18 Global Step: 304630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:40,560-Speed 9165.73 samples/sec Loss 3.3630 LearningRate 0.0008 Epoch: 18 Global Step: 304640 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 23:53:41,711-Speed 8904.61 samples/sec Loss 3.3208 LearningRate 0.0008 Epoch: 18 Global Step: 304650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:53:42,813-Speed 9296.52 samples/sec Loss 3.4175 LearningRate 0.0008 Epoch: 18 Global Step: 304660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:43,994-Speed 8674.14 samples/sec Loss 3.4338 LearningRate 0.0008 Epoch: 18 Global Step: 304670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:45,132-Speed 9006.87 samples/sec Loss 3.3833 LearningRate 0.0008 Epoch: 18 Global Step: 304680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:46,274-Speed 8966.39 samples/sec Loss 3.3044 LearningRate 0.0008 Epoch: 18 Global Step: 304690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:47,381-Speed 9257.81 samples/sec Loss 3.3911 LearningRate 0.0008 Epoch: 18 Global Step: 304700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:48,513-Speed 9052.20 samples/sec Loss 3.4124 LearningRate 0.0008 Epoch: 18 Global Step: 304710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:49,636-Speed 9123.80 samples/sec Loss 3.3450 LearningRate 0.0008 Epoch: 18 Global Step: 304720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:50,742-Speed 9263.26 samples/sec Loss 3.3586 LearningRate 0.0008 Epoch: 18 Global Step: 304730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:51,860-Speed 9166.80 samples/sec Loss 3.4298 LearningRate 0.0008 Epoch: 18 Global Step: 304740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:52,997-Speed 9011.00 samples/sec Loss 3.3118 LearningRate 0.0008 Epoch: 18 Global Step: 304750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:54,108-Speed 9218.90 samples/sec Loss 3.3652 LearningRate 0.0008 Epoch: 18 Global Step: 304760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:53:55,226-Speed 9164.32 samples/sec Loss 3.3612 LearningRate 0.0008 Epoch: 18 Global Step: 304770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:56,317-Speed 9390.79 samples/sec Loss 3.4264 LearningRate 0.0008 Epoch: 18 Global Step: 304780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:57,415-Speed 9333.98 samples/sec Loss 3.4228 LearningRate 0.0008 Epoch: 18 Global Step: 304790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:58,538-Speed 9122.76 samples/sec Loss 3.4347 LearningRate 0.0008 Epoch: 18 Global Step: 304800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:53:59,627-Speed 9409.90 samples/sec Loss 3.4869 LearningRate 0.0008 Epoch: 18 Global Step: 304810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:00,765-Speed 9005.19 samples/sec Loss 3.3392 LearningRate 0.0008 Epoch: 18 Global Step: 304820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:01,869-Speed 9276.15 samples/sec Loss 3.3459 LearningRate 0.0008 Epoch: 18 Global Step: 304830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:03,000-Speed 9059.19 samples/sec Loss 3.4348 LearningRate 0.0008 Epoch: 18 Global Step: 304840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:04,103-Speed 9291.05 samples/sec Loss 3.4408 LearningRate 0.0008 Epoch: 18 Global Step: 304850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:05,200-Speed 9340.73 samples/sec Loss 3.3953 LearningRate 0.0008 Epoch: 18 Global Step: 304860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:06,310-Speed 9230.14 samples/sec Loss 3.3646 LearningRate 0.0008 Epoch: 18 Global Step: 304870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:07,412-Speed 9305.84 samples/sec Loss 3.4002 LearningRate 0.0008 Epoch: 18 Global Step: 304880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:08,515-Speed 9284.66 samples/sec Loss 
3.3798 LearningRate 0.0008 Epoch: 18 Global Step: 304890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:09,619-Speed 9280.88 samples/sec Loss 3.3990 LearningRate 0.0008 Epoch: 18 Global Step: 304900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:10,740-Speed 9145.41 samples/sec Loss 3.4084 LearningRate 0.0008 Epoch: 18 Global Step: 304910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:11,909-Speed 8765.95 samples/sec Loss 3.4174 LearningRate 0.0007 Epoch: 18 Global Step: 304920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:12,994-Speed 9435.39 samples/sec Loss 3.3153 LearningRate 0.0007 Epoch: 18 Global Step: 304930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:14,099-Speed 9277.11 samples/sec Loss 3.3143 LearningRate 0.0007 Epoch: 18 Global Step: 304940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:15,207-Speed 9245.18 samples/sec Loss 3.4270 LearningRate 0.0007 Epoch: 18 Global Step: 304950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:16,320-Speed 9203.65 samples/sec Loss 3.4419 LearningRate 0.0007 Epoch: 18 Global Step: 304960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:54:17,433-Speed 9208.21 samples/sec Loss 3.4024 LearningRate 0.0007 Epoch: 18 Global Step: 304970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:18,574-Speed 8980.88 samples/sec Loss 3.3638 LearningRate 0.0007 Epoch: 18 Global Step: 304980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:19,739-Speed 8791.15 samples/sec Loss 3.3907 LearningRate 0.0007 Epoch: 18 Global Step: 304990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:20,825-Speed 9438.12 samples/sec Loss 3.4343 LearningRate 0.0007 Epoch: 18 Global Step: 305000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:21,928-Speed 9287.19 samples/sec Loss 3.4254 LearningRate 0.0007 Epoch: 18 
Global Step: 305010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:23,062-Speed 9036.27 samples/sec Loss 3.3521 LearningRate 0.0007 Epoch: 18 Global Step: 305020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:24,233-Speed 8748.68 samples/sec Loss 3.4055 LearningRate 0.0007 Epoch: 18 Global Step: 305030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:25,366-Speed 9050.68 samples/sec Loss 3.3063 LearningRate 0.0007 Epoch: 18 Global Step: 305040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:26,478-Speed 9207.64 samples/sec Loss 3.3542 LearningRate 0.0007 Epoch: 18 Global Step: 305050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:27,653-Speed 8722.95 samples/sec Loss 3.3765 LearningRate 0.0007 Epoch: 18 Global Step: 305060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:28,777-Speed 9115.54 samples/sec Loss 3.4175 LearningRate 0.0007 Epoch: 18 Global Step: 305070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:29,889-Speed 9209.33 samples/sec Loss 3.4471 LearningRate 0.0007 Epoch: 18 Global Step: 305080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:31,028-Speed 9001.31 samples/sec Loss 3.3753 LearningRate 0.0007 Epoch: 18 Global Step: 305090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:32,147-Speed 9151.98 samples/sec Loss 3.3863 LearningRate 0.0007 Epoch: 18 Global Step: 305100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:33,292-Speed 8950.02 samples/sec Loss 3.4888 LearningRate 0.0007 Epoch: 18 Global Step: 305110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:34,439-Speed 8936.32 samples/sec Loss 3.3065 LearningRate 0.0007 Epoch: 18 Global Step: 305120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:35,564-Speed 9107.43 samples/sec Loss 3.4381 LearningRate 0.0007 Epoch: 18 Global Step: 305130 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 23:54:36,674-Speed 9230.98 samples/sec Loss 3.3831 LearningRate 0.0007 Epoch: 18 Global Step: 305140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:37,802-Speed 9082.09 samples/sec Loss 3.4412 LearningRate 0.0007 Epoch: 18 Global Step: 305150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:38,968-Speed 8780.44 samples/sec Loss 3.3241 LearningRate 0.0007 Epoch: 18 Global Step: 305160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:40,124-Speed 8865.37 samples/sec Loss 3.4168 LearningRate 0.0007 Epoch: 18 Global Step: 305170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:54:41,258-Speed 9039.76 samples/sec Loss 3.4074 LearningRate 0.0007 Epoch: 18 Global Step: 305180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:54:42,381-Speed 9126.62 samples/sec Loss 3.3759 LearningRate 0.0007 Epoch: 18 Global Step: 305190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:43,492-Speed 9218.90 samples/sec Loss 3.3973 LearningRate 0.0007 Epoch: 18 Global Step: 305200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:44,584-Speed 9382.50 samples/sec Loss 3.4013 LearningRate 0.0007 Epoch: 18 Global Step: 305210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:45,642-Speed 9686.15 samples/sec Loss 3.4653 LearningRate 0.0007 Epoch: 18 Global Step: 305220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:46,749-Speed 9252.99 samples/sec Loss 3.3980 LearningRate 0.0007 Epoch: 18 Global Step: 305230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:47,898-Speed 8917.43 samples/sec Loss 3.3231 LearningRate 0.0007 Epoch: 18 Global Step: 305240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:49,024-Speed 9101.60 samples/sec Loss 3.3310 LearningRate 0.0007 Epoch: 18 Global Step: 305250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:54:50,148-Speed 9112.85 samples/sec Loss 3.3997 LearningRate 0.0007 Epoch: 18 Global Step: 305260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:51,264-Speed 9177.24 samples/sec Loss 3.4093 LearningRate 0.0007 Epoch: 18 Global Step: 305270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:52,373-Speed 9242.31 samples/sec Loss 3.3900 LearningRate 0.0007 Epoch: 18 Global Step: 305280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:53,521-Speed 8931.27 samples/sec Loss 3.3849 LearningRate 0.0007 Epoch: 18 Global Step: 305290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:54:54,701-Speed 8682.56 samples/sec Loss 3.3199 LearningRate 0.0007 Epoch: 18 Global Step: 305300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:54:55,825-Speed 9119.59 samples/sec Loss 3.3181 LearningRate 0.0007 Epoch: 18 Global Step: 305310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:56,917-Speed 9384.36 samples/sec Loss 3.3334 LearningRate 0.0007 Epoch: 18 Global Step: 305320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:57,994-Speed 9511.39 samples/sec Loss 3.2369 LearningRate 0.0007 Epoch: 18 Global Step: 305330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:54:59,110-Speed 9182.09 samples/sec Loss 3.3641 LearningRate 0.0007 Epoch: 18 Global Step: 305340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:00,201-Speed 9384.04 samples/sec Loss 3.3595 LearningRate 0.0007 Epoch: 18 Global Step: 305350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:01,307-Speed 9263.04 samples/sec Loss 3.4295 LearningRate 0.0007 Epoch: 18 Global Step: 305360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:02,425-Speed 9174.92 samples/sec Loss 3.3557 LearningRate 0.0007 Epoch: 18 Global Step: 305370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:03,500-Speed 9529.85 samples/sec 
Loss 3.3830 LearningRate 0.0007 Epoch: 18 Global Step: 305380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:04,635-Speed 9025.79 samples/sec Loss 3.4053 LearningRate 0.0007 Epoch: 18 Global Step: 305390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:05,746-Speed 9226.10 samples/sec Loss 3.3230 LearningRate 0.0007 Epoch: 18 Global Step: 305400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:06,870-Speed 9112.63 samples/sec Loss 3.3344 LearningRate 0.0007 Epoch: 18 Global Step: 305410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:55:07,977-Speed 9255.36 samples/sec Loss 3.3940 LearningRate 0.0007 Epoch: 18 Global Step: 305420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:55:09,077-Speed 9317.40 samples/sec Loss 3.3346 LearningRate 0.0007 Epoch: 18 Global Step: 305430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:10,170-Speed 9388.20 samples/sec Loss 3.3753 LearningRate 0.0007 Epoch: 18 Global Step: 305440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:11,316-Speed 8946.09 samples/sec Loss 3.4098 LearningRate 0.0007 Epoch: 18 Global Step: 305450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:12,410-Speed 9367.09 samples/sec Loss 3.4215 LearningRate 0.0007 Epoch: 18 Global Step: 305460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:13,541-Speed 9054.55 samples/sec Loss 3.4131 LearningRate 0.0007 Epoch: 18 Global Step: 305470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:14,741-Speed 8540.54 samples/sec Loss 3.3561 LearningRate 0.0007 Epoch: 18 Global Step: 305480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:15,836-Speed 9351.16 samples/sec Loss 3.3636 LearningRate 0.0007 Epoch: 18 Global Step: 305490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:16,930-Speed 9376.82 samples/sec Loss 3.3661 LearningRate 0.0007 Epoch: 18 
Global Step: 305500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:18,030-Speed 9306.19 samples/sec Loss 3.4609 LearningRate 0.0007 Epoch: 18 Global Step: 305510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:19,152-Speed 9137.55 samples/sec Loss 3.4580 LearningRate 0.0007 Epoch: 18 Global Step: 305520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:20,240-Speed 9417.30 samples/sec Loss 3.3700 LearningRate 0.0007 Epoch: 18 Global Step: 305530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:21,353-Speed 9200.56 samples/sec Loss 3.4475 LearningRate 0.0007 Epoch: 18 Global Step: 305540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:22,443-Speed 9403.19 samples/sec Loss 3.3218 LearningRate 0.0007 Epoch: 18 Global Step: 305550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:23,559-Speed 9180.39 samples/sec Loss 3.3685 LearningRate 0.0007 Epoch: 18 Global Step: 305560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:24,667-Speed 9252.44 samples/sec Loss 3.3543 LearningRate 0.0007 Epoch: 18 Global Step: 305570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:25,793-Speed 9092.99 samples/sec Loss 3.3615 LearningRate 0.0007 Epoch: 18 Global Step: 305580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:26,936-Speed 8968.85 samples/sec Loss 3.4462 LearningRate 0.0007 Epoch: 18 Global Step: 305590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:28,042-Speed 9259.30 samples/sec Loss 3.4282 LearningRate 0.0007 Epoch: 18 Global Step: 305600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:29,129-Speed 9433.78 samples/sec Loss 3.2864 LearningRate 0.0007 Epoch: 18 Global Step: 305610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:30,258-Speed 9075.13 samples/sec Loss 3.3918 LearningRate 0.0007 Epoch: 18 Global Step: 305620 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 23:55:31,370-Speed 9212.58 samples/sec Loss 3.4045 LearningRate 0.0007 Epoch: 18 Global Step: 305630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:32,463-Speed 9369.89 samples/sec Loss 3.4160 LearningRate 0.0007 Epoch: 18 Global Step: 305640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:55:33,582-Speed 9161.69 samples/sec Loss 3.4454 LearningRate 0.0007 Epoch: 18 Global Step: 305650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:34,759-Speed 8702.79 samples/sec Loss 3.4272 LearningRate 0.0007 Epoch: 18 Global Step: 305660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:35,890-Speed 9058.78 samples/sec Loss 3.3569 LearningRate 0.0007 Epoch: 18 Global Step: 305670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:37,019-Speed 9071.60 samples/sec Loss 3.3327 LearningRate 0.0007 Epoch: 18 Global Step: 305680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:38,121-Speed 9299.11 samples/sec Loss 3.4363 LearningRate 0.0007 Epoch: 18 Global Step: 305690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:39,266-Speed 8947.21 samples/sec Loss 3.4203 LearningRate 0.0007 Epoch: 18 Global Step: 305700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:40,347-Speed 9480.52 samples/sec Loss 3.3734 LearningRate 0.0007 Epoch: 18 Global Step: 305710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:41,435-Speed 9425.76 samples/sec Loss 3.3781 LearningRate 0.0007 Epoch: 18 Global Step: 305720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:42,525-Speed 9394.07 samples/sec Loss 3.3401 LearningRate 0.0007 Epoch: 18 Global Step: 305730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:43,638-Speed 9204.12 samples/sec Loss 3.4028 LearningRate 0.0007 Epoch: 18 Global Step: 305740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:55:44,726-Speed 9422.57 samples/sec Loss 3.4508 LearningRate 0.0007 Epoch: 18 Global Step: 305750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:55:45,815-Speed 9406.88 samples/sec Loss 3.4428 LearningRate 0.0007 Epoch: 18 Global Step: 305760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:46,934-Speed 9159.48 samples/sec Loss 3.4101 LearningRate 0.0007 Epoch: 18 Global Step: 305770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:48,091-Speed 8854.76 samples/sec Loss 3.3973 LearningRate 0.0007 Epoch: 18 Global Step: 305780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:49,286-Speed 8569.45 samples/sec Loss 3.4600 LearningRate 0.0007 Epoch: 18 Global Step: 305790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:50,393-Speed 9257.01 samples/sec Loss 3.4774 LearningRate 0.0007 Epoch: 18 Global Step: 305800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:51,472-Speed 9492.72 samples/sec Loss 3.3791 LearningRate 0.0007 Epoch: 18 Global Step: 305810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:52,618-Speed 8946.51 samples/sec Loss 3.4534 LearningRate 0.0007 Epoch: 18 Global Step: 305820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:53,762-Speed 8948.83 samples/sec Loss 3.3747 LearningRate 0.0007 Epoch: 18 Global Step: 305830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:54,870-Speed 9250.11 samples/sec Loss 3.2965 LearningRate 0.0007 Epoch: 18 Global Step: 305840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:55,980-Speed 9227.05 samples/sec Loss 3.3449 LearningRate 0.0007 Epoch: 18 Global Step: 305850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:57,064-Speed 9458.27 samples/sec Loss 3.3537 LearningRate 0.0007 Epoch: 18 Global Step: 305860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:55:58,172-Speed 9249.01 samples/sec 
Loss 3.3670 LearningRate 0.0007 Epoch: 18 Global Step: 305870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:55:59,310-Speed 9001.54 samples/sec Loss 3.3817 LearningRate 0.0007 Epoch: 18 Global Step: 305880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:00,449-Speed 8999.35 samples/sec Loss 3.3985 LearningRate 0.0007 Epoch: 18 Global Step: 305890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:01,658-Speed 8476.43 samples/sec Loss 3.3176 LearningRate 0.0007 Epoch: 18 Global Step: 305900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:02,817-Speed 8839.79 samples/sec Loss 3.3287 LearningRate 0.0007 Epoch: 18 Global Step: 305910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:03,926-Speed 9235.78 samples/sec Loss 3.4481 LearningRate 0.0007 Epoch: 18 Global Step: 305920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:05,026-Speed 9314.29 samples/sec Loss 3.4419 LearningRate 0.0007 Epoch: 18 Global Step: 305930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:06,213-Speed 8626.62 samples/sec Loss 3.4142 LearningRate 0.0007 Epoch: 18 Global Step: 305940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:07,342-Speed 9077.85 samples/sec Loss 3.4412 LearningRate 0.0007 Epoch: 18 Global Step: 305950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:08,488-Speed 8939.31 samples/sec Loss 3.4517 LearningRate 0.0007 Epoch: 18 Global Step: 305960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:09,612-Speed 9121.13 samples/sec Loss 3.2815 LearningRate 0.0007 Epoch: 18 Global Step: 305970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:56:10,695-Speed 9454.53 samples/sec Loss 3.3566 LearningRate 0.0007 Epoch: 18 Global Step: 305980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:56:11,811-Speed 9182.49 samples/sec Loss 3.3770 LearningRate 0.0007 Epoch: 18 
Global Step: 305990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:12,908-Speed 9340.12 samples/sec Loss 3.3589 LearningRate 0.0007 Epoch: 18 Global Step: 306000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:56:34,989-[lfw][306000]XNorm: 6.541931 Training: 2022-04-11 23:56:34,990-[lfw][306000]Accuracy-Flip: 0.99650+-0.00273 Training: 2022-04-11 23:56:34,990-[lfw][306000]Accuracy-Highest: 0.99750 Training: 2022-04-11 23:57:00,479-[cfp_fp][306000]XNorm: 5.713169 Training: 2022-04-11 23:57:00,480-[cfp_fp][306000]Accuracy-Flip: 0.97214+-0.00915 Training: 2022-04-11 23:57:00,480-[cfp_fp][306000]Accuracy-Highest: 0.97386 Training: 2022-04-11 23:57:22,456-[agedb_30][306000]XNorm: 6.375520 Training: 2022-04-11 23:57:22,457-[agedb_30][306000]Accuracy-Flip: 0.97350+-0.00709 Training: 2022-04-11 23:57:22,457-[agedb_30][306000]Accuracy-Highest: 0.97417 Training: 2022-04-11 23:57:23,561-Speed 144.94 samples/sec Loss 3.3760 LearningRate 0.0007 Epoch: 18 Global Step: 306010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:24,671-Speed 9228.78 samples/sec Loss 3.4078 LearningRate 0.0007 Epoch: 18 Global Step: 306020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:25,860-Speed 8615.42 samples/sec Loss 3.3874 LearningRate 0.0007 Epoch: 18 Global Step: 306030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:26,957-Speed 9337.61 samples/sec Loss 3.4063 LearningRate 0.0007 Epoch: 18 Global Step: 306040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:28,071-Speed 9196.95 samples/sec Loss 3.4466 LearningRate 0.0007 Epoch: 18 Global Step: 306050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:29,225-Speed 8881.09 samples/sec Loss 3.4125 LearningRate 0.0007 Epoch: 18 Global Step: 306060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:30,350-Speed 9108.00 samples/sec Loss 3.3340 LearningRate 0.0007 Epoch: 18 Global Step: 306070 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:31,475-Speed 9110.82 samples/sec Loss 3.3941 LearningRate 0.0007 Epoch: 18 Global Step: 306080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:32,576-Speed 9302.70 samples/sec Loss 3.3701 LearningRate 0.0007 Epoch: 18 Global Step: 306090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:33,775-Speed 8543.66 samples/sec Loss 3.3911 LearningRate 0.0007 Epoch: 18 Global Step: 306100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:34,894-Speed 9159.63 samples/sec Loss 3.3565 LearningRate 0.0007 Epoch: 18 Global Step: 306110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:35,975-Speed 9473.52 samples/sec Loss 3.4310 LearningRate 0.0007 Epoch: 18 Global Step: 306120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:37,072-Speed 9344.83 samples/sec Loss 3.3812 LearningRate 0.0007 Epoch: 18 Global Step: 306130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:38,206-Speed 9029.85 samples/sec Loss 3.2617 LearningRate 0.0007 Epoch: 18 Global Step: 306140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:39,335-Speed 9073.12 samples/sec Loss 3.4096 LearningRate 0.0007 Epoch: 18 Global Step: 306150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:40,448-Speed 9208.33 samples/sec Loss 3.3487 LearningRate 0.0007 Epoch: 18 Global Step: 306160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:41,571-Speed 9125.71 samples/sec Loss 3.4016 LearningRate 0.0007 Epoch: 18 Global Step: 306170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:42,710-Speed 8997.24 samples/sec Loss 3.4397 LearningRate 0.0007 Epoch: 18 Global Step: 306180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:43,824-Speed 9190.77 samples/sec Loss 3.4458 LearningRate 0.0007 Epoch: 18 Global Step: 306190 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 23:57:44,947-Speed 9125.78 samples/sec Loss 3.3983 LearningRate 0.0007 Epoch: 18 Global Step: 306200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:46,039-Speed 9382.07 samples/sec Loss 3.3986 LearningRate 0.0007 Epoch: 18 Global Step: 306210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:47,136-Speed 9350.27 samples/sec Loss 3.3421 LearningRate 0.0007 Epoch: 18 Global Step: 306220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:48,272-Speed 9015.15 samples/sec Loss 3.4824 LearningRate 0.0007 Epoch: 18 Global Step: 306230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:49,406-Speed 9032.64 samples/sec Loss 3.3871 LearningRate 0.0007 Epoch: 18 Global Step: 306240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:50,485-Speed 9497.58 samples/sec Loss 3.3897 LearningRate 0.0007 Epoch: 18 Global Step: 306250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:51,631-Speed 8941.50 samples/sec Loss 3.4346 LearningRate 0.0007 Epoch: 18 Global Step: 306260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:52,691-Speed 9665.50 samples/sec Loss 3.3605 LearningRate 0.0007 Epoch: 18 Global Step: 306270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:53,783-Speed 9381.35 samples/sec Loss 3.3947 LearningRate 0.0007 Epoch: 18 Global Step: 306280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:54,927-Speed 8958.55 samples/sec Loss 3.3515 LearningRate 0.0007 Epoch: 18 Global Step: 306290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:57:56,001-Speed 9541.62 samples/sec Loss 3.4248 LearningRate 0.0007 Epoch: 18 Global Step: 306300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:57,115-Speed 9190.31 samples/sec Loss 3.3169 LearningRate 0.0007 Epoch: 18 Global Step: 306310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:58,220-Speed 
9272.20 samples/sec Loss 3.3823 LearningRate 0.0007 Epoch: 18 Global Step: 306320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:57:59,384-Speed 8806.89 samples/sec Loss 3.3880 LearningRate 0.0007 Epoch: 18 Global Step: 306330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:00,498-Speed 9195.36 samples/sec Loss 3.4622 LearningRate 0.0007 Epoch: 18 Global Step: 306340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:01,628-Speed 9071.05 samples/sec Loss 3.3596 LearningRate 0.0007 Epoch: 18 Global Step: 306350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:02,799-Speed 8747.13 samples/sec Loss 3.4109 LearningRate 0.0007 Epoch: 18 Global Step: 306360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:03,936-Speed 9012.54 samples/sec Loss 3.4947 LearningRate 0.0007 Epoch: 18 Global Step: 306370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:05,066-Speed 9066.77 samples/sec Loss 3.3391 LearningRate 0.0007 Epoch: 18 Global Step: 306380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:06,187-Speed 9140.81 samples/sec Loss 3.4296 LearningRate 0.0007 Epoch: 18 Global Step: 306390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:07,300-Speed 9207.45 samples/sec Loss 3.3994 LearningRate 0.0007 Epoch: 18 Global Step: 306400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:58:08,428-Speed 9081.51 samples/sec Loss 3.4832 LearningRate 0.0007 Epoch: 18 Global Step: 306410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:09,552-Speed 9116.91 samples/sec Loss 3.3937 LearningRate 0.0007 Epoch: 18 Global Step: 306420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:10,700-Speed 8925.28 samples/sec Loss 3.4050 LearningRate 0.0007 Epoch: 18 Global Step: 306430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:11,839-Speed 8994.95 samples/sec Loss 3.4120 
LearningRate 0.0007 Epoch: 18 Global Step: 306440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:12,968-Speed 9074.05 samples/sec Loss 3.4323 LearningRate 0.0007 Epoch: 18 Global Step: 306450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:14,090-Speed 9132.33 samples/sec Loss 3.3446 LearningRate 0.0007 Epoch: 18 Global Step: 306460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:15,219-Speed 9072.21 samples/sec Loss 3.3240 LearningRate 0.0007 Epoch: 18 Global Step: 306470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:16,370-Speed 8905.89 samples/sec Loss 3.4011 LearningRate 0.0007 Epoch: 18 Global Step: 306480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:17,492-Speed 9131.14 samples/sec Loss 3.3854 LearningRate 0.0007 Epoch: 18 Global Step: 306490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:18,651-Speed 8836.80 samples/sec Loss 3.4140 LearningRate 0.0007 Epoch: 18 Global Step: 306500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:19,750-Speed 9327.24 samples/sec Loss 3.3876 LearningRate 0.0007 Epoch: 18 Global Step: 306510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:20,905-Speed 8864.13 samples/sec Loss 3.3524 LearningRate 0.0007 Epoch: 18 Global Step: 306520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:22,038-Speed 9044.84 samples/sec Loss 3.4336 LearningRate 0.0007 Epoch: 18 Global Step: 306530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:23,160-Speed 9132.21 samples/sec Loss 3.4040 LearningRate 0.0007 Epoch: 18 Global Step: 306540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:24,311-Speed 8904.69 samples/sec Loss 3.3784 LearningRate 0.0007 Epoch: 18 Global Step: 306550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:25,484-Speed 8731.81 samples/sec Loss 3.3667 LearningRate 0.0007 Epoch: 18 Global Step: 
306560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:26,590-Speed 9272.76 samples/sec Loss 3.3530 LearningRate 0.0007 Epoch: 18 Global Step: 306570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:27,696-Speed 9266.73 samples/sec Loss 3.4255 LearningRate 0.0007 Epoch: 18 Global Step: 306580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:28,805-Speed 9240.71 samples/sec Loss 3.4596 LearningRate 0.0007 Epoch: 18 Global Step: 306590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:29,881-Speed 9522.54 samples/sec Loss 3.3543 LearningRate 0.0007 Epoch: 18 Global Step: 306600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:30,992-Speed 9222.36 samples/sec Loss 3.3631 LearningRate 0.0007 Epoch: 18 Global Step: 306610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:32,159-Speed 8778.64 samples/sec Loss 3.3775 LearningRate 0.0007 Epoch: 18 Global Step: 306620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:33,252-Speed 9371.03 samples/sec Loss 3.3581 LearningRate 0.0007 Epoch: 18 Global Step: 306630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:34,389-Speed 9015.12 samples/sec Loss 3.3857 LearningRate 0.0007 Epoch: 18 Global Step: 306640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:35,496-Speed 9251.98 samples/sec Loss 3.4346 LearningRate 0.0007 Epoch: 18 Global Step: 306650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:36,597-Speed 9308.26 samples/sec Loss 3.3212 LearningRate 0.0007 Epoch: 18 Global Step: 306660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:58:37,690-Speed 9375.79 samples/sec Loss 3.4033 LearningRate 0.0007 Epoch: 18 Global Step: 306670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:38,822-Speed 9049.35 samples/sec Loss 3.3494 LearningRate 0.0007 Epoch: 18 Global Step: 306680 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-11 23:58:39,976-Speed 8882.19 samples/sec Loss 3.4318 LearningRate 0.0007 Epoch: 18 Global Step: 306690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:41,117-Speed 8978.20 samples/sec Loss 3.3836 LearningRate 0.0007 Epoch: 18 Global Step: 306700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:42,287-Speed 8760.59 samples/sec Loss 3.4302 LearningRate 0.0007 Epoch: 18 Global Step: 306710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:43,445-Speed 8846.66 samples/sec Loss 3.3842 LearningRate 0.0007 Epoch: 18 Global Step: 306720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:44,526-Speed 9480.29 samples/sec Loss 3.4215 LearningRate 0.0007 Epoch: 18 Global Step: 306730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:45,650-Speed 9114.27 samples/sec Loss 3.3678 LearningRate 0.0007 Epoch: 18 Global Step: 306740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:46,775-Speed 9109.27 samples/sec Loss 3.4069 LearningRate 0.0007 Epoch: 18 Global Step: 306750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:47,932-Speed 8853.21 samples/sec Loss 3.3943 LearningRate 0.0007 Epoch: 18 Global Step: 306760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:49,011-Speed 9499.99 samples/sec Loss 3.3105 LearningRate 0.0007 Epoch: 18 Global Step: 306770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:50,090-Speed 9491.94 samples/sec Loss 3.3858 LearningRate 0.0007 Epoch: 18 Global Step: 306780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:51,181-Speed 9395.89 samples/sec Loss 3.4237 LearningRate 0.0007 Epoch: 18 Global Step: 306790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:52,326-Speed 8947.18 samples/sec Loss 3.4842 LearningRate 0.0007 Epoch: 18 Global Step: 306800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:58:53,431-Speed 9271.24 samples/sec Loss 3.3930 LearningRate 0.0007 Epoch: 18 Global Step: 306810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:54,553-Speed 9130.41 samples/sec Loss 3.3587 LearningRate 0.0007 Epoch: 18 Global Step: 306820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:55,719-Speed 8788.25 samples/sec Loss 3.3404 LearningRate 0.0007 Epoch: 18 Global Step: 306830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:56,842-Speed 9123.03 samples/sec Loss 3.3492 LearningRate 0.0007 Epoch: 18 Global Step: 306840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:58,042-Speed 8535.96 samples/sec Loss 3.3698 LearningRate 0.0007 Epoch: 18 Global Step: 306850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:58:59,143-Speed 9312.56 samples/sec Loss 3.4575 LearningRate 0.0007 Epoch: 18 Global Step: 306860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:00,279-Speed 9022.90 samples/sec Loss 3.4762 LearningRate 0.0007 Epoch: 18 Global Step: 306870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:59:01,354-Speed 9525.66 samples/sec Loss 3.3771 LearningRate 0.0007 Epoch: 18 Global Step: 306880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:59:02,483-Speed 9076.65 samples/sec Loss 3.4547 LearningRate 0.0007 Epoch: 18 Global Step: 306890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:59:03,564-Speed 9475.42 samples/sec Loss 3.4329 LearningRate 0.0007 Epoch: 18 Global Step: 306900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:04,642-Speed 9508.81 samples/sec Loss 3.4286 LearningRate 0.0006 Epoch: 18 Global Step: 306910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:05,735-Speed 9369.88 samples/sec Loss 3.3764 LearningRate 0.0006 Epoch: 18 Global Step: 306920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:06,865-Speed 9071.16 samples/sec 
Loss 3.3548 LearningRate 0.0006 Epoch: 18 Global Step: 306930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:07,965-Speed 9315.70 samples/sec Loss 3.4162 LearningRate 0.0006 Epoch: 18 Global Step: 306940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:09,142-Speed 8698.49 samples/sec Loss 3.3501 LearningRate 0.0006 Epoch: 18 Global Step: 306950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:10,274-Speed 9054.11 samples/sec Loss 3.4546 LearningRate 0.0006 Epoch: 18 Global Step: 306960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:11,369-Speed 9359.54 samples/sec Loss 3.4090 LearningRate 0.0006 Epoch: 18 Global Step: 306970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:12,588-Speed 8405.81 samples/sec Loss 3.3488 LearningRate 0.0006 Epoch: 18 Global Step: 306980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:13,687-Speed 9322.05 samples/sec Loss 3.4266 LearningRate 0.0006 Epoch: 18 Global Step: 306990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:14,797-Speed 9225.93 samples/sec Loss 3.4216 LearningRate 0.0006 Epoch: 18 Global Step: 307000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:59:15,949-Speed 8892.47 samples/sec Loss 3.3544 LearningRate 0.0006 Epoch: 18 Global Step: 307010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:59:17,071-Speed 9137.71 samples/sec Loss 3.3928 LearningRate 0.0006 Epoch: 18 Global Step: 307020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:18,264-Speed 8587.12 samples/sec Loss 3.3805 LearningRate 0.0006 Epoch: 18 Global Step: 307030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:19,437-Speed 8736.49 samples/sec Loss 3.4299 LearningRate 0.0006 Epoch: 18 Global Step: 307040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:20,540-Speed 9288.47 samples/sec Loss 3.3929 LearningRate 0.0006 Epoch: 18 
Global Step: 307050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:21,642-Speed 9302.41 samples/sec Loss 3.4158 LearningRate 0.0006 Epoch: 18 Global Step: 307060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:22,759-Speed 9169.54 samples/sec Loss 3.3853 LearningRate 0.0006 Epoch: 18 Global Step: 307070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:23,859-Speed 9313.29 samples/sec Loss 3.3975 LearningRate 0.0006 Epoch: 18 Global Step: 307080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:25,033-Speed 8727.84 samples/sec Loss 3.4023 LearningRate 0.0006 Epoch: 18 Global Step: 307090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:26,149-Speed 9187.94 samples/sec Loss 3.3708 LearningRate 0.0006 Epoch: 18 Global Step: 307100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:27,284-Speed 9023.57 samples/sec Loss 3.3544 LearningRate 0.0006 Epoch: 18 Global Step: 307110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:28,415-Speed 9063.22 samples/sec Loss 3.4066 LearningRate 0.0006 Epoch: 18 Global Step: 307120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:59:29,563-Speed 8925.66 samples/sec Loss 3.3643 LearningRate 0.0006 Epoch: 18 Global Step: 307130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:30,663-Speed 9312.55 samples/sec Loss 3.4355 LearningRate 0.0006 Epoch: 18 Global Step: 307140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:31,783-Speed 9148.69 samples/sec Loss 3.3258 LearningRate 0.0006 Epoch: 18 Global Step: 307150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:32,857-Speed 9539.50 samples/sec Loss 3.3848 LearningRate 0.0006 Epoch: 18 Global Step: 307160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:33,968-Speed 9221.26 samples/sec Loss 3.4385 LearningRate 0.0006 Epoch: 18 Global Step: 307170 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 23:59:35,083-Speed 9193.39 samples/sec Loss 3.4125 LearningRate 0.0006 Epoch: 18 Global Step: 307180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:36,205-Speed 9129.69 samples/sec Loss 3.3515 LearningRate 0.0006 Epoch: 18 Global Step: 307190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:37,334-Speed 9076.30 samples/sec Loss 3.4474 LearningRate 0.0006 Epoch: 18 Global Step: 307200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:38,439-Speed 9273.94 samples/sec Loss 3.4138 LearningRate 0.0006 Epoch: 18 Global Step: 307210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:39,584-Speed 8948.58 samples/sec Loss 3.3543 LearningRate 0.0006 Epoch: 18 Global Step: 307220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:40,703-Speed 9156.51 samples/sec Loss 3.4269 LearningRate 0.0006 Epoch: 18 Global Step: 307230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:41,773-Speed 9574.11 samples/sec Loss 3.4338 LearningRate 0.0006 Epoch: 18 Global Step: 307240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:42,873-Speed 9317.72 samples/sec Loss 3.3865 LearningRate 0.0006 Epoch: 18 Global Step: 307250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:44,003-Speed 9069.76 samples/sec Loss 3.4067 LearningRate 0.0006 Epoch: 18 Global Step: 307260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:45,106-Speed 9283.51 samples/sec Loss 3.4248 LearningRate 0.0006 Epoch: 18 Global Step: 307270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 23:59:46,205-Speed 9326.35 samples/sec Loss 3.4077 LearningRate 0.0006 Epoch: 18 Global Step: 307280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:47,379-Speed 8727.98 samples/sec Loss 3.3700 LearningRate 0.0006 Epoch: 18 Global Step: 307290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
23:59:48,501-Speed 9135.48 samples/sec Loss 3.4622 LearningRate 0.0006 Epoch: 18 Global Step: 307300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:49,674-Speed 8733.53 samples/sec Loss 3.4300 LearningRate 0.0006 Epoch: 18 Global Step: 307310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:50,803-Speed 9072.03 samples/sec Loss 3.4957 LearningRate 0.0006 Epoch: 18 Global Step: 307320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:51,900-Speed 9341.48 samples/sec Loss 3.4258 LearningRate 0.0006 Epoch: 18 Global Step: 307330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:53,003-Speed 9289.27 samples/sec Loss 3.4140 LearningRate 0.0006 Epoch: 18 Global Step: 307340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:54,141-Speed 9004.56 samples/sec Loss 3.3626 LearningRate 0.0006 Epoch: 18 Global Step: 307350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:55,245-Speed 9280.91 samples/sec Loss 3.4755 LearningRate 0.0006 Epoch: 18 Global Step: 307360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:56,393-Speed 8925.32 samples/sec Loss 3.4503 LearningRate 0.0006 Epoch: 18 Global Step: 307370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:57,524-Speed 9056.92 samples/sec Loss 3.3802 LearningRate 0.0006 Epoch: 18 Global Step: 307380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 23:59:58,632-Speed 9245.68 samples/sec Loss 3.3606 LearningRate 0.0006 Epoch: 18 Global Step: 307390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 23:59:59,698-Speed 9618.87 samples/sec Loss 3.3108 LearningRate 0.0006 Epoch: 18 Global Step: 307400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:00,832-Speed 9032.96 samples/sec Loss 3.3067 LearningRate 0.0006 Epoch: 18 Global Step: 307410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:01,996-Speed 8805.46 samples/sec 
Loss 3.4093 LearningRate 0.0006 Epoch: 18 Global Step: 307420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:03,129-Speed 9041.32 samples/sec Loss 3.3544 LearningRate 0.0006 Epoch: 18 Global Step: 307430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:04,197-Speed 9594.38 samples/sec Loss 3.3304 LearningRate 0.0006 Epoch: 18 Global Step: 307440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:05,306-Speed 9232.17 samples/sec Loss 3.3565 LearningRate 0.0006 Epoch: 18 Global Step: 307450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:06,404-Speed 9333.08 samples/sec Loss 3.3852 LearningRate 0.0006 Epoch: 18 Global Step: 307460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:07,482-Speed 9503.60 samples/sec Loss 3.3756 LearningRate 0.0006 Epoch: 18 Global Step: 307470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:08,622-Speed 8992.10 samples/sec Loss 3.3794 LearningRate 0.0006 Epoch: 18 Global Step: 307480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:09,756-Speed 9031.48 samples/sec Loss 3.4016 LearningRate 0.0006 Epoch: 18 Global Step: 307490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:00:10,857-Speed 9309.79 samples/sec Loss 3.4238 LearningRate 0.0006 Epoch: 18 Global Step: 307500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:11,968-Speed 9223.73 samples/sec Loss 3.4229 LearningRate 0.0006 Epoch: 18 Global Step: 307510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:13,142-Speed 8725.99 samples/sec Loss 3.3778 LearningRate 0.0006 Epoch: 18 Global Step: 307520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:14,272-Speed 9066.25 samples/sec Loss 3.4432 LearningRate 0.0006 Epoch: 18 Global Step: 307530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:15,440-Speed 8769.65 samples/sec Loss 3.4077 LearningRate 0.0006 Epoch: 18 
Global Step: 307540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:16,561-Speed 9153.51 samples/sec Loss 3.3027 LearningRate 0.0006 Epoch: 18 Global Step: 307550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:17,688-Speed 9090.77 samples/sec Loss 3.3415 LearningRate 0.0006 Epoch: 18 Global Step: 307560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:18,824-Speed 9013.57 samples/sec Loss 3.4750 LearningRate 0.0006 Epoch: 18 Global Step: 307570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:19,937-Speed 9210.18 samples/sec Loss 3.4720 LearningRate 0.0006 Epoch: 18 Global Step: 307580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:21,079-Speed 8968.08 samples/sec Loss 3.4313 LearningRate 0.0006 Epoch: 18 Global Step: 307590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:22,177-Speed 9333.87 samples/sec Loss 3.4334 LearningRate 0.0006 Epoch: 18 Global Step: 307600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:00:23,274-Speed 9339.55 samples/sec Loss 3.4670 LearningRate 0.0006 Epoch: 18 Global Step: 307610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:24,475-Speed 8527.85 samples/sec Loss 3.4407 LearningRate 0.0006 Epoch: 18 Global Step: 307620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:25,632-Speed 8860.65 samples/sec Loss 3.3916 LearningRate 0.0006 Epoch: 18 Global Step: 307630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:26,767-Speed 9026.35 samples/sec Loss 3.4088 LearningRate 0.0006 Epoch: 18 Global Step: 307640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:27,876-Speed 9237.26 samples/sec Loss 3.3680 LearningRate 0.0006 Epoch: 18 Global Step: 307650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:28,988-Speed 9216.35 samples/sec Loss 3.3818 LearningRate 0.0006 Epoch: 18 Global Step: 307660 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:00:30,114-Speed 9104.08 samples/sec Loss 3.3017 LearningRate 0.0006 Epoch: 18 Global Step: 307670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:31,206-Speed 9382.07 samples/sec Loss 3.3801 LearningRate 0.0006 Epoch: 18 Global Step: 307680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:32,368-Speed 8817.07 samples/sec Loss 3.3742 LearningRate 0.0006 Epoch: 18 Global Step: 307690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:33,507-Speed 9001.02 samples/sec Loss 3.4493 LearningRate 0.0006 Epoch: 18 Global Step: 307700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:34,652-Speed 8945.20 samples/sec Loss 3.3702 LearningRate 0.0006 Epoch: 18 Global Step: 307710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:35,799-Speed 8937.59 samples/sec Loss 3.3746 LearningRate 0.0006 Epoch: 18 Global Step: 307720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:36,961-Speed 8817.85 samples/sec Loss 3.3848 LearningRate 0.0006 Epoch: 18 Global Step: 307730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:38,125-Speed 8799.64 samples/sec Loss 3.4232 LearningRate 0.0006 Epoch: 18 Global Step: 307740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:39,243-Speed 9165.05 samples/sec Loss 3.4329 LearningRate 0.0006 Epoch: 18 Global Step: 307750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:40,366-Speed 9131.99 samples/sec Loss 3.4704 LearningRate 0.0006 Epoch: 18 Global Step: 307760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:41,509-Speed 8959.29 samples/sec Loss 3.4359 LearningRate 0.0006 Epoch: 18 Global Step: 307770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:42,704-Speed 8574.66 samples/sec Loss 3.3418 LearningRate 0.0006 Epoch: 18 Global Step: 307780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:00:43,840-Speed 9023.39 samples/sec Loss 3.3587 LearningRate 0.0006 Epoch: 18 Global Step: 307790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:44,956-Speed 9179.74 samples/sec Loss 3.3697 LearningRate 0.0006 Epoch: 18 Global Step: 307800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:46,058-Speed 9298.66 samples/sec Loss 3.4938 LearningRate 0.0006 Epoch: 18 Global Step: 307810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:00:47,180-Speed 9128.70 samples/sec Loss 3.3806 LearningRate 0.0006 Epoch: 18 Global Step: 307820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:48,288-Speed 9248.79 samples/sec Loss 3.4385 LearningRate 0.0006 Epoch: 18 Global Step: 307830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:49,438-Speed 8909.03 samples/sec Loss 3.3561 LearningRate 0.0006 Epoch: 18 Global Step: 307840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:50,593-Speed 8874.53 samples/sec Loss 3.3686 LearningRate 0.0006 Epoch: 18 Global Step: 307850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:51,768-Speed 8718.58 samples/sec Loss 3.4753 LearningRate 0.0006 Epoch: 18 Global Step: 307860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:52,864-Speed 9346.72 samples/sec Loss 3.4398 LearningRate 0.0006 Epoch: 18 Global Step: 307870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:54,016-Speed 8898.07 samples/sec Loss 3.3418 LearningRate 0.0006 Epoch: 18 Global Step: 307880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:55,150-Speed 9030.42 samples/sec Loss 3.4300 LearningRate 0.0006 Epoch: 18 Global Step: 307890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:56,264-Speed 9197.04 samples/sec Loss 3.4364 LearningRate 0.0006 Epoch: 18 Global Step: 307900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:57,422-Speed 8851.80 samples/sec 
Loss 3.3712 LearningRate 0.0006 Epoch: 18 Global Step: 307910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:00:58,562-Speed 8986.74 samples/sec Loss 3.4119 LearningRate 0.0006 Epoch: 18 Global Step: 307920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:00:59,711-Speed 8916.54 samples/sec Loss 3.4702 LearningRate 0.0006 Epoch: 18 Global Step: 307930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:01:00,850-Speed 8995.14 samples/sec Loss 3.4589 LearningRate 0.0006 Epoch: 18 Global Step: 307940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:01:01,941-Speed 9389.76 samples/sec Loss 3.4644 LearningRate 0.0006 Epoch: 18 Global Step: 307950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:01:03,038-Speed 9339.57 samples/sec Loss 3.3837 LearningRate 0.0006 Epoch: 18 Global Step: 307960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:01:04,226-Speed 8623.96 samples/sec Loss 3.3939 LearningRate 0.0006 Epoch: 18 Global Step: 307970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:01:05,311-Speed 9442.35 samples/sec Loss 3.4082 LearningRate 0.0006 Epoch: 18 Global Step: 307980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:01:06,383-Speed 9561.12 samples/sec Loss 3.3908 LearningRate 0.0006 Epoch: 18 Global Step: 307990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:01:07,495-Speed 9215.54 samples/sec Loss 3.4820 LearningRate 0.0006 Epoch: 18 Global Step: 308000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:01:29,593-[lfw][308000]XNorm: 6.600562 Training: 2022-04-12 00:01:29,594-[lfw][308000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-04-12 00:01:29,595-[lfw][308000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:01:55,072-[cfp_fp][308000]XNorm: 5.769856 Training: 2022-04-12 00:01:55,073-[cfp_fp][308000]Accuracy-Flip: 0.97200+-0.00874 Training: 2022-04-12 
00:01:55,074-[cfp_fp][308000]Accuracy-Highest: 0.97386 Training: 2022-04-12 00:02:17,045-[agedb_30][308000]XNorm: 6.437475 Training: 2022-04-12 00:02:17,046-[agedb_30][308000]Accuracy-Flip: 0.97117+-0.00885 Training: 2022-04-12 00:02:17,047-[agedb_30][308000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:02:18,165-Speed 144.90 samples/sec Loss 3.4228 LearningRate 0.0006 Epoch: 18 Global Step: 308010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:19,298-Speed 9041.75 samples/sec Loss 3.4025 LearningRate 0.0006 Epoch: 18 Global Step: 308020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:20,459-Speed 8830.24 samples/sec Loss 3.3388 LearningRate 0.0006 Epoch: 18 Global Step: 308030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:21,614-Speed 8871.92 samples/sec Loss 3.3906 LearningRate 0.0006 Epoch: 18 Global Step: 308040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:22,749-Speed 9024.42 samples/sec Loss 3.4121 LearningRate 0.0006 Epoch: 18 Global Step: 308050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:23,852-Speed 9283.58 samples/sec Loss 3.3815 LearningRate 0.0006 Epoch: 18 Global Step: 308060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:24,966-Speed 9202.35 samples/sec Loss 3.3398 LearningRate 0.0006 Epoch: 18 Global Step: 308070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:26,075-Speed 9242.28 samples/sec Loss 3.4447 LearningRate 0.0006 Epoch: 18 Global Step: 308080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:27,209-Speed 9034.82 samples/sec Loss 3.3266 LearningRate 0.0006 Epoch: 18 Global Step: 308090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:28,369-Speed 8829.52 samples/sec Loss 3.3938 LearningRate 0.0006 Epoch: 18 Global Step: 308100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:29,508-Speed 8999.29 samples/sec Loss 3.4080 LearningRate 
0.0006 Epoch: 18 Global Step: 308110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:30,610-Speed 9289.22 samples/sec Loss 3.3486 LearningRate 0.0006 Epoch: 18 Global Step: 308120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:31,732-Speed 9134.87 samples/sec Loss 3.3141 LearningRate 0.0006 Epoch: 18 Global Step: 308130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:32,809-Speed 9513.20 samples/sec Loss 3.3830 LearningRate 0.0006 Epoch: 18 Global Step: 308140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:34,009-Speed 8537.98 samples/sec Loss 3.4113 LearningRate 0.0006 Epoch: 18 Global Step: 308150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:02:35,101-Speed 9386.99 samples/sec Loss 3.4685 LearningRate 0.0006 Epoch: 18 Global Step: 308160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:36,246-Speed 8947.78 samples/sec Loss 3.3790 LearningRate 0.0006 Epoch: 18 Global Step: 308170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:37,376-Speed 9067.58 samples/sec Loss 3.3956 LearningRate 0.0006 Epoch: 18 Global Step: 308180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:38,546-Speed 8750.10 samples/sec Loss 3.3247 LearningRate 0.0006 Epoch: 18 Global Step: 308190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:39,699-Speed 8893.25 samples/sec Loss 3.3102 LearningRate 0.0006 Epoch: 18 Global Step: 308200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:40,820-Speed 9147.09 samples/sec Loss 3.4757 LearningRate 0.0006 Epoch: 18 Global Step: 308210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:41,942-Speed 9125.28 samples/sec Loss 3.3626 LearningRate 0.0006 Epoch: 18 Global Step: 308220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:43,066-Speed 9118.34 samples/sec Loss 3.3981 LearningRate 0.0006 Epoch: 18 Global Step: 308230 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:44,175-Speed 9235.90 samples/sec Loss 3.3662 LearningRate 0.0006 Epoch: 18 Global Step: 308240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:45,344-Speed 8768.16 samples/sec Loss 3.4541 LearningRate 0.0006 Epoch: 18 Global Step: 308250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:46,448-Speed 9274.26 samples/sec Loss 3.3757 LearningRate 0.0006 Epoch: 18 Global Step: 308260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:47,520-Speed 9566.25 samples/sec Loss 3.3925 LearningRate 0.0006 Epoch: 18 Global Step: 308270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:48,655-Speed 9028.16 samples/sec Loss 3.3900 LearningRate 0.0006 Epoch: 18 Global Step: 308280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:49,766-Speed 9230.01 samples/sec Loss 3.3932 LearningRate 0.0006 Epoch: 18 Global Step: 308290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:50,912-Speed 8937.47 samples/sec Loss 3.4042 LearningRate 0.0006 Epoch: 18 Global Step: 308300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:52,037-Speed 9112.00 samples/sec Loss 3.3548 LearningRate 0.0006 Epoch: 18 Global Step: 308310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:53,127-Speed 9400.13 samples/sec Loss 3.3573 LearningRate 0.0006 Epoch: 18 Global Step: 308320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:54,237-Speed 9230.13 samples/sec Loss 3.4003 LearningRate 0.0006 Epoch: 18 Global Step: 308330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:55,382-Speed 8943.13 samples/sec Loss 3.4222 LearningRate 0.0006 Epoch: 18 Global Step: 308340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:56,509-Speed 9092.28 samples/sec Loss 3.4507 LearningRate 0.0006 Epoch: 18 Global Step: 308350 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-12 00:02:57,639-Speed 9063.51 samples/sec Loss 3.4380 LearningRate 0.0006 Epoch: 18 Global Step: 308360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:58,756-Speed 9174.01 samples/sec Loss 3.3916 LearningRate 0.0006 Epoch: 18 Global Step: 308370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:02:59,877-Speed 9151.60 samples/sec Loss 3.3425 LearningRate 0.0006 Epoch: 18 Global Step: 308380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:01,103-Speed 8357.18 samples/sec Loss 3.3736 LearningRate 0.0006 Epoch: 18 Global Step: 308390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:02,207-Speed 9280.48 samples/sec Loss 3.3988 LearningRate 0.0006 Epoch: 18 Global Step: 308400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:03,315-Speed 9247.04 samples/sec Loss 3.5148 LearningRate 0.0006 Epoch: 18 Global Step: 308410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:04,418-Speed 9286.00 samples/sec Loss 3.3851 LearningRate 0.0006 Epoch: 18 Global Step: 308420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:05,502-Speed 9455.93 samples/sec Loss 3.3227 LearningRate 0.0006 Epoch: 18 Global Step: 308430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:06,609-Speed 9255.58 samples/sec Loss 3.4220 LearningRate 0.0006 Epoch: 18 Global Step: 308440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:07,727-Speed 9162.11 samples/sec Loss 3.3923 LearningRate 0.0006 Epoch: 18 Global Step: 308450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:08,895-Speed 8766.49 samples/sec Loss 3.3496 LearningRate 0.0006 Epoch: 18 Global Step: 308460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:09,998-Speed 9293.13 samples/sec Loss 3.4676 LearningRate 0.0006 Epoch: 18 Global Step: 308470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:11,147-Speed 
8917.14 samples/sec Loss 3.4449 LearningRate 0.0006 Epoch: 18 Global Step: 308480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:12,260-Speed 9205.56 samples/sec Loss 3.3921 LearningRate 0.0006 Epoch: 18 Global Step: 308490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:13,352-Speed 9384.63 samples/sec Loss 3.3223 LearningRate 0.0006 Epoch: 18 Global Step: 308500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:14,512-Speed 8834.74 samples/sec Loss 3.3187 LearningRate 0.0006 Epoch: 18 Global Step: 308510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:15,618-Speed 9263.99 samples/sec Loss 3.4373 LearningRate 0.0006 Epoch: 18 Global Step: 308520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:16,714-Speed 9350.61 samples/sec Loss 3.4614 LearningRate 0.0006 Epoch: 18 Global Step: 308530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:17,788-Speed 9542.37 samples/sec Loss 3.3914 LearningRate 0.0006 Epoch: 18 Global Step: 308540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:18,917-Speed 9075.82 samples/sec Loss 3.3779 LearningRate 0.0006 Epoch: 18 Global Step: 308550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:20,000-Speed 9459.71 samples/sec Loss 3.4682 LearningRate 0.0006 Epoch: 18 Global Step: 308560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:03:21,105-Speed 9264.55 samples/sec Loss 3.4281 LearningRate 0.0006 Epoch: 18 Global Step: 308570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:22,270-Speed 8801.74 samples/sec Loss 3.3936 LearningRate 0.0006 Epoch: 18 Global Step: 308580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:23,420-Speed 8905.20 samples/sec Loss 3.4848 LearningRate 0.0006 Epoch: 18 Global Step: 308590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:24,521-Speed 9306.19 samples/sec Loss 3.3866 
LearningRate 0.0006 Epoch: 18 Global Step: 308600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:25,654-Speed 9043.68 samples/sec Loss 3.4017 LearningRate 0.0006 Epoch: 18 Global Step: 308610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:26,708-Speed 9718.79 samples/sec Loss 3.3593 LearningRate 0.0006 Epoch: 18 Global Step: 308620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:27,894-Speed 8636.09 samples/sec Loss 3.4282 LearningRate 0.0006 Epoch: 18 Global Step: 308630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:28,943-Speed 9772.84 samples/sec Loss 3.4553 LearningRate 0.0006 Epoch: 18 Global Step: 308640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:30,030-Speed 9430.25 samples/sec Loss 3.2621 LearningRate 0.0006 Epoch: 18 Global Step: 308650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:31,218-Speed 8620.91 samples/sec Loss 3.4332 LearningRate 0.0006 Epoch: 18 Global Step: 308660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:32,382-Speed 8802.59 samples/sec Loss 3.4023 LearningRate 0.0006 Epoch: 18 Global Step: 308670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:33,551-Speed 8766.95 samples/sec Loss 3.4333 LearningRate 0.0006 Epoch: 18 Global Step: 308680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:34,655-Speed 9277.96 samples/sec Loss 3.3433 LearningRate 0.0006 Epoch: 18 Global Step: 308690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:35,799-Speed 8953.55 samples/sec Loss 3.4183 LearningRate 0.0006 Epoch: 18 Global Step: 308700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:36,973-Speed 8732.38 samples/sec Loss 3.3650 LearningRate 0.0006 Epoch: 18 Global Step: 308710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:38,112-Speed 8996.72 samples/sec Loss 3.3194 LearningRate 0.0006 Epoch: 18 Global Step: 
308720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:39,231-Speed 9150.13 samples/sec Loss 3.3726 LearningRate 0.0006 Epoch: 18 Global Step: 308730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:40,310-Speed 9496.55 samples/sec Loss 3.3565 LearningRate 0.0006 Epoch: 18 Global Step: 308740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:41,400-Speed 9405.73 samples/sec Loss 3.4145 LearningRate 0.0006 Epoch: 18 Global Step: 308750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:03:42,522-Speed 9127.94 samples/sec Loss 3.4261 LearningRate 0.0006 Epoch: 18 Global Step: 308760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:43,659-Speed 9011.84 samples/sec Loss 3.3515 LearningRate 0.0006 Epoch: 18 Global Step: 308770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:44,773-Speed 9198.86 samples/sec Loss 3.3626 LearningRate 0.0006 Epoch: 18 Global Step: 308780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:45,898-Speed 9101.49 samples/sec Loss 3.3297 LearningRate 0.0006 Epoch: 18 Global Step: 308790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:47,004-Speed 9281.95 samples/sec Loss 3.4031 LearningRate 0.0006 Epoch: 18 Global Step: 308800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:48,110-Speed 9262.12 samples/sec Loss 3.3286 LearningRate 0.0006 Epoch: 18 Global Step: 308810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:49,269-Speed 8836.64 samples/sec Loss 3.2943 LearningRate 0.0006 Epoch: 18 Global Step: 308820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:50,372-Speed 9292.03 samples/sec Loss 3.4043 LearningRate 0.0006 Epoch: 18 Global Step: 308830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:51,572-Speed 8538.52 samples/sec Loss 3.3971 LearningRate 0.0006 Epoch: 18 Global Step: 308840 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-12 00:03:52,697-Speed 9104.39 samples/sec Loss 3.3602 LearningRate 0.0006 Epoch: 18 Global Step: 308850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:53,811-Speed 9197.86 samples/sec Loss 3.3535 LearningRate 0.0006 Epoch: 18 Global Step: 308860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:03:54,914-Speed 9290.60 samples/sec Loss 3.4528 LearningRate 0.0006 Epoch: 18 Global Step: 308870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:56,049-Speed 9033.80 samples/sec Loss 3.3767 LearningRate 0.0006 Epoch: 18 Global Step: 308880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:57,216-Speed 8778.80 samples/sec Loss 3.3510 LearningRate 0.0006 Epoch: 18 Global Step: 308890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:58,283-Speed 9598.31 samples/sec Loss 3.4209 LearningRate 0.0006 Epoch: 18 Global Step: 308900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:03:59,395-Speed 9225.47 samples/sec Loss 3.3231 LearningRate 0.0006 Epoch: 18 Global Step: 308910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:00,529-Speed 9035.93 samples/sec Loss 3.4320 LearningRate 0.0006 Epoch: 18 Global Step: 308920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:01,659-Speed 9061.96 samples/sec Loss 3.3378 LearningRate 0.0006 Epoch: 18 Global Step: 308930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:02,772-Speed 9208.82 samples/sec Loss 3.3820 LearningRate 0.0006 Epoch: 18 Global Step: 308940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:03,845-Speed 9550.27 samples/sec Loss 3.2783 LearningRate 0.0006 Epoch: 18 Global Step: 308950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:04,911-Speed 9602.97 samples/sec Loss 3.4185 LearningRate 0.0006 Epoch: 18 Global Step: 308960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:04:06,003-Speed 9390.01 samples/sec Loss 3.3673 LearningRate 0.0006 Epoch: 18 Global Step: 308970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:07,118-Speed 9183.78 samples/sec Loss 3.4431 LearningRate 0.0006 Epoch: 18 Global Step: 308980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:08,273-Speed 8872.01 samples/sec Loss 3.3946 LearningRate 0.0006 Epoch: 18 Global Step: 308990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:09,447-Speed 8730.50 samples/sec Loss 3.4017 LearningRate 0.0006 Epoch: 18 Global Step: 309000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:10,579-Speed 9055.60 samples/sec Loss 3.3694 LearningRate 0.0006 Epoch: 18 Global Step: 309010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:11,756-Speed 8705.39 samples/sec Loss 3.4015 LearningRate 0.0006 Epoch: 18 Global Step: 309020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:12,844-Speed 9418.95 samples/sec Loss 3.4428 LearningRate 0.0006 Epoch: 18 Global Step: 309030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:13,997-Speed 8888.44 samples/sec Loss 3.3913 LearningRate 0.0006 Epoch: 18 Global Step: 309040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:15,181-Speed 8649.97 samples/sec Loss 3.4077 LearningRate 0.0006 Epoch: 18 Global Step: 309050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:16,314-Speed 9044.82 samples/sec Loss 3.3893 LearningRate 0.0006 Epoch: 18 Global Step: 309060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:17,428-Speed 9200.76 samples/sec Loss 3.3302 LearningRate 0.0005 Epoch: 18 Global Step: 309070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:04:18,546-Speed 9168.97 samples/sec Loss 3.3769 LearningRate 0.0005 Epoch: 18 Global Step: 309080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:19,635-Speed 9407.55 samples/sec 
Loss 3.4297 LearningRate 0.0005 Epoch: 18 Global Step: 309090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:20,704-Speed 9579.57 samples/sec Loss 3.3559 LearningRate 0.0005 Epoch: 18 Global Step: 309100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:21,835-Speed 9060.40 samples/sec Loss 3.4342 LearningRate 0.0005 Epoch: 18 Global Step: 309110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:22,945-Speed 9230.93 samples/sec Loss 3.4007 LearningRate 0.0005 Epoch: 18 Global Step: 309120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:24,101-Speed 8863.86 samples/sec Loss 3.4969 LearningRate 0.0005 Epoch: 18 Global Step: 309130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:25,242-Speed 8980.87 samples/sec Loss 3.5102 LearningRate 0.0005 Epoch: 18 Global Step: 309140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:26,356-Speed 9195.45 samples/sec Loss 3.3918 LearningRate 0.0005 Epoch: 18 Global Step: 309150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:27,497-Speed 8976.74 samples/sec Loss 3.3888 LearningRate 0.0005 Epoch: 18 Global Step: 309160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:28,622-Speed 9113.08 samples/sec Loss 3.3039 LearningRate 0.0005 Epoch: 18 Global Step: 309170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:29,714-Speed 9379.36 samples/sec Loss 3.3498 LearningRate 0.0005 Epoch: 18 Global Step: 309180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:04:30,830-Speed 9181.36 samples/sec Loss 3.3782 LearningRate 0.0005 Epoch: 18 Global Step: 309190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:31,962-Speed 9056.44 samples/sec Loss 3.3524 LearningRate 0.0005 Epoch: 18 Global Step: 309200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:33,080-Speed 9165.77 samples/sec Loss 3.3770 LearningRate 0.0005 Epoch: 18 
Global Step: 309210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:34,147-Speed 9606.76 samples/sec Loss 3.4835 LearningRate 0.0005 Epoch: 18 Global Step: 309220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:35,275-Speed 9076.17 samples/sec Loss 3.3157 LearningRate 0.0005 Epoch: 18 Global Step: 309230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:36,387-Speed 9212.16 samples/sec Loss 3.4205 LearningRate 0.0005 Epoch: 18 Global Step: 309240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:37,530-Speed 8964.18 samples/sec Loss 3.3836 LearningRate 0.0005 Epoch: 18 Global Step: 309250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:38,713-Speed 8666.70 samples/sec Loss 3.4116 LearningRate 0.0005 Epoch: 18 Global Step: 309260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:39,836-Speed 9119.07 samples/sec Loss 3.3603 LearningRate 0.0005 Epoch: 18 Global Step: 309270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:40,946-Speed 9237.24 samples/sec Loss 3.4493 LearningRate 0.0005 Epoch: 18 Global Step: 309280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:42,093-Speed 8933.26 samples/sec Loss 3.4121 LearningRate 0.0005 Epoch: 18 Global Step: 309290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:04:43,191-Speed 9329.13 samples/sec Loss 3.3369 LearningRate 0.0005 Epoch: 18 Global Step: 309300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:44,284-Speed 9379.51 samples/sec Loss 3.3650 LearningRate 0.0005 Epoch: 18 Global Step: 309310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:45,464-Speed 8682.56 samples/sec Loss 3.4680 LearningRate 0.0005 Epoch: 18 Global Step: 309320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:46,565-Speed 9302.05 samples/sec Loss 3.4165 LearningRate 0.0005 Epoch: 18 Global Step: 309330 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-12 00:04:47,673-Speed 9256.63 samples/sec Loss 3.4130 LearningRate 0.0005 Epoch: 18 Global Step: 309340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:48,797-Speed 9109.21 samples/sec Loss 3.4785 LearningRate 0.0005 Epoch: 18 Global Step: 309350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:49,950-Speed 8883.81 samples/sec Loss 3.3846 LearningRate 0.0005 Epoch: 18 Global Step: 309360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:51,066-Speed 9186.81 samples/sec Loss 3.5348 LearningRate 0.0005 Epoch: 18 Global Step: 309370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:52,169-Speed 9291.24 samples/sec Loss 3.4757 LearningRate 0.0005 Epoch: 18 Global Step: 309380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:53,321-Speed 8893.73 samples/sec Loss 3.2976 LearningRate 0.0005 Epoch: 18 Global Step: 309390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:54,400-Speed 9488.57 samples/sec Loss 3.4126 LearningRate 0.0005 Epoch: 18 Global Step: 309400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:55,493-Speed 9381.42 samples/sec Loss 3.3891 LearningRate 0.0005 Epoch: 18 Global Step: 309410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:56,638-Speed 8947.52 samples/sec Loss 3.3719 LearningRate 0.0005 Epoch: 18 Global Step: 309420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:04:57,777-Speed 8994.55 samples/sec Loss 3.3545 LearningRate 0.0005 Epoch: 18 Global Step: 309430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:04:58,892-Speed 9183.92 samples/sec Loss 3.4334 LearningRate 0.0005 Epoch: 18 Global Step: 309440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:00,047-Speed 8873.57 samples/sec Loss 3.3293 LearningRate 0.0005 Epoch: 18 Global Step: 309450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:05:01,174-Speed 9091.64 samples/sec Loss 3.3641 LearningRate 0.0005 Epoch: 18 Global Step: 309460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:02,300-Speed 9096.75 samples/sec Loss 3.3532 LearningRate 0.0005 Epoch: 18 Global Step: 309470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:03,455-Speed 8874.99 samples/sec Loss 3.3524 LearningRate 0.0005 Epoch: 18 Global Step: 309480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:04,585-Speed 9067.56 samples/sec Loss 3.3653 LearningRate 0.0005 Epoch: 18 Global Step: 309490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:05,737-Speed 8890.36 samples/sec Loss 3.4247 LearningRate 0.0005 Epoch: 18 Global Step: 309500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:06,823-Speed 9435.94 samples/sec Loss 3.4561 LearningRate 0.0005 Epoch: 18 Global Step: 309510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:07,961-Speed 9003.61 samples/sec Loss 3.3882 LearningRate 0.0005 Epoch: 18 Global Step: 309520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:09,096-Speed 9026.88 samples/sec Loss 3.3962 LearningRate 0.0005 Epoch: 18 Global Step: 309530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:05:10,226-Speed 9070.23 samples/sec Loss 3.4128 LearningRate 0.0005 Epoch: 18 Global Step: 309540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:11,346-Speed 9151.19 samples/sec Loss 3.3814 LearningRate 0.0005 Epoch: 18 Global Step: 309550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:12,477-Speed 9054.55 samples/sec Loss 3.4166 LearningRate 0.0005 Epoch: 18 Global Step: 309560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:13,639-Speed 8819.10 samples/sec Loss 3.4090 LearningRate 0.0005 Epoch: 18 Global Step: 309570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:14,737-Speed 9325.23 samples/sec 
Loss 3.4674 LearningRate 0.0005 Epoch: 18 Global Step: 309580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:15,870-Speed 9044.13 samples/sec Loss 3.3802 LearningRate 0.0005 Epoch: 18 Global Step: 309590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:16,982-Speed 9219.92 samples/sec Loss 3.3750 LearningRate 0.0005 Epoch: 18 Global Step: 309600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:18,073-Speed 9389.24 samples/sec Loss 3.4286 LearningRate 0.0005 Epoch: 18 Global Step: 309610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:19,191-Speed 9164.97 samples/sec Loss 3.3937 LearningRate 0.0005 Epoch: 18 Global Step: 309620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:20,351-Speed 8833.39 samples/sec Loss 3.3964 LearningRate 0.0005 Epoch: 18 Global Step: 309630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:21,464-Speed 9202.88 samples/sec Loss 3.3611 LearningRate 0.0005 Epoch: 18 Global Step: 309640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:22,608-Speed 8954.30 samples/sec Loss 3.4189 LearningRate 0.0005 Epoch: 18 Global Step: 309650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:23,737-Speed 9079.18 samples/sec Loss 3.4676 LearningRate 0.0005 Epoch: 18 Global Step: 309660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:24,839-Speed 9296.59 samples/sec Loss 3.3912 LearningRate 0.0005 Epoch: 18 Global Step: 309670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:25,940-Speed 9307.04 samples/sec Loss 3.3688 LearningRate 0.0005 Epoch: 18 Global Step: 309680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:27,082-Speed 8970.54 samples/sec Loss 3.3084 LearningRate 0.0005 Epoch: 18 Global Step: 309690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:28,176-Speed 9360.99 samples/sec Loss 3.3589 LearningRate 0.0005 Epoch: 18 
Global Step: 309700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:29,325-Speed 8923.81 samples/sec Loss 3.4580 LearningRate 0.0005 Epoch: 18 Global Step: 309710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:30,444-Speed 9160.13 samples/sec Loss 3.4088 LearningRate 0.0005 Epoch: 18 Global Step: 309720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:31,617-Speed 8733.60 samples/sec Loss 3.3493 LearningRate 0.0005 Epoch: 18 Global Step: 309730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:32,761-Speed 8953.84 samples/sec Loss 3.4536 LearningRate 0.0005 Epoch: 18 Global Step: 309740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:05:33,849-Speed 9415.23 samples/sec Loss 3.4080 LearningRate 0.0005 Epoch: 18 Global Step: 309750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:34,956-Speed 9257.35 samples/sec Loss 3.3847 LearningRate 0.0005 Epoch: 18 Global Step: 309760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:36,075-Speed 9159.34 samples/sec Loss 3.4379 LearningRate 0.0005 Epoch: 18 Global Step: 309770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:37,166-Speed 9399.66 samples/sec Loss 3.4170 LearningRate 0.0005 Epoch: 18 Global Step: 309780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:38,330-Speed 8797.03 samples/sec Loss 3.3623 LearningRate 0.0005 Epoch: 18 Global Step: 309790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:39,455-Speed 9108.57 samples/sec Loss 3.3993 LearningRate 0.0005 Epoch: 18 Global Step: 309800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:40,581-Speed 9101.33 samples/sec Loss 3.3974 LearningRate 0.0005 Epoch: 18 Global Step: 309810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:41,701-Speed 9146.17 samples/sec Loss 3.2821 LearningRate 0.0005 Epoch: 18 Global Step: 309820 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:05:42,816-Speed 9189.51 samples/sec Loss 3.2912 LearningRate 0.0005 Epoch: 18 Global Step: 309830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:43,939-Speed 9123.61 samples/sec Loss 3.4399 LearningRate 0.0005 Epoch: 18 Global Step: 309840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:45,095-Speed 8868.11 samples/sec Loss 3.4761 LearningRate 0.0005 Epoch: 18 Global Step: 309850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:46,205-Speed 9231.38 samples/sec Loss 3.4011 LearningRate 0.0005 Epoch: 18 Global Step: 309860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:47,309-Speed 9289.31 samples/sec Loss 3.4362 LearningRate 0.0005 Epoch: 18 Global Step: 309870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:48,464-Speed 8868.17 samples/sec Loss 3.3859 LearningRate 0.0005 Epoch: 18 Global Step: 309880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:49,583-Speed 9155.91 samples/sec Loss 3.3075 LearningRate 0.0005 Epoch: 18 Global Step: 309890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:50,757-Speed 8728.22 samples/sec Loss 3.3476 LearningRate 0.0005 Epoch: 18 Global Step: 309900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:51,900-Speed 8962.12 samples/sec Loss 3.3486 LearningRate 0.0005 Epoch: 18 Global Step: 309910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:53,021-Speed 9141.91 samples/sec Loss 3.4417 LearningRate 0.0005 Epoch: 18 Global Step: 309920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:54,148-Speed 9093.37 samples/sec Loss 3.4245 LearningRate 0.0005 Epoch: 18 Global Step: 309930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:55,279-Speed 9054.22 samples/sec Loss 3.4208 LearningRate 0.0005 Epoch: 18 Global Step: 309940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:05:56,395-Speed 9181.58 samples/sec Loss 3.4331 LearningRate 0.0005 Epoch: 18 Global Step: 309950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:05:57,484-Speed 9412.31 samples/sec Loss 3.4084 LearningRate 0.0005 Epoch: 18 Global Step: 309960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:58,604-Speed 9142.21 samples/sec Loss 3.3714 LearningRate 0.0005 Epoch: 18 Global Step: 309970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:05:59,767-Speed 8815.21 samples/sec Loss 3.4358 LearningRate 0.0005 Epoch: 18 Global Step: 309980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:06:00,911-Speed 8957.20 samples/sec Loss 3.3813 LearningRate 0.0005 Epoch: 18 Global Step: 309990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:06:02,001-Speed 9396.46 samples/sec Loss 3.3740 LearningRate 0.0005 Epoch: 18 Global Step: 310000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:06:23,904-[lfw][310000]XNorm: 6.594072 Training: 2022-04-12 00:06:23,905-[lfw][310000]Accuracy-Flip: 0.99667+-0.00298 Training: 2022-04-12 00:06:23,906-[lfw][310000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:06:49,238-[cfp_fp][310000]XNorm: 5.756133 Training: 2022-04-12 00:06:49,239-[cfp_fp][310000]Accuracy-Flip: 0.97114+-0.00861 Training: 2022-04-12 00:06:49,240-[cfp_fp][310000]Accuracy-Highest: 0.97386 Training: 2022-04-12 00:07:11,078-[agedb_30][310000]XNorm: 6.425412 Training: 2022-04-12 00:07:11,079-[agedb_30][310000]Accuracy-Flip: 0.97300+-0.00795 Training: 2022-04-12 00:07:11,079-[agedb_30][310000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:07:12,222-Speed 145.83 samples/sec Loss 3.3671 LearningRate 0.0005 Epoch: 18 Global Step: 310010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:13,324-Speed 9291.80 samples/sec Loss 3.3562 LearningRate 0.0005 Epoch: 18 Global Step: 310020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:14,424-Speed 9316.38 
samples/sec Loss 3.4797 LearningRate 0.0005 Epoch: 18 Global Step: 310030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:15,552-Speed 9081.36 samples/sec Loss 3.3406 LearningRate 0.0005 Epoch: 18 Global Step: 310040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:16,667-Speed 9192.26 samples/sec Loss 3.4566 LearningRate 0.0005 Epoch: 18 Global Step: 310050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:17,760-Speed 9371.65 samples/sec Loss 3.4169 LearningRate 0.0005 Epoch: 18 Global Step: 310060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:07:18,837-Speed 9516.01 samples/sec Loss 3.4022 LearningRate 0.0005 Epoch: 18 Global Step: 310070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:19,961-Speed 9118.83 samples/sec Loss 3.3763 LearningRate 0.0005 Epoch: 18 Global Step: 310080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:21,066-Speed 9269.39 samples/sec Loss 3.3948 LearningRate 0.0005 Epoch: 18 Global Step: 310090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:22,236-Speed 8757.40 samples/sec Loss 3.3891 LearningRate 0.0005 Epoch: 18 Global Step: 310100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:23,364-Speed 9078.75 samples/sec Loss 3.4371 LearningRate 0.0005 Epoch: 18 Global Step: 310110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:24,579-Speed 8433.03 samples/sec Loss 3.4228 LearningRate 0.0005 Epoch: 18 Global Step: 310120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:25,713-Speed 9036.73 samples/sec Loss 3.3596 LearningRate 0.0005 Epoch: 18 Global Step: 310130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:26,797-Speed 9449.42 samples/sec Loss 3.3913 LearningRate 0.0005 Epoch: 18 Global Step: 310140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:27,976-Speed 8687.68 samples/sec Loss 3.3463 LearningRate 
0.0005 Epoch: 18 Global Step: 310150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:29,099-Speed 9126.30 samples/sec Loss 3.3865 LearningRate 0.0005 Epoch: 18 Global Step: 310160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:30,232-Speed 9038.90 samples/sec Loss 3.3238 LearningRate 0.0005 Epoch: 18 Global Step: 310170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:07:31,402-Speed 8764.36 samples/sec Loss 3.4118 LearningRate 0.0005 Epoch: 18 Global Step: 310180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:32,526-Speed 9118.32 samples/sec Loss 3.4733 LearningRate 0.0005 Epoch: 18 Global Step: 310190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:33,656-Speed 9066.75 samples/sec Loss 3.3904 LearningRate 0.0005 Epoch: 18 Global Step: 310200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:34,789-Speed 9042.79 samples/sec Loss 3.4082 LearningRate 0.0005 Epoch: 18 Global Step: 310210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:35,922-Speed 9045.59 samples/sec Loss 3.4695 LearningRate 0.0005 Epoch: 18 Global Step: 310220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:37,026-Speed 9277.93 samples/sec Loss 3.3788 LearningRate 0.0005 Epoch: 18 Global Step: 310230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:38,167-Speed 8977.48 samples/sec Loss 3.4294 LearningRate 0.0005 Epoch: 18 Global Step: 310240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:39,332-Speed 8791.55 samples/sec Loss 3.4603 LearningRate 0.0005 Epoch: 18 Global Step: 310250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:40,465-Speed 9050.26 samples/sec Loss 3.3325 LearningRate 0.0005 Epoch: 18 Global Step: 310260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:41,593-Speed 9084.47 samples/sec Loss 3.4472 LearningRate 0.0005 Epoch: 18 Global Step: 310270 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:42,712-Speed 9157.55 samples/sec Loss 3.4654 LearningRate 0.0005 Epoch: 18 Global Step: 310280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:43,828-Speed 9173.21 samples/sec Loss 3.3595 LearningRate 0.0005 Epoch: 18 Global Step: 310290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:44,946-Speed 9164.23 samples/sec Loss 3.5002 LearningRate 0.0005 Epoch: 18 Global Step: 310300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:46,058-Speed 9216.35 samples/sec Loss 3.3592 LearningRate 0.0005 Epoch: 18 Global Step: 310310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:47,178-Speed 9149.07 samples/sec Loss 3.2683 LearningRate 0.0005 Epoch: 18 Global Step: 310320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:48,341-Speed 8812.03 samples/sec Loss 3.3704 LearningRate 0.0005 Epoch: 18 Global Step: 310330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:49,510-Speed 8761.16 samples/sec Loss 3.3399 LearningRate 0.0005 Epoch: 18 Global Step: 310340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:50,611-Speed 9305.31 samples/sec Loss 3.3551 LearningRate 0.0005 Epoch: 18 Global Step: 310350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:51,734-Speed 9124.04 samples/sec Loss 3.3687 LearningRate 0.0005 Epoch: 18 Global Step: 310360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:52,898-Speed 8813.74 samples/sec Loss 3.3422 LearningRate 0.0005 Epoch: 18 Global Step: 310370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:54,070-Speed 8736.06 samples/sec Loss 3.4037 LearningRate 0.0005 Epoch: 18 Global Step: 310380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:55,173-Speed 9295.74 samples/sec Loss 3.3881 LearningRate 0.0005 Epoch: 18 Global Step: 310390 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-12 00:07:56,366-Speed 8587.25 samples/sec Loss 3.3959 LearningRate 0.0005 Epoch: 18 Global Step: 310400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:57,461-Speed 9350.02 samples/sec Loss 3.3576 LearningRate 0.0005 Epoch: 18 Global Step: 310410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:58,569-Speed 9251.00 samples/sec Loss 3.4004 LearningRate 0.0005 Epoch: 18 Global Step: 310420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:07:59,721-Speed 8899.31 samples/sec Loss 3.4485 LearningRate 0.0005 Epoch: 18 Global Step: 310430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:00,842-Speed 9134.46 samples/sec Loss 3.3651 LearningRate 0.0005 Epoch: 18 Global Step: 310440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:01,973-Speed 9060.35 samples/sec Loss 3.3453 LearningRate 0.0005 Epoch: 18 Global Step: 310450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:03,085-Speed 9218.06 samples/sec Loss 3.3952 LearningRate 0.0005 Epoch: 18 Global Step: 310460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:04,223-Speed 8997.81 samples/sec Loss 3.3951 LearningRate 0.0005 Epoch: 18 Global Step: 310470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:05,358-Speed 9027.77 samples/sec Loss 3.4928 LearningRate 0.0005 Epoch: 18 Global Step: 310480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:08:06,509-Speed 8907.60 samples/sec Loss 3.4132 LearningRate 0.0005 Epoch: 18 Global Step: 310490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:07,642-Speed 9036.25 samples/sec Loss 3.4376 LearningRate 0.0005 Epoch: 18 Global Step: 310500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:08,770-Speed 9082.10 samples/sec Loss 3.3690 LearningRate 0.0005 Epoch: 18 Global Step: 310510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:09,931-Speed 
8825.32 samples/sec Loss 3.4576 LearningRate 0.0005 Epoch: 18 Global Step: 310520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:11,034-Speed 9297.25 samples/sec Loss 3.3908 LearningRate 0.0005 Epoch: 18 Global Step: 310530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:12,155-Speed 9137.69 samples/sec Loss 3.4274 LearningRate 0.0005 Epoch: 18 Global Step: 310540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:13,339-Speed 8656.07 samples/sec Loss 3.5070 LearningRate 0.0005 Epoch: 18 Global Step: 310550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:14,484-Speed 8949.80 samples/sec Loss 3.5679 LearningRate 0.0005 Epoch: 18 Global Step: 310560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:15,616-Speed 9045.13 samples/sec Loss 3.4037 LearningRate 0.0005 Epoch: 18 Global Step: 310570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:16,710-Speed 9366.38 samples/sec Loss 3.4343 LearningRate 0.0005 Epoch: 18 Global Step: 310580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:17,805-Speed 9360.62 samples/sec Loss 3.3385 LearningRate 0.0005 Epoch: 18 Global Step: 310590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:18,982-Speed 8702.71 samples/sec Loss 3.4094 LearningRate 0.0005 Epoch: 18 Global Step: 310600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:20,147-Speed 8795.12 samples/sec Loss 3.3422 LearningRate 0.0005 Epoch: 18 Global Step: 310610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:21,241-Speed 9369.31 samples/sec Loss 3.4231 LearningRate 0.0005 Epoch: 18 Global Step: 310620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:22,372-Speed 9059.20 samples/sec Loss 3.4583 LearningRate 0.0005 Epoch: 18 Global Step: 310630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:23,501-Speed 9069.76 samples/sec Loss 3.3860 
LearningRate 0.0005 Epoch: 18 Global Step: 310640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:24,652-Speed 8905.05 samples/sec Loss 3.4204 LearningRate 0.0005 Epoch: 18 Global Step: 310650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:25,820-Speed 8768.33 samples/sec Loss 3.4248 LearningRate 0.0005 Epoch: 18 Global Step: 310660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:26,963-Speed 8965.62 samples/sec Loss 3.4229 LearningRate 0.0005 Epoch: 18 Global Step: 310670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:28,040-Speed 9511.22 samples/sec Loss 3.3544 LearningRate 0.0005 Epoch: 18 Global Step: 310680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:29,130-Speed 9403.37 samples/sec Loss 3.4178 LearningRate 0.0005 Epoch: 18 Global Step: 310690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:30,255-Speed 9113.56 samples/sec Loss 3.3982 LearningRate 0.0005 Epoch: 18 Global Step: 310700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:31,326-Speed 9566.76 samples/sec Loss 3.3716 LearningRate 0.0005 Epoch: 18 Global Step: 310710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:32,443-Speed 9167.05 samples/sec Loss 3.4291 LearningRate 0.0005 Epoch: 18 Global Step: 310720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:33,559-Speed 9185.95 samples/sec Loss 3.4403 LearningRate 0.0005 Epoch: 18 Global Step: 310730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:34,681-Speed 9127.10 samples/sec Loss 3.4122 LearningRate 0.0005 Epoch: 18 Global Step: 310740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:35,806-Speed 9107.15 samples/sec Loss 3.4362 LearningRate 0.0005 Epoch: 18 Global Step: 310750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:36,943-Speed 9009.22 samples/sec Loss 3.3551 LearningRate 0.0005 Epoch: 18 Global Step: 
310760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:38,043-Speed 9314.45 samples/sec Loss 3.3768 LearningRate 0.0005 Epoch: 18 Global Step: 310770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:39,147-Speed 9280.76 samples/sec Loss 3.4134 LearningRate 0.0005 Epoch: 18 Global Step: 310780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:40,250-Speed 9286.89 samples/sec Loss 3.4458 LearningRate 0.0005 Epoch: 18 Global Step: 310790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:08:41,378-Speed 9090.23 samples/sec Loss 3.5310 LearningRate 0.0005 Epoch: 18 Global Step: 310800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:42,510-Speed 9053.38 samples/sec Loss 3.4661 LearningRate 0.0005 Epoch: 18 Global Step: 310810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:43,648-Speed 9001.21 samples/sec Loss 3.4233 LearningRate 0.0005 Epoch: 18 Global Step: 310820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:44,752-Speed 9279.16 samples/sec Loss 3.4071 LearningRate 0.0005 Epoch: 18 Global Step: 310830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:45,850-Speed 9336.97 samples/sec Loss 3.3444 LearningRate 0.0005 Epoch: 18 Global Step: 310840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:46,992-Speed 8968.74 samples/sec Loss 3.4175 LearningRate 0.0005 Epoch: 18 Global Step: 310850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:48,096-Speed 9280.88 samples/sec Loss 3.4717 LearningRate 0.0005 Epoch: 18 Global Step: 310860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:49,218-Speed 9139.31 samples/sec Loss 3.3461 LearningRate 0.0005 Epoch: 18 Global Step: 310870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:50,319-Speed 9300.15 samples/sec Loss 3.4218 LearningRate 0.0005 Epoch: 18 Global Step: 310880 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-12 00:08:51,465-Speed 8944.14 samples/sec Loss 3.4538 LearningRate 0.0005 Epoch: 18 Global Step: 310890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:52,592-Speed 9090.48 samples/sec Loss 3.3261 LearningRate 0.0005 Epoch: 18 Global Step: 310900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:53,696-Speed 9277.85 samples/sec Loss 3.3843 LearningRate 0.0005 Epoch: 18 Global Step: 310910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:54,844-Speed 8922.88 samples/sec Loss 3.4347 LearningRate 0.0005 Epoch: 18 Global Step: 310920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:55,961-Speed 9178.87 samples/sec Loss 3.4609 LearningRate 0.0005 Epoch: 18 Global Step: 310930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:57,076-Speed 9186.78 samples/sec Loss 3.4433 LearningRate 0.0005 Epoch: 18 Global Step: 310940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:58,196-Speed 9144.34 samples/sec Loss 3.4252 LearningRate 0.0005 Epoch: 18 Global Step: 310950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:08:59,301-Speed 9275.32 samples/sec Loss 3.3706 LearningRate 0.0005 Epoch: 18 Global Step: 310960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:00,429-Speed 9088.10 samples/sec Loss 3.3720 LearningRate 0.0005 Epoch: 18 Global Step: 310970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:01,531-Speed 9294.31 samples/sec Loss 3.4146 LearningRate 0.0005 Epoch: 18 Global Step: 310980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:02,628-Speed 9342.61 samples/sec Loss 3.4129 LearningRate 0.0005 Epoch: 18 Global Step: 310990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:03,749-Speed 9140.46 samples/sec Loss 3.4334 LearningRate 0.0005 Epoch: 18 Global Step: 311000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 
00:09:04,869-Speed 9144.69 samples/sec Loss 3.4100 LearningRate 0.0005 Epoch: 18 Global Step: 311010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:06,016-Speed 8931.92 samples/sec Loss 3.3915 LearningRate 0.0005 Epoch: 18 Global Step: 311020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:07,119-Speed 9295.21 samples/sec Loss 3.3918 LearningRate 0.0005 Epoch: 18 Global Step: 311030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:08,224-Speed 9263.63 samples/sec Loss 3.4255 LearningRate 0.0005 Epoch: 18 Global Step: 311040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:09,353-Speed 9080.53 samples/sec Loss 3.4265 LearningRate 0.0005 Epoch: 18 Global Step: 311050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:10,455-Speed 9291.37 samples/sec Loss 3.4138 LearningRate 0.0005 Epoch: 18 Global Step: 311060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:11,568-Speed 9215.15 samples/sec Loss 3.3248 LearningRate 0.0005 Epoch: 18 Global Step: 311070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:12,680-Speed 9211.91 samples/sec Loss 3.3943 LearningRate 0.0005 Epoch: 18 Global Step: 311080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:13,817-Speed 9006.02 samples/sec Loss 3.4636 LearningRate 0.0005 Epoch: 18 Global Step: 311090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:14,894-Speed 9516.59 samples/sec Loss 3.4082 LearningRate 0.0005 Epoch: 18 Global Step: 311100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:15,995-Speed 9307.12 samples/sec Loss 3.4275 LearningRate 0.0005 Epoch: 18 Global Step: 311110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:09:17,073-Speed 9499.44 samples/sec Loss 3.4438 LearningRate 0.0005 Epoch: 18 Global Step: 311120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:18,177-Speed 9296.11 samples/sec 
Loss 3.4352 LearningRate 0.0005 Epoch: 18 Global Step: 311130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:19,334-Speed 8852.30 samples/sec Loss 3.3805 LearningRate 0.0005 Epoch: 18 Global Step: 311140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:20,488-Speed 8878.85 samples/sec Loss 3.4668 LearningRate 0.0005 Epoch: 18 Global Step: 311150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:21,592-Speed 9280.80 samples/sec Loss 3.3865 LearningRate 0.0005 Epoch: 18 Global Step: 311160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:22,720-Speed 9084.43 samples/sec Loss 3.4698 LearningRate 0.0005 Epoch: 18 Global Step: 311170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:23,818-Speed 9329.46 samples/sec Loss 3.3175 LearningRate 0.0005 Epoch: 18 Global Step: 311180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:24,922-Speed 9275.21 samples/sec Loss 3.4660 LearningRate 0.0005 Epoch: 18 Global Step: 311190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:26,022-Speed 9326.20 samples/sec Loss 3.4196 LearningRate 0.0005 Epoch: 18 Global Step: 311200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:27,127-Speed 9279.85 samples/sec Loss 3.3544 LearningRate 0.0005 Epoch: 18 Global Step: 311210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:28,249-Speed 9134.96 samples/sec Loss 3.4228 LearningRate 0.0005 Epoch: 18 Global Step: 311220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:29,358-Speed 9240.44 samples/sec Loss 3.4168 LearningRate 0.0005 Epoch: 18 Global Step: 311230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:30,454-Speed 9350.92 samples/sec Loss 3.3786 LearningRate 0.0005 Epoch: 18 Global Step: 311240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:31,574-Speed 9144.34 samples/sec Loss 3.4150 LearningRate 0.0005 Epoch: 18 
Global Step: 311250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:32,677-Speed 9288.85 samples/sec Loss 3.4524 LearningRate 0.0005 Epoch: 18 Global Step: 311260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:33,807-Speed 9067.57 samples/sec Loss 3.4193 LearningRate 0.0005 Epoch: 18 Global Step: 311270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:34,897-Speed 9400.64 samples/sec Loss 3.3762 LearningRate 0.0005 Epoch: 18 Global Step: 311280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:36,021-Speed 9109.35 samples/sec Loss 3.3756 LearningRate 0.0005 Epoch: 18 Global Step: 311290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:37,113-Speed 9384.44 samples/sec Loss 3.3963 LearningRate 0.0005 Epoch: 18 Global Step: 311300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:38,233-Speed 9146.38 samples/sec Loss 3.4188 LearningRate 0.0005 Epoch: 18 Global Step: 311310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:39,344-Speed 9223.23 samples/sec Loss 3.2828 LearningRate 0.0005 Epoch: 18 Global Step: 311320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:40,484-Speed 8988.50 samples/sec Loss 3.3916 LearningRate 0.0005 Epoch: 18 Global Step: 311330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:41,645-Speed 8827.37 samples/sec Loss 3.3479 LearningRate 0.0005 Epoch: 18 Global Step: 311340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:42,781-Speed 9016.44 samples/sec Loss 3.4417 LearningRate 0.0005 Epoch: 18 Global Step: 311350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:43,897-Speed 9185.98 samples/sec Loss 3.4282 LearningRate 0.0005 Epoch: 18 Global Step: 311360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:45,011-Speed 9197.08 samples/sec Loss 3.3950 LearningRate 0.0005 Epoch: 18 Global Step: 311370 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:09:46,149-Speed 9003.84 samples/sec Loss 3.2840 LearningRate 0.0005 Epoch: 18 Global Step: 311380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:47,249-Speed 9317.46 samples/sec Loss 3.4551 LearningRate 0.0005 Epoch: 18 Global Step: 311390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:09:48,323-Speed 9540.01 samples/sec Loss 3.4210 LearningRate 0.0005 Epoch: 18 Global Step: 311400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:49,416-Speed 9375.44 samples/sec Loss 3.3945 LearningRate 0.0005 Epoch: 18 Global Step: 311410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:50,554-Speed 9001.33 samples/sec Loss 3.4719 LearningRate 0.0005 Epoch: 18 Global Step: 311420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:51,647-Speed 9377.79 samples/sec Loss 3.4286 LearningRate 0.0004 Epoch: 18 Global Step: 311430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:52,846-Speed 8539.91 samples/sec Loss 3.3898 LearningRate 0.0004 Epoch: 18 Global Step: 311440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:53,991-Speed 8954.13 samples/sec Loss 3.4682 LearningRate 0.0004 Epoch: 18 Global Step: 311450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:55,133-Speed 8966.48 samples/sec Loss 3.4613 LearningRate 0.0004 Epoch: 18 Global Step: 311460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:56,254-Speed 9140.17 samples/sec Loss 3.3433 LearningRate 0.0004 Epoch: 18 Global Step: 311470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:57,343-Speed 9408.47 samples/sec Loss 3.3211 LearningRate 0.0004 Epoch: 18 Global Step: 311480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:09:58,447-Speed 9284.18 samples/sec Loss 3.5032 LearningRate 0.0004 Epoch: 18 Global Step: 311490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 
00:09:59,616-Speed 8763.06 samples/sec Loss 3.4003 LearningRate 0.0004 Epoch: 18 Global Step: 311500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:00,782-Speed 8790.75 samples/sec Loss 3.3517 LearningRate 0.0004 Epoch: 18 Global Step: 311510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:01,914-Speed 9048.16 samples/sec Loss 3.3692 LearningRate 0.0004 Epoch: 18 Global Step: 311520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:03,003-Speed 9412.42 samples/sec Loss 3.3606 LearningRate 0.0004 Epoch: 18 Global Step: 311530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:04,081-Speed 9506.82 samples/sec Loss 3.4451 LearningRate 0.0004 Epoch: 18 Global Step: 311540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:05,170-Speed 9405.75 samples/sec Loss 3.4224 LearningRate 0.0004 Epoch: 18 Global Step: 311550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:06,266-Speed 9345.64 samples/sec Loss 3.5074 LearningRate 0.0004 Epoch: 18 Global Step: 311560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:07,409-Speed 8965.79 samples/sec Loss 3.4191 LearningRate 0.0004 Epoch: 18 Global Step: 311570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:08,508-Speed 9317.16 samples/sec Loss 3.4355 LearningRate 0.0004 Epoch: 18 Global Step: 311580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:09,620-Speed 9214.22 samples/sec Loss 3.4498 LearningRate 0.0004 Epoch: 18 Global Step: 311590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:10,709-Speed 9417.10 samples/sec Loss 3.3846 LearningRate 0.0004 Epoch: 18 Global Step: 311600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:11,841-Speed 9048.85 samples/sec Loss 3.3901 LearningRate 0.0004 Epoch: 18 Global Step: 311610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:12,969-Speed 9087.19 samples/sec Loss 
3.4158 LearningRate 0.0004 Epoch: 18 Global Step: 311620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:14,093-Speed 9117.08 samples/sec Loss 3.3312 LearningRate 0.0004 Epoch: 18 Global Step: 311630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:15,222-Speed 9070.94 samples/sec Loss 3.3667 LearningRate 0.0004 Epoch: 18 Global Step: 311640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:16,401-Speed 8688.00 samples/sec Loss 3.3723 LearningRate 0.0004 Epoch: 18 Global Step: 311650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:17,570-Speed 8765.33 samples/sec Loss 3.4387 LearningRate 0.0004 Epoch: 18 Global Step: 311660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:18,684-Speed 9204.69 samples/sec Loss 3.3377 LearningRate 0.0004 Epoch: 18 Global Step: 311670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:19,858-Speed 8726.77 samples/sec Loss 3.3996 LearningRate 0.0004 Epoch: 18 Global Step: 311680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:20,990-Speed 9043.84 samples/sec Loss 3.3883 LearningRate 0.0004 Epoch: 18 Global Step: 311690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:22,123-Speed 9047.08 samples/sec Loss 3.4474 LearningRate 0.0004 Epoch: 18 Global Step: 311700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:23,296-Speed 8735.56 samples/sec Loss 3.3624 LearningRate 0.0004 Epoch: 18 Global Step: 311710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:24,436-Speed 8987.42 samples/sec Loss 3.3441 LearningRate 0.0004 Epoch: 18 Global Step: 311720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:25,555-Speed 9152.30 samples/sec Loss 3.3672 LearningRate 0.0004 Epoch: 18 Global Step: 311730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:26,672-Speed 9173.39 samples/sec Loss 3.3537 LearningRate 0.0004 Epoch: 18 Global 
Step: 311740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:27,760-Speed 9423.49 samples/sec Loss 3.4238 LearningRate 0.0004 Epoch: 18 Global Step: 311750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:28,864-Speed 9278.81 samples/sec Loss 3.3408 LearningRate 0.0004 Epoch: 18 Global Step: 311760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:29,991-Speed 9116.27 samples/sec Loss 3.4249 LearningRate 0.0004 Epoch: 18 Global Step: 311770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:31,077-Speed 9433.83 samples/sec Loss 3.3832 LearningRate 0.0004 Epoch: 18 Global Step: 311780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:32,192-Speed 9192.01 samples/sec Loss 3.3610 LearningRate 0.0004 Epoch: 18 Global Step: 311790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:33,299-Speed 9251.19 samples/sec Loss 3.5063 LearningRate 0.0004 Epoch: 18 Global Step: 311800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:34,416-Speed 9175.39 samples/sec Loss 3.4159 LearningRate 0.0004 Epoch: 18 Global Step: 311810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:35,513-Speed 9340.88 samples/sec Loss 3.4299 LearningRate 0.0004 Epoch: 18 Global Step: 311820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:36,615-Speed 9298.17 samples/sec Loss 3.3975 LearningRate 0.0004 Epoch: 18 Global Step: 311830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:37,734-Speed 9161.39 samples/sec Loss 3.4201 LearningRate 0.0004 Epoch: 18 Global Step: 311840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:38,871-Speed 9006.71 samples/sec Loss 3.3748 LearningRate 0.0004 Epoch: 18 Global Step: 311850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:40,029-Speed 8852.66 samples/sec Loss 3.4832 LearningRate 0.0004 Epoch: 18 Global Step: 311860 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:10:41,162-Speed 9042.49 samples/sec Loss 3.4734 LearningRate 0.0004 Epoch: 18 Global Step: 311870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:42,284-Speed 9136.84 samples/sec Loss 3.3714 LearningRate 0.0004 Epoch: 18 Global Step: 311880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:43,445-Speed 8824.43 samples/sec Loss 3.4288 LearningRate 0.0004 Epoch: 18 Global Step: 311890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:44,594-Speed 8911.12 samples/sec Loss 3.4099 LearningRate 0.0004 Epoch: 18 Global Step: 311900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:10:45,673-Speed 9500.73 samples/sec Loss 3.4019 LearningRate 0.0004 Epoch: 18 Global Step: 311910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:10:46,736-Speed 9634.47 samples/sec Loss 3.4139 LearningRate 0.0004 Epoch: 18 Global Step: 311920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:47,861-Speed 9112.72 samples/sec Loss 3.3994 LearningRate 0.0004 Epoch: 18 Global Step: 311930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:48,969-Speed 9246.30 samples/sec Loss 3.4169 LearningRate 0.0004 Epoch: 18 Global Step: 311940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:50,117-Speed 8921.50 samples/sec Loss 3.4277 LearningRate 0.0004 Epoch: 18 Global Step: 311950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:51,235-Speed 9171.96 samples/sec Loss 3.4762 LearningRate 0.0004 Epoch: 18 Global Step: 311960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:52,371-Speed 9017.23 samples/sec Loss 3.4076 LearningRate 0.0004 Epoch: 18 Global Step: 311970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:53,542-Speed 8744.21 samples/sec Loss 3.3510 LearningRate 0.0004 Epoch: 18 Global Step: 311980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:10:54,675-Speed 9046.64 samples/sec Loss 3.4148 LearningRate 0.0004 Epoch: 18 Global Step: 311990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:10:55,798-Speed 9122.37 samples/sec Loss 3.4304 LearningRate 0.0004 Epoch: 18 Global Step: 312000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:11:17,577-[lfw][312000]XNorm: 6.563531 Training: 2022-04-12 00:11:17,578-[lfw][312000]Accuracy-Flip: 0.99617+-0.00289 Training: 2022-04-12 00:11:17,579-[lfw][312000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:11:42,768-[cfp_fp][312000]XNorm: 5.733843 Training: 2022-04-12 00:11:42,769-[cfp_fp][312000]Accuracy-Flip: 0.97243+-0.00860 Training: 2022-04-12 00:11:42,769-[cfp_fp][312000]Accuracy-Highest: 0.97386 Training: 2022-04-12 00:12:04,474-[agedb_30][312000]XNorm: 6.401253 Training: 2022-04-12 00:12:04,475-[agedb_30][312000]Accuracy-Flip: 0.97250+-0.00857 Training: 2022-04-12 00:12:04,475-[agedb_30][312000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:12:05,597-Speed 146.71 samples/sec Loss 3.4054 LearningRate 0.0004 Epoch: 18 Global Step: 312010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:06,758-Speed 8827.61 samples/sec Loss 3.3517 LearningRate 0.0004 Epoch: 18 Global Step: 312020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:12:07,867-Speed 9239.69 samples/sec Loss 3.5021 LearningRate 0.0004 Epoch: 18 Global Step: 312030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:12:08,969-Speed 9294.69 samples/sec Loss 3.3910 LearningRate 0.0004 Epoch: 18 Global Step: 312040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:12:10,034-Speed 9619.63 samples/sec Loss 3.4016 LearningRate 0.0004 Epoch: 18 Global Step: 312050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:11,128-Speed 9371.56 samples/sec Loss 3.3555 LearningRate 0.0004 Epoch: 18 Global Step: 312060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:12,230-Speed 
9294.99 samples/sec Loss 3.4246 LearningRate 0.0004 Epoch: 18 Global Step: 312070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:13,323-Speed 9376.79 samples/sec Loss 3.4221 LearningRate 0.0004 Epoch: 18 Global Step: 312080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:14,438-Speed 9184.09 samples/sec Loss 3.4180 LearningRate 0.0004 Epoch: 18 Global Step: 312090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:15,549-Speed 9228.33 samples/sec Loss 3.3764 LearningRate 0.0004 Epoch: 18 Global Step: 312100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:16,671-Speed 9132.79 samples/sec Loss 3.4256 LearningRate 0.0004 Epoch: 18 Global Step: 312110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:17,761-Speed 9406.20 samples/sec Loss 3.4467 LearningRate 0.0004 Epoch: 18 Global Step: 312120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:18,860-Speed 9329.49 samples/sec Loss 3.2966 LearningRate 0.0004 Epoch: 18 Global Step: 312130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:19,975-Speed 9184.66 samples/sec Loss 3.4190 LearningRate 0.0004 Epoch: 18 Global Step: 312140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:21,117-Speed 8974.07 samples/sec Loss 3.2877 LearningRate 0.0004 Epoch: 18 Global Step: 312150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:22,227-Speed 9225.27 samples/sec Loss 3.4498 LearningRate 0.0004 Epoch: 18 Global Step: 312160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:23,366-Speed 9000.03 samples/sec Loss 3.2824 LearningRate 0.0004 Epoch: 18 Global Step: 312170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:24,511-Speed 8949.37 samples/sec Loss 3.4088 LearningRate 0.0004 Epoch: 18 Global Step: 312180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:25,635-Speed 9113.26 samples/sec Loss 3.4456 
LearningRate 0.0004 Epoch: 18 Global Step: 312190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:26,780-Speed 8950.90 samples/sec Loss 3.4314 LearningRate 0.0004 Epoch: 18 Global Step: 312200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:27,886-Speed 9263.35 samples/sec Loss 3.4202 LearningRate 0.0004 Epoch: 18 Global Step: 312210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:29,027-Speed 8980.14 samples/sec Loss 3.4013 LearningRate 0.0004 Epoch: 18 Global Step: 312220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:30,211-Speed 8655.53 samples/sec Loss 3.3339 LearningRate 0.0004 Epoch: 18 Global Step: 312230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:31,317-Speed 9267.83 samples/sec Loss 3.3957 LearningRate 0.0004 Epoch: 18 Global Step: 312240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:32,406-Speed 9408.31 samples/sec Loss 3.3049 LearningRate 0.0004 Epoch: 18 Global Step: 312250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:12:33,524-Speed 9162.22 samples/sec Loss 3.3888 LearningRate 0.0004 Epoch: 18 Global Step: 312260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:34,637-Speed 9206.41 samples/sec Loss 3.3903 LearningRate 0.0004 Epoch: 18 Global Step: 312270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:35,726-Speed 9410.03 samples/sec Loss 3.3567 LearningRate 0.0004 Epoch: 18 Global Step: 312280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:36,783-Speed 9692.39 samples/sec Loss 3.3427 LearningRate 0.0004 Epoch: 18 Global Step: 312290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:37,915-Speed 9045.31 samples/sec Loss 3.4256 LearningRate 0.0004 Epoch: 18 Global Step: 312300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:38,997-Speed 9468.36 samples/sec Loss 3.4028 LearningRate 0.0004 Epoch: 18 Global Step: 
312310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:40,104-Speed 9260.79 samples/sec Loss 3.3535 LearningRate 0.0004 Epoch: 18 Global Step: 312320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:41,228-Speed 9121.50 samples/sec Loss 3.3934 LearningRate 0.0004 Epoch: 18 Global Step: 312330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:42,415-Speed 8626.96 samples/sec Loss 3.4767 LearningRate 0.0004 Epoch: 18 Global Step: 312340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:43,607-Speed 8596.08 samples/sec Loss 3.3391 LearningRate 0.0004 Epoch: 18 Global Step: 312350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:44,711-Speed 9276.77 samples/sec Loss 3.4930 LearningRate 0.0004 Epoch: 18 Global Step: 312360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:45,844-Speed 9046.87 samples/sec Loss 3.3913 LearningRate 0.0004 Epoch: 18 Global Step: 312370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:46,953-Speed 9234.34 samples/sec Loss 3.3207 LearningRate 0.0004 Epoch: 18 Global Step: 312380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:48,059-Speed 9276.94 samples/sec Loss 3.3672 LearningRate 0.0004 Epoch: 18 Global Step: 312390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:49,155-Speed 9348.12 samples/sec Loss 3.4848 LearningRate 0.0004 Epoch: 18 Global Step: 312400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:50,263-Speed 9245.77 samples/sec Loss 3.3545 LearningRate 0.0004 Epoch: 18 Global Step: 312410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:51,404-Speed 8980.64 samples/sec Loss 3.3563 LearningRate 0.0004 Epoch: 18 Global Step: 312420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:52,552-Speed 8924.79 samples/sec Loss 3.4152 LearningRate 0.0004 Epoch: 18 Global Step: 312430 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-12 00:12:53,650-Speed 9329.85 samples/sec Loss 3.3713 LearningRate 0.0004 Epoch: 18 Global Step: 312440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:54,751-Speed 9298.54 samples/sec Loss 3.4169 LearningRate 0.0004 Epoch: 18 Global Step: 312450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:55,891-Speed 8989.93 samples/sec Loss 3.3770 LearningRate 0.0004 Epoch: 18 Global Step: 312460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:12:56,999-Speed 9247.79 samples/sec Loss 3.4213 LearningRate 0.0004 Epoch: 18 Global Step: 312470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:12:58,099-Speed 9317.85 samples/sec Loss 3.4110 LearningRate 0.0004 Epoch: 18 Global Step: 312480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:12:59,231-Speed 9050.75 samples/sec Loss 3.4373 LearningRate 0.0004 Epoch: 18 Global Step: 312490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:00,343-Speed 9215.09 samples/sec Loss 3.4093 LearningRate 0.0004 Epoch: 18 Global Step: 312500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:01,454-Speed 9217.60 samples/sec Loss 3.4150 LearningRate 0.0004 Epoch: 18 Global Step: 312510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:02,533-Speed 9500.57 samples/sec Loss 3.3861 LearningRate 0.0004 Epoch: 18 Global Step: 312520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:03,643-Speed 9224.46 samples/sec Loss 3.4489 LearningRate 0.0004 Epoch: 18 Global Step: 312530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:04,793-Speed 8909.06 samples/sec Loss 3.4617 LearningRate 0.0004 Epoch: 18 Global Step: 312540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:05,886-Speed 9374.52 samples/sec Loss 3.4429 LearningRate 0.0004 Epoch: 18 Global Step: 312550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:13:07,003-Speed 9179.12 samples/sec Loss 3.4172 LearningRate 0.0004 Epoch: 18 Global Step: 312560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:08,113-Speed 9228.84 samples/sec Loss 3.3794 LearningRate 0.0004 Epoch: 18 Global Step: 312570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:09,222-Speed 9234.53 samples/sec Loss 3.3683 LearningRate 0.0004 Epoch: 18 Global Step: 312580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:13:10,331-Speed 9243.45 samples/sec Loss 3.3197 LearningRate 0.0004 Epoch: 18 Global Step: 312590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:11,414-Speed 9467.45 samples/sec Loss 3.4480 LearningRate 0.0004 Epoch: 18 Global Step: 312600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:12,521-Speed 9254.37 samples/sec Loss 3.4199 LearningRate 0.0004 Epoch: 18 Global Step: 312610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:13,682-Speed 8821.42 samples/sec Loss 3.3721 LearningRate 0.0004 Epoch: 18 Global Step: 312620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:14,795-Speed 9202.15 samples/sec Loss 3.3741 LearningRate 0.0004 Epoch: 18 Global Step: 312630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:15,919-Speed 9114.85 samples/sec Loss 3.3995 LearningRate 0.0004 Epoch: 18 Global Step: 312640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:17,081-Speed 8819.26 samples/sec Loss 3.3518 LearningRate 0.0004 Epoch: 18 Global Step: 312650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:18,199-Speed 9169.42 samples/sec Loss 3.4102 LearningRate 0.0004 Epoch: 18 Global Step: 312660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:19,322-Speed 9123.75 samples/sec Loss 3.3724 LearningRate 0.0004 Epoch: 18 Global Step: 312670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:20,464-Speed 8972.61 samples/sec 
Loss 3.4107 LearningRate 0.0004 Epoch: 18 Global Step: 312680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:21,597-Speed 9035.76 samples/sec Loss 3.4714 LearningRate 0.0004 Epoch: 18 Global Step: 312690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:22,715-Speed 9164.04 samples/sec Loss 3.4517 LearningRate 0.0004 Epoch: 18 Global Step: 312700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:23,875-Speed 8836.47 samples/sec Loss 3.4540 LearningRate 0.0004 Epoch: 18 Global Step: 312710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:24,993-Speed 9162.97 samples/sec Loss 3.4617 LearningRate 0.0004 Epoch: 18 Global Step: 312720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:26,134-Speed 8978.45 samples/sec Loss 3.3909 LearningRate 0.0004 Epoch: 18 Global Step: 312730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:27,232-Speed 9337.73 samples/sec Loss 3.4514 LearningRate 0.0004 Epoch: 18 Global Step: 312740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:28,346-Speed 9194.48 samples/sec Loss 3.3754 LearningRate 0.0004 Epoch: 18 Global Step: 312750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:29,508-Speed 8815.03 samples/sec Loss 3.3781 LearningRate 0.0004 Epoch: 18 Global Step: 312760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:30,641-Speed 9053.59 samples/sec Loss 3.3827 LearningRate 0.0004 Epoch: 18 Global Step: 312770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:31,779-Speed 9001.80 samples/sec Loss 3.4332 LearningRate 0.0004 Epoch: 18 Global Step: 312780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:32,836-Speed 9692.93 samples/sec Loss 3.4279 LearningRate 0.0004 Epoch: 18 Global Step: 312790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:33,958-Speed 9134.70 samples/sec Loss 3.4058 LearningRate 0.0004 Epoch: 18 
Global Step: 312800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:35,133-Speed 8717.06 samples/sec Loss 3.4078 LearningRate 0.0004 Epoch: 18 Global Step: 312810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:36,195-Speed 9645.48 samples/sec Loss 3.4585 LearningRate 0.0004 Epoch: 18 Global Step: 312820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:37,354-Speed 8845.02 samples/sec Loss 3.4446 LearningRate 0.0004 Epoch: 18 Global Step: 312830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:38,502-Speed 8922.49 samples/sec Loss 3.3961 LearningRate 0.0004 Epoch: 18 Global Step: 312840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:39,605-Speed 9290.69 samples/sec Loss 3.3968 LearningRate 0.0004 Epoch: 18 Global Step: 312850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:40,742-Speed 9010.32 samples/sec Loss 3.3611 LearningRate 0.0004 Epoch: 18 Global Step: 312860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:41,894-Speed 8894.19 samples/sec Loss 3.4286 LearningRate 0.0004 Epoch: 18 Global Step: 312870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:43,067-Speed 8736.47 samples/sec Loss 3.3816 LearningRate 0.0004 Epoch: 18 Global Step: 312880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:44,154-Speed 9425.74 samples/sec Loss 3.3913 LearningRate 0.0004 Epoch: 18 Global Step: 312890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:45,254-Speed 9317.29 samples/sec Loss 3.3487 LearningRate 0.0004 Epoch: 18 Global Step: 312900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:46,378-Speed 9117.75 samples/sec Loss 3.3790 LearningRate 0.0004 Epoch: 18 Global Step: 312910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:13:47,493-Speed 9190.38 samples/sec Loss 3.4782 LearningRate 0.0004 Epoch: 18 Global Step: 312920 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:13:48,620-Speed 9090.87 samples/sec Loss 3.4409 LearningRate 0.0004 Epoch: 18 Global Step: 312930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:49,703-Speed 9465.48 samples/sec Loss 3.3733 LearningRate 0.0004 Epoch: 18 Global Step: 312940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:50,806-Speed 9290.15 samples/sec Loss 3.3826 LearningRate 0.0004 Epoch: 18 Global Step: 312950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:51,917-Speed 9221.99 samples/sec Loss 3.3985 LearningRate 0.0004 Epoch: 18 Global Step: 312960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:53,010-Speed 9367.58 samples/sec Loss 3.4590 LearningRate 0.0004 Epoch: 18 Global Step: 312970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:54,134-Speed 9114.10 samples/sec Loss 3.4192 LearningRate 0.0004 Epoch: 18 Global Step: 312980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:55,303-Speed 8764.89 samples/sec Loss 3.3993 LearningRate 0.0004 Epoch: 18 Global Step: 312990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:56,421-Speed 9162.18 samples/sec Loss 3.4358 LearningRate 0.0004 Epoch: 18 Global Step: 313000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:57,583-Speed 8816.74 samples/sec Loss 3.4487 LearningRate 0.0004 Epoch: 18 Global Step: 313010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:58,737-Speed 8881.22 samples/sec Loss 3.3845 LearningRate 0.0004 Epoch: 18 Global Step: 313020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:13:59,878-Speed 8979.66 samples/sec Loss 3.3639 LearningRate 0.0004 Epoch: 18 Global Step: 313030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:01,036-Speed 8849.56 samples/sec Loss 3.3933 LearningRate 0.0004 Epoch: 18 Global Step: 313040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 
00:14:02,170-Speed 9041.45 samples/sec Loss 3.3939 LearningRate 0.0004 Epoch: 18 Global Step: 313050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:03,242-Speed 9560.39 samples/sec Loss 3.4743 LearningRate 0.0004 Epoch: 18 Global Step: 313060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:04,410-Speed 8772.70 samples/sec Loss 3.3588 LearningRate 0.0004 Epoch: 18 Global Step: 313070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:05,523-Speed 9205.63 samples/sec Loss 3.4381 LearningRate 0.0004 Epoch: 18 Global Step: 313080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:06,605-Speed 9470.17 samples/sec Loss 3.4841 LearningRate 0.0004 Epoch: 18 Global Step: 313090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:07,705-Speed 9313.38 samples/sec Loss 3.4653 LearningRate 0.0004 Epoch: 18 Global Step: 313100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:08,860-Speed 8866.06 samples/sec Loss 3.3054 LearningRate 0.0004 Epoch: 18 Global Step: 313110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:09,939-Speed 9495.97 samples/sec Loss 3.3820 LearningRate 0.0004 Epoch: 18 Global Step: 313120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:11,085-Speed 8947.49 samples/sec Loss 3.4112 LearningRate 0.0004 Epoch: 18 Global Step: 313130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:12,227-Speed 8969.17 samples/sec Loss 3.3552 LearningRate 0.0004 Epoch: 18 Global Step: 313140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:13,350-Speed 9121.10 samples/sec Loss 3.3732 LearningRate 0.0004 Epoch: 18 Global Step: 313150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:14,444-Speed 9367.62 samples/sec Loss 3.4390 LearningRate 0.0004 Epoch: 18 Global Step: 313160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:15,557-Speed 9206.94 samples/sec Loss 
3.3592 LearningRate 0.0004 Epoch: 18 Global Step: 313170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:16,692-Speed 9026.42 samples/sec Loss 3.3585 LearningRate 0.0004 Epoch: 18 Global Step: 313180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:17,818-Speed 9104.37 samples/sec Loss 3.4075 LearningRate 0.0004 Epoch: 18 Global Step: 313190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:18,962-Speed 8954.58 samples/sec Loss 3.4862 LearningRate 0.0004 Epoch: 18 Global Step: 313200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:20,088-Speed 9106.91 samples/sec Loss 3.3986 LearningRate 0.0004 Epoch: 18 Global Step: 313210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:21,215-Speed 9088.26 samples/sec Loss 3.4385 LearningRate 0.0004 Epoch: 18 Global Step: 313220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:22,341-Speed 9100.42 samples/sec Loss 3.3527 LearningRate 0.0004 Epoch: 18 Global Step: 313230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:23,522-Speed 8674.46 samples/sec Loss 3.3814 LearningRate 0.0004 Epoch: 18 Global Step: 313240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:24,671-Speed 8918.32 samples/sec Loss 3.4500 LearningRate 0.0004 Epoch: 18 Global Step: 313250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:25,745-Speed 9538.07 samples/sec Loss 3.4226 LearningRate 0.0004 Epoch: 18 Global Step: 313260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:26,825-Speed 9484.93 samples/sec Loss 3.3676 LearningRate 0.0004 Epoch: 18 Global Step: 313270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:27,985-Speed 8836.81 samples/sec Loss 3.3583 LearningRate 0.0004 Epoch: 18 Global Step: 313280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:29,131-Speed 8938.08 samples/sec Loss 3.3790 LearningRate 0.0004 Epoch: 18 Global 
Step: 313290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:30,249-Speed 9166.12 samples/sec Loss 3.4325 LearningRate 0.0004 Epoch: 18 Global Step: 313300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:31,368-Speed 9153.83 samples/sec Loss 3.3940 LearningRate 0.0004 Epoch: 18 Global Step: 313310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:32,500-Speed 9054.60 samples/sec Loss 3.4187 LearningRate 0.0004 Epoch: 18 Global Step: 313320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:33,575-Speed 9526.19 samples/sec Loss 3.3858 LearningRate 0.0004 Epoch: 18 Global Step: 313330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:34,731-Speed 8865.35 samples/sec Loss 3.4050 LearningRate 0.0004 Epoch: 18 Global Step: 313340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:35,874-Speed 8964.79 samples/sec Loss 3.3385 LearningRate 0.0004 Epoch: 18 Global Step: 313350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:36,985-Speed 9219.62 samples/sec Loss 3.3863 LearningRate 0.0004 Epoch: 18 Global Step: 313360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:38,093-Speed 9249.51 samples/sec Loss 3.3587 LearningRate 0.0004 Epoch: 18 Global Step: 313370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:39,197-Speed 9283.35 samples/sec Loss 3.4161 LearningRate 0.0004 Epoch: 18 Global Step: 313380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:40,300-Speed 9289.70 samples/sec Loss 3.3919 LearningRate 0.0004 Epoch: 18 Global Step: 313390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:41,398-Speed 9330.17 samples/sec Loss 3.4045 LearningRate 0.0004 Epoch: 18 Global Step: 313400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:42,521-Speed 9126.05 samples/sec Loss 3.4927 LearningRate 0.0004 Epoch: 18 Global Step: 313410 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-12 00:14:43,606-Speed 9440.45 samples/sec Loss 3.3489 LearningRate 0.0004 Epoch: 18 Global Step: 313420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:44,735-Speed 9075.78 samples/sec Loss 3.3410 LearningRate 0.0004 Epoch: 18 Global Step: 313430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:45,864-Speed 9072.40 samples/sec Loss 3.4136 LearningRate 0.0004 Epoch: 18 Global Step: 313440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:46,971-Speed 9265.41 samples/sec Loss 3.4357 LearningRate 0.0004 Epoch: 18 Global Step: 313450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:48,109-Speed 9001.91 samples/sec Loss 3.3717 LearningRate 0.0004 Epoch: 18 Global Step: 313460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:14:49,217-Speed 9246.99 samples/sec Loss 3.3624 LearningRate 0.0004 Epoch: 18 Global Step: 313470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:50,352-Speed 9027.28 samples/sec Loss 3.4106 LearningRate 0.0004 Epoch: 18 Global Step: 313480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:51,466-Speed 9201.27 samples/sec Loss 3.4079 LearningRate 0.0004 Epoch: 18 Global Step: 313490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:52,608-Speed 8965.31 samples/sec Loss 3.3995 LearningRate 0.0004 Epoch: 18 Global Step: 313500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:53,767-Speed 8843.94 samples/sec Loss 3.4035 LearningRate 0.0004 Epoch: 18 Global Step: 313510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:54,903-Speed 9024.11 samples/sec Loss 3.3360 LearningRate 0.0004 Epoch: 18 Global Step: 313520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:56,095-Speed 8592.87 samples/sec Loss 3.3700 LearningRate 0.0004 Epoch: 18 Global Step: 313530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 
00:14:57,235-Speed 8985.61 samples/sec Loss 3.4058 LearningRate 0.0004 Epoch: 18 Global Step: 313540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:58,373-Speed 9003.75 samples/sec Loss 3.3479 LearningRate 0.0004 Epoch: 18 Global Step: 313550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:14:59,476-Speed 9287.36 samples/sec Loss 3.4113 LearningRate 0.0004 Epoch: 18 Global Step: 313560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:15:00,632-Speed 8862.84 samples/sec Loss 3.4386 LearningRate 0.0004 Epoch: 18 Global Step: 313570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:01,769-Speed 9012.93 samples/sec Loss 3.4026 LearningRate 0.0004 Epoch: 18 Global Step: 313580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:02,909-Speed 8987.68 samples/sec Loss 3.4784 LearningRate 0.0004 Epoch: 18 Global Step: 313590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:03,998-Speed 9405.06 samples/sec Loss 3.3842 LearningRate 0.0004 Epoch: 18 Global Step: 313600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:05,110-Speed 9218.31 samples/sec Loss 3.4162 LearningRate 0.0004 Epoch: 18 Global Step: 313610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:06,240-Speed 9068.66 samples/sec Loss 3.3908 LearningRate 0.0004 Epoch: 18 Global Step: 313620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:07,374-Speed 9036.60 samples/sec Loss 3.4290 LearningRate 0.0004 Epoch: 18 Global Step: 313630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:08,500-Speed 9101.69 samples/sec Loss 3.4703 LearningRate 0.0004 Epoch: 18 Global Step: 313640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:09,612-Speed 9211.09 samples/sec Loss 3.4163 LearningRate 0.0004 Epoch: 18 Global Step: 313650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:10,720-Speed 9250.17 samples/sec Loss 
3.4887 LearningRate 0.0004 Epoch: 18 Global Step: 313660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:11,864-Speed 8949.61 samples/sec Loss 3.3864 LearningRate 0.0004 Epoch: 18 Global Step: 313670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:12,983-Speed 9159.93 samples/sec Loss 3.4208 LearningRate 0.0004 Epoch: 18 Global Step: 313680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:14,110-Speed 9090.68 samples/sec Loss 3.3962 LearningRate 0.0004 Epoch: 18 Global Step: 313690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:15,200-Speed 9395.24 samples/sec Loss 3.3755 LearningRate 0.0004 Epoch: 18 Global Step: 313700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:16,331-Speed 9066.05 samples/sec Loss 3.4239 LearningRate 0.0004 Epoch: 18 Global Step: 313710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:17,451-Speed 9152.21 samples/sec Loss 3.3620 LearningRate 0.0004 Epoch: 18 Global Step: 313720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:18,597-Speed 8941.96 samples/sec Loss 3.4473 LearningRate 0.0004 Epoch: 18 Global Step: 313730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:19,750-Speed 8883.26 samples/sec Loss 3.3621 LearningRate 0.0004 Epoch: 18 Global Step: 313740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:20,918-Speed 8773.94 samples/sec Loss 3.4005 LearningRate 0.0004 Epoch: 18 Global Step: 313750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:22,066-Speed 8919.33 samples/sec Loss 3.4719 LearningRate 0.0004 Epoch: 18 Global Step: 313760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:23,150-Speed 9458.31 samples/sec Loss 3.4333 LearningRate 0.0004 Epoch: 18 Global Step: 313770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:15:24,286-Speed 9016.96 samples/sec Loss 3.4461 LearningRate 0.0004 Epoch: 18 
Global Step: 313780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:25,409-Speed 9122.15 samples/sec Loss 3.3148 LearningRate 0.0004 Epoch: 18 Global Step: 313790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:26,531-Speed 9134.28 samples/sec Loss 3.4442 LearningRate 0.0004 Epoch: 18 Global Step: 313800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:27,631-Speed 9312.60 samples/sec Loss 3.5248 LearningRate 0.0004 Epoch: 18 Global Step: 313810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:28,735-Speed 9280.24 samples/sec Loss 3.4178 LearningRate 0.0004 Epoch: 18 Global Step: 313820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:29,838-Speed 9290.48 samples/sec Loss 3.4698 LearningRate 0.0004 Epoch: 18 Global Step: 313830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:30,985-Speed 8930.42 samples/sec Loss 3.4601 LearningRate 0.0004 Epoch: 18 Global Step: 313840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:32,186-Speed 8530.27 samples/sec Loss 3.4088 LearningRate 0.0004 Epoch: 18 Global Step: 313850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:33,336-Speed 8908.89 samples/sec Loss 3.4252 LearningRate 0.0004 Epoch: 18 Global Step: 313860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:34,478-Speed 8970.59 samples/sec Loss 3.4396 LearningRate 0.0004 Epoch: 18 Global Step: 313870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:35,612-Speed 9040.96 samples/sec Loss 3.3892 LearningRate 0.0004 Epoch: 18 Global Step: 313880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:15:36,666-Speed 9720.75 samples/sec Loss 3.4051 LearningRate 0.0004 Epoch: 18 Global Step: 313890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:37,798-Speed 9050.73 samples/sec Loss 3.3473 LearningRate 0.0004 Epoch: 18 Global Step: 313900 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:15:38,919-Speed 9137.63 samples/sec Loss 3.4242 LearningRate 0.0004 Epoch: 18 Global Step: 313910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:40,051-Speed 9050.26 samples/sec Loss 3.3825 LearningRate 0.0004 Epoch: 18 Global Step: 313920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:41,204-Speed 8886.80 samples/sec Loss 3.3836 LearningRate 0.0004 Epoch: 18 Global Step: 313930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:42,318-Speed 9200.69 samples/sec Loss 3.3595 LearningRate 0.0004 Epoch: 18 Global Step: 313940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:43,459-Speed 8974.53 samples/sec Loss 3.4173 LearningRate 0.0004 Epoch: 18 Global Step: 313950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:44,627-Speed 8772.67 samples/sec Loss 3.4192 LearningRate 0.0004 Epoch: 18 Global Step: 313960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:45,712-Speed 9442.54 samples/sec Loss 3.3360 LearningRate 0.0004 Epoch: 18 Global Step: 313970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:46,832-Speed 9150.24 samples/sec Loss 3.3982 LearningRate 0.0004 Epoch: 18 Global Step: 313980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:47,953-Speed 9144.97 samples/sec Loss 3.4434 LearningRate 0.0004 Epoch: 18 Global Step: 313990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:15:49,108-Speed 8872.83 samples/sec Loss 3.4052 LearningRate 0.0004 Epoch: 18 Global Step: 314000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:16:11,743-[lfw][314000]XNorm: 6.549081 Training: 2022-04-12 00:16:11,744-[lfw][314000]Accuracy-Flip: 0.99700+-0.00287 Training: 2022-04-12 00:16:11,744-[lfw][314000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:16:37,250-[cfp_fp][314000]XNorm: 5.733874 Training: 2022-04-12 00:16:37,251-[cfp_fp][314000]Accuracy-Flip: 
0.97514+-0.00793 Training: 2022-04-12 00:16:37,252-[cfp_fp][314000]Accuracy-Highest: 0.97514 Training: 2022-04-12 00:16:59,289-[agedb_30][314000]XNorm: 6.381650 Training: 2022-04-12 00:16:59,290-[agedb_30][314000]Accuracy-Flip: 0.97233+-0.00750 Training: 2022-04-12 00:16:59,291-[agedb_30][314000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:17:00,444-Speed 143.55 samples/sec Loss 3.3967 LearningRate 0.0004 Epoch: 18 Global Step: 314010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:01,564-Speed 9152.79 samples/sec Loss 3.3667 LearningRate 0.0004 Epoch: 18 Global Step: 314020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:02,701-Speed 9011.41 samples/sec Loss 3.4136 LearningRate 0.0004 Epoch: 18 Global Step: 314030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:03,818-Speed 9167.66 samples/sec Loss 3.4172 LearningRate 0.0004 Epoch: 18 Global Step: 314040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:04,972-Speed 8882.46 samples/sec Loss 3.3669 LearningRate 0.0004 Epoch: 18 Global Step: 314050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:06,091-Speed 9155.03 samples/sec Loss 3.3425 LearningRate 0.0004 Epoch: 18 Global Step: 314060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:07,206-Speed 9195.35 samples/sec Loss 3.4544 LearningRate 0.0004 Epoch: 18 Global Step: 314070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:08,346-Speed 8986.89 samples/sec Loss 3.4317 LearningRate 0.0003 Epoch: 18 Global Step: 314080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:09,461-Speed 9182.49 samples/sec Loss 3.3650 LearningRate 0.0003 Epoch: 18 Global Step: 314090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:17:10,589-Speed 9086.08 samples/sec Loss 3.4691 LearningRate 0.0003 Epoch: 18 Global Step: 314100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:17:11,701-Speed 
9215.56 samples/sec Loss 3.3406 LearningRate 0.0003 Epoch: 18 Global Step: 314110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:12,817-Speed 9176.15 samples/sec Loss 3.4729 LearningRate 0.0003 Epoch: 18 Global Step: 314120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:13,968-Speed 8904.26 samples/sec Loss 3.3564 LearningRate 0.0003 Epoch: 18 Global Step: 314130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:15,067-Speed 9318.09 samples/sec Loss 3.3675 LearningRate 0.0003 Epoch: 18 Global Step: 314140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:16,165-Speed 9330.55 samples/sec Loss 3.4450 LearningRate 0.0003 Epoch: 18 Global Step: 314150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:17,340-Speed 8729.63 samples/sec Loss 3.4759 LearningRate 0.0003 Epoch: 18 Global Step: 314160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:18,519-Speed 8688.29 samples/sec Loss 3.4023 LearningRate 0.0003 Epoch: 18 Global Step: 314170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:19,709-Speed 8609.88 samples/sec Loss 3.4160 LearningRate 0.0003 Epoch: 18 Global Step: 314180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:20,839-Speed 9059.87 samples/sec Loss 3.4554 LearningRate 0.0003 Epoch: 18 Global Step: 314190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:21,999-Speed 8833.91 samples/sec Loss 3.3380 LearningRate 0.0003 Epoch: 18 Global Step: 314200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:23,121-Speed 9135.42 samples/sec Loss 3.5068 LearningRate 0.0003 Epoch: 18 Global Step: 314210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:24,208-Speed 9431.22 samples/sec Loss 3.4498 LearningRate 0.0003 Epoch: 18 Global Step: 314220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:25,344-Speed 9019.18 samples/sec Loss 3.3903 
LearningRate 0.0003 Epoch: 18 Global Step: 314230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:26,502-Speed 8846.32 samples/sec Loss 3.3902 LearningRate 0.0003 Epoch: 18 Global Step: 314240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:27,613-Speed 9220.72 samples/sec Loss 3.3875 LearningRate 0.0003 Epoch: 18 Global Step: 314250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:28,667-Speed 9723.71 samples/sec Loss 3.4251 LearningRate 0.0003 Epoch: 18 Global Step: 314260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:29,763-Speed 9346.66 samples/sec Loss 3.3678 LearningRate 0.0003 Epoch: 18 Global Step: 314270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:30,821-Speed 9687.40 samples/sec Loss 3.4515 LearningRate 0.0003 Epoch: 18 Global Step: 314280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:31,967-Speed 8938.51 samples/sec Loss 3.3263 LearningRate 0.0003 Epoch: 18 Global Step: 314290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:33,139-Speed 8739.21 samples/sec Loss 3.3802 LearningRate 0.0003 Epoch: 18 Global Step: 314300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:34,303-Speed 8805.01 samples/sec Loss 3.3892 LearningRate 0.0003 Epoch: 18 Global Step: 314310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:35,427-Speed 9115.93 samples/sec Loss 3.4583 LearningRate 0.0003 Epoch: 18 Global Step: 314320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:36,550-Speed 9125.98 samples/sec Loss 3.4694 LearningRate 0.0003 Epoch: 18 Global Step: 314330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:37,665-Speed 9191.91 samples/sec Loss 3.4020 LearningRate 0.0003 Epoch: 18 Global Step: 314340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:38,786-Speed 9134.56 samples/sec Loss 3.4422 LearningRate 0.0003 Epoch: 18 Global Step: 
314350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:39,888-Speed 9302.62 samples/sec Loss 3.4223 LearningRate 0.0003 Epoch: 18 Global Step: 314360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:41,044-Speed 8860.14 samples/sec Loss 3.4286 LearningRate 0.0003 Epoch: 18 Global Step: 314370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:42,160-Speed 9183.31 samples/sec Loss 3.3631 LearningRate 0.0003 Epoch: 18 Global Step: 314380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:43,288-Speed 9081.63 samples/sec Loss 3.4222 LearningRate 0.0003 Epoch: 18 Global Step: 314390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:44,406-Speed 9168.53 samples/sec Loss 3.4368 LearningRate 0.0003 Epoch: 18 Global Step: 314400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:45,526-Speed 9144.61 samples/sec Loss 3.4614 LearningRate 0.0003 Epoch: 18 Global Step: 314410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:17:46,627-Speed 9307.86 samples/sec Loss 3.3893 LearningRate 0.0003 Epoch: 18 Global Step: 314420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:17:47,725-Speed 9332.57 samples/sec Loss 3.3860 LearningRate 0.0003 Epoch: 18 Global Step: 314430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:48,846-Speed 9140.76 samples/sec Loss 3.3935 LearningRate 0.0003 Epoch: 18 Global Step: 314440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:50,010-Speed 8799.58 samples/sec Loss 3.4519 LearningRate 0.0003 Epoch: 18 Global Step: 314450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:51,135-Speed 9107.23 samples/sec Loss 3.4079 LearningRate 0.0003 Epoch: 18 Global Step: 314460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:52,278-Speed 8961.39 samples/sec Loss 3.3517 LearningRate 0.0003 Epoch: 18 Global Step: 314470 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-12 00:17:53,387-Speed 9248.32 samples/sec Loss 3.4445 LearningRate 0.0003 Epoch: 18 Global Step: 314480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:54,498-Speed 9218.14 samples/sec Loss 3.4854 LearningRate 0.0003 Epoch: 18 Global Step: 314490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:55,659-Speed 8827.11 samples/sec Loss 3.3303 LearningRate 0.0003 Epoch: 18 Global Step: 314500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:56,813-Speed 8882.02 samples/sec Loss 3.4618 LearningRate 0.0003 Epoch: 18 Global Step: 314510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:57,931-Speed 9159.84 samples/sec Loss 3.3481 LearningRate 0.0003 Epoch: 18 Global Step: 314520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:17:59,019-Speed 9416.32 samples/sec Loss 3.3843 LearningRate 0.0003 Epoch: 18 Global Step: 314530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:00,116-Speed 9345.44 samples/sec Loss 3.3986 LearningRate 0.0003 Epoch: 18 Global Step: 314540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:01,253-Speed 9010.63 samples/sec Loss 3.4323 LearningRate 0.0003 Epoch: 18 Global Step: 314550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:02,391-Speed 9003.86 samples/sec Loss 3.4162 LearningRate 0.0003 Epoch: 18 Global Step: 314560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:03,507-Speed 9179.91 samples/sec Loss 3.3391 LearningRate 0.0003 Epoch: 18 Global Step: 314570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:04,618-Speed 9223.74 samples/sec Loss 3.4990 LearningRate 0.0003 Epoch: 18 Global Step: 314580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:05,731-Speed 9210.99 samples/sec Loss 3.4409 LearningRate 0.0003 Epoch: 18 Global Step: 314590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:18:06,803-Speed 9560.37 samples/sec Loss 3.4499 LearningRate 0.0003 Epoch: 18 Global Step: 314600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:07,927-Speed 9111.43 samples/sec Loss 3.2832 LearningRate 0.0003 Epoch: 18 Global Step: 314610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:09,022-Speed 9355.70 samples/sec Loss 3.3506 LearningRate 0.0003 Epoch: 18 Global Step: 314620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:10,113-Speed 9391.10 samples/sec Loss 3.4317 LearningRate 0.0003 Epoch: 18 Global Step: 314630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:11,199-Speed 9430.65 samples/sec Loss 3.4564 LearningRate 0.0003 Epoch: 18 Global Step: 314640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:12,297-Speed 9330.94 samples/sec Loss 3.3793 LearningRate 0.0003 Epoch: 18 Global Step: 314650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:13,422-Speed 9112.56 samples/sec Loss 3.3944 LearningRate 0.0003 Epoch: 18 Global Step: 314660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:14,530-Speed 9241.57 samples/sec Loss 3.4348 LearningRate 0.0003 Epoch: 18 Global Step: 314670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:15,632-Speed 9296.10 samples/sec Loss 3.4070 LearningRate 0.0003 Epoch: 18 Global Step: 314680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:16,728-Speed 9360.82 samples/sec Loss 3.3858 LearningRate 0.0003 Epoch: 18 Global Step: 314690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:18:17,844-Speed 9185.12 samples/sec Loss 3.3746 LearningRate 0.0003 Epoch: 18 Global Step: 314700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:18,955-Speed 9223.14 samples/sec Loss 3.4358 LearningRate 0.0003 Epoch: 18 Global Step: 314710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:20,065-Speed 9226.72 samples/sec Loss 
3.3899 LearningRate 0.0003 Epoch: 18 Global Step: 314720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:21,162-Speed 9341.87 samples/sec Loss 3.4226 LearningRate 0.0003 Epoch: 18 Global Step: 314730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:22,282-Speed 9148.08 samples/sec Loss 3.3957 LearningRate 0.0003 Epoch: 18 Global Step: 314740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:23,387-Speed 9281.24 samples/sec Loss 3.4321 LearningRate 0.0003 Epoch: 18 Global Step: 314750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:24,470-Speed 9466.87 samples/sec Loss 3.4321 LearningRate 0.0003 Epoch: 18 Global Step: 314760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:25,582-Speed 9212.52 samples/sec Loss 3.3605 LearningRate 0.0003 Epoch: 18 Global Step: 314770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:26,670-Speed 9413.39 samples/sec Loss 3.4349 LearningRate 0.0003 Epoch: 18 Global Step: 314780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:27,808-Speed 9005.27 samples/sec Loss 3.4292 LearningRate 0.0003 Epoch: 18 Global Step: 314790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:28,914-Speed 9262.97 samples/sec Loss 3.4353 LearningRate 0.0003 Epoch: 18 Global Step: 314800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:30,049-Speed 9023.94 samples/sec Loss 3.4010 LearningRate 0.0003 Epoch: 18 Global Step: 314810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:31,137-Speed 9415.97 samples/sec Loss 3.4539 LearningRate 0.0003 Epoch: 18 Global Step: 314820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:32,251-Speed 9204.31 samples/sec Loss 3.4138 LearningRate 0.0003 Epoch: 18 Global Step: 314830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:33,407-Speed 8860.17 samples/sec Loss 3.4599 LearningRate 0.0003 Epoch: 18 Global 
Step: 314840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:34,501-Speed 9365.78 samples/sec Loss 3.3512 LearningRate 0.0003 Epoch: 18 Global Step: 314850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:35,619-Speed 9170.14 samples/sec Loss 3.4225 LearningRate 0.0003 Epoch: 18 Global Step: 314860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:36,742-Speed 9120.66 samples/sec Loss 3.4343 LearningRate 0.0003 Epoch: 18 Global Step: 314870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:37,857-Speed 9185.96 samples/sec Loss 3.4600 LearningRate 0.0003 Epoch: 18 Global Step: 314880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:39,031-Speed 8729.40 samples/sec Loss 3.4211 LearningRate 0.0003 Epoch: 18 Global Step: 314890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:40,132-Speed 9308.13 samples/sec Loss 3.3641 LearningRate 0.0003 Epoch: 18 Global Step: 314900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:41,248-Speed 9194.41 samples/sec Loss 3.3875 LearningRate 0.0003 Epoch: 18 Global Step: 314910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:42,332-Speed 9448.75 samples/sec Loss 3.3544 LearningRate 0.0003 Epoch: 18 Global Step: 314920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:43,446-Speed 9199.99 samples/sec Loss 3.3683 LearningRate 0.0003 Epoch: 18 Global Step: 314930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:44,595-Speed 8914.93 samples/sec Loss 3.5218 LearningRate 0.0003 Epoch: 18 Global Step: 314940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:45,708-Speed 9203.97 samples/sec Loss 3.5052 LearningRate 0.0003 Epoch: 18 Global Step: 314950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:46,838-Speed 9074.76 samples/sec Loss 3.3032 LearningRate 0.0003 Epoch: 18 Global Step: 314960 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:18:47,979-Speed 8979.79 samples/sec Loss 3.4989 LearningRate 0.0003 Epoch: 18 Global Step: 314970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:49,098-Speed 9156.08 samples/sec Loss 3.4714 LearningRate 0.0003 Epoch: 18 Global Step: 314980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:50,227-Speed 9071.41 samples/sec Loss 3.3587 LearningRate 0.0003 Epoch: 18 Global Step: 314990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:51,364-Speed 9007.60 samples/sec Loss 3.4186 LearningRate 0.0003 Epoch: 18 Global Step: 315000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:18:52,485-Speed 9142.39 samples/sec Loss 3.4103 LearningRate 0.0003 Epoch: 18 Global Step: 315010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:53,615-Speed 9066.30 samples/sec Loss 3.4039 LearningRate 0.0003 Epoch: 18 Global Step: 315020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:54,766-Speed 8906.68 samples/sec Loss 3.3708 LearningRate 0.0003 Epoch: 18 Global Step: 315030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:55,913-Speed 8928.00 samples/sec Loss 3.4315 LearningRate 0.0003 Epoch: 18 Global Step: 315040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:57,031-Speed 9164.50 samples/sec Loss 3.4109 LearningRate 0.0003 Epoch: 18 Global Step: 315050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:58,166-Speed 9030.11 samples/sec Loss 3.3159 LearningRate 0.0003 Epoch: 18 Global Step: 315060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:18:59,279-Speed 9210.60 samples/sec Loss 3.4196 LearningRate 0.0003 Epoch: 18 Global Step: 315070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:00,465-Speed 8639.71 samples/sec Loss 3.3907 LearningRate 0.0003 Epoch: 18 Global Step: 315080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:19:01,545-Speed 9482.19 samples/sec Loss 3.3593 LearningRate 0.0003 Epoch: 18 Global Step: 315090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:02,705-Speed 8832.19 samples/sec Loss 3.4104 LearningRate 0.0003 Epoch: 18 Global Step: 315100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:03,818-Speed 9204.19 samples/sec Loss 3.4747 LearningRate 0.0003 Epoch: 18 Global Step: 315110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:04,927-Speed 9241.78 samples/sec Loss 3.4361 LearningRate 0.0003 Epoch: 18 Global Step: 315120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:06,053-Speed 9100.46 samples/sec Loss 3.3859 LearningRate 0.0003 Epoch: 18 Global Step: 315130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:07,200-Speed 8933.87 samples/sec Loss 3.3890 LearningRate 0.0003 Epoch: 18 Global Step: 315140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:08,358-Speed 8852.11 samples/sec Loss 3.4035 LearningRate 0.0003 Epoch: 18 Global Step: 315150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:09,539-Speed 8676.47 samples/sec Loss 3.3427 LearningRate 0.0003 Epoch: 18 Global Step: 315160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:10,681-Speed 8966.72 samples/sec Loss 3.4275 LearningRate 0.0003 Epoch: 18 Global Step: 315170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:11,841-Speed 8834.40 samples/sec Loss 3.4541 LearningRate 0.0003 Epoch: 18 Global Step: 315180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:12,969-Speed 9081.86 samples/sec Loss 3.4423 LearningRate 0.0003 Epoch: 18 Global Step: 315190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:14,110-Speed 8982.49 samples/sec Loss 3.4745 LearningRate 0.0003 Epoch: 18 Global Step: 315200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:15,284-Speed 8722.67 samples/sec Loss 
3.3840 LearningRate 0.0003 Epoch: 18 Global Step: 315210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:16,419-Speed 9031.82 samples/sec Loss 3.3840 LearningRate 0.0003 Epoch: 18 Global Step: 315220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:17,574-Speed 8873.43 samples/sec Loss 3.4259 LearningRate 0.0003 Epoch: 18 Global Step: 315230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:18,690-Speed 9177.22 samples/sec Loss 3.4397 LearningRate 0.0003 Epoch: 18 Global Step: 315240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:19,863-Speed 8741.16 samples/sec Loss 3.4750 LearningRate 0.0003 Epoch: 18 Global Step: 315250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:20,974-Speed 9216.84 samples/sec Loss 3.3739 LearningRate 0.0003 Epoch: 18 Global Step: 315260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:22,126-Speed 8896.84 samples/sec Loss 3.3663 LearningRate 0.0003 Epoch: 18 Global Step: 315270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:23,277-Speed 8896.93 samples/sec Loss 3.4001 LearningRate 0.0003 Epoch: 18 Global Step: 315280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:24,370-Speed 9376.11 samples/sec Loss 3.4724 LearningRate 0.0003 Epoch: 18 Global Step: 315290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:25,459-Speed 9410.85 samples/sec Loss 3.4251 LearningRate 0.0003 Epoch: 18 Global Step: 315300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:26,617-Speed 8850.73 samples/sec Loss 3.4300 LearningRate 0.0003 Epoch: 18 Global Step: 315310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:27,743-Speed 9094.78 samples/sec Loss 3.4249 LearningRate 0.0003 Epoch: 18 Global Step: 315320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:28,855-Speed 9218.25 samples/sec Loss 3.3185 LearningRate 0.0003 Epoch: 18 Global 
Step: 315330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:30,052-Speed 8555.75 samples/sec Loss 3.3468 LearningRate 0.0003 Epoch: 18 Global Step: 315340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:31,175-Speed 9124.49 samples/sec Loss 3.4783 LearningRate 0.0003 Epoch: 18 Global Step: 315350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:32,296-Speed 9137.00 samples/sec Loss 3.4040 LearningRate 0.0003 Epoch: 18 Global Step: 315360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:33,400-Speed 9278.84 samples/sec Loss 3.4393 LearningRate 0.0003 Epoch: 18 Global Step: 315370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:34,629-Speed 8342.88 samples/sec Loss 3.4058 LearningRate 0.0003 Epoch: 18 Global Step: 315380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:35,748-Speed 9155.62 samples/sec Loss 3.3074 LearningRate 0.0003 Epoch: 18 Global Step: 315390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:36,864-Speed 9187.26 samples/sec Loss 3.3446 LearningRate 0.0003 Epoch: 18 Global Step: 315400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:38,015-Speed 8897.00 samples/sec Loss 3.4829 LearningRate 0.0003 Epoch: 18 Global Step: 315410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:39,171-Speed 8867.00 samples/sec Loss 3.3293 LearningRate 0.0003 Epoch: 18 Global Step: 315420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:40,244-Speed 9547.06 samples/sec Loss 3.3828 LearningRate 0.0003 Epoch: 18 Global Step: 315430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:41,347-Speed 9287.49 samples/sec Loss 3.4108 LearningRate 0.0003 Epoch: 18 Global Step: 315440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:42,424-Speed 9512.19 samples/sec Loss 3.4790 LearningRate 0.0003 Epoch: 18 Global Step: 315450 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-12 00:19:43,561-Speed 9009.13 samples/sec Loss 3.4174 LearningRate 0.0003 Epoch: 18 Global Step: 315460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:44,700-Speed 8999.53 samples/sec Loss 3.4971 LearningRate 0.0003 Epoch: 18 Global Step: 315470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:19:45,803-Speed 9285.72 samples/sec Loss 3.3621 LearningRate 0.0003 Epoch: 18 Global Step: 315480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:46,924-Speed 9140.21 samples/sec Loss 3.3930 LearningRate 0.0003 Epoch: 18 Global Step: 315490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:48,074-Speed 8907.88 samples/sec Loss 3.3825 LearningRate 0.0003 Epoch: 18 Global Step: 315500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:49,202-Speed 9088.73 samples/sec Loss 3.3800 LearningRate 0.0003 Epoch: 18 Global Step: 315510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:50,319-Speed 9173.07 samples/sec Loss 3.4279 LearningRate 0.0003 Epoch: 18 Global Step: 315520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:51,424-Speed 9269.70 samples/sec Loss 3.4221 LearningRate 0.0003 Epoch: 18 Global Step: 315530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:52,559-Speed 9024.71 samples/sec Loss 3.4516 LearningRate 0.0003 Epoch: 18 Global Step: 315540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:53,681-Speed 9133.76 samples/sec Loss 3.3850 LearningRate 0.0003 Epoch: 18 Global Step: 315550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:54,802-Speed 9144.28 samples/sec Loss 3.4755 LearningRate 0.0003 Epoch: 18 Global Step: 315560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:55,870-Speed 9595.24 samples/sec Loss 3.4369 LearningRate 0.0003 Epoch: 18 Global Step: 315570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:19:57,000-Speed 9066.37 samples/sec Loss 3.3477 LearningRate 0.0003 Epoch: 18 Global Step: 315580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:19:58,119-Speed 9152.43 samples/sec Loss 3.3894 LearningRate 0.0003 Epoch: 18 Global Step: 315590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:19:59,302-Speed 8664.08 samples/sec Loss 3.3941 LearningRate 0.0003 Epoch: 18 Global Step: 315600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:00,402-Speed 9306.94 samples/sec Loss 3.3459 LearningRate 0.0003 Epoch: 18 Global Step: 315610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:01,516-Speed 9199.60 samples/sec Loss 3.4113 LearningRate 0.0003 Epoch: 18 Global Step: 315620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:02,657-Speed 8980.00 samples/sec Loss 3.4680 LearningRate 0.0003 Epoch: 18 Global Step: 315630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:03,819-Speed 8815.19 samples/sec Loss 3.4132 LearningRate 0.0003 Epoch: 18 Global Step: 315640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:04,924-Speed 9270.03 samples/sec Loss 3.4450 LearningRate 0.0003 Epoch: 18 Global Step: 315650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:06,018-Speed 9377.16 samples/sec Loss 3.4784 LearningRate 0.0003 Epoch: 18 Global Step: 315660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:07,128-Speed 9226.94 samples/sec Loss 3.4935 LearningRate 0.0003 Epoch: 18 Global Step: 315670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:08,259-Speed 9061.45 samples/sec Loss 3.3809 LearningRate 0.0003 Epoch: 18 Global Step: 315680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:09,372-Speed 9200.83 samples/sec Loss 3.3982 LearningRate 0.0003 Epoch: 18 Global Step: 315690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:10,548-Speed 8711.42 samples/sec 
Loss 3.4191 LearningRate 0.0003 Epoch: 18 Global Step: 315700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:11,693-Speed 8953.35 samples/sec Loss 3.3708 LearningRate 0.0003 Epoch: 18 Global Step: 315710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:12,808-Speed 9186.04 samples/sec Loss 3.3950 LearningRate 0.0003 Epoch: 18 Global Step: 315720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:13,938-Speed 9072.69 samples/sec Loss 3.3730 LearningRate 0.0003 Epoch: 18 Global Step: 315730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:15,043-Speed 9266.70 samples/sec Loss 3.4344 LearningRate 0.0003 Epoch: 18 Global Step: 315740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:16,167-Speed 9120.46 samples/sec Loss 3.3973 LearningRate 0.0003 Epoch: 18 Global Step: 315750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:17,331-Speed 8800.73 samples/sec Loss 3.4373 LearningRate 0.0003 Epoch: 18 Global Step: 315760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:18,433-Speed 9294.57 samples/sec Loss 3.4494 LearningRate 0.0003 Epoch: 18 Global Step: 315770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:19,540-Speed 9261.45 samples/sec Loss 3.3873 LearningRate 0.0003 Epoch: 18 Global Step: 315780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:20,655-Speed 9189.70 samples/sec Loss 3.4606 LearningRate 0.0003 Epoch: 18 Global Step: 315790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:21,810-Speed 8873.98 samples/sec Loss 3.4487 LearningRate 0.0003 Epoch: 18 Global Step: 315800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:22,950-Speed 8982.34 samples/sec Loss 3.4444 LearningRate 0.0003 Epoch: 18 Global Step: 315810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:24,065-Speed 9194.05 samples/sec Loss 3.4324 LearningRate 0.0003 Epoch: 18 
Global Step: 315820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:25,163-Speed 9424.41 samples/sec Loss 3.3874 LearningRate 0.0003 Epoch: 18 Global Step: 315830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:26,279-Speed 9181.62 samples/sec Loss 3.4633 LearningRate 0.0003 Epoch: 18 Global Step: 315840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:27,359-Speed 9486.89 samples/sec Loss 3.3306 LearningRate 0.0003 Epoch: 18 Global Step: 315850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:28,450-Speed 9384.49 samples/sec Loss 3.3840 LearningRate 0.0003 Epoch: 18 Global Step: 315860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:29,624-Speed 8727.93 samples/sec Loss 3.3244 LearningRate 0.0003 Epoch: 18 Global Step: 315870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:30,772-Speed 8926.32 samples/sec Loss 3.3490 LearningRate 0.0003 Epoch: 18 Global Step: 315880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:20:31,909-Speed 9015.55 samples/sec Loss 3.4357 LearningRate 0.0003 Epoch: 18 Global Step: 315890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:33,052-Speed 8965.18 samples/sec Loss 3.3989 LearningRate 0.0003 Epoch: 18 Global Step: 315900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:34,145-Speed 9371.98 samples/sec Loss 3.3920 LearningRate 0.0003 Epoch: 18 Global Step: 315910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:35,254-Speed 9239.28 samples/sec Loss 3.3825 LearningRate 0.0003 Epoch: 18 Global Step: 315920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:36,371-Speed 9173.06 samples/sec Loss 3.3556 LearningRate 0.0003 Epoch: 18 Global Step: 315930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:37,494-Speed 9124.67 samples/sec Loss 3.4435 LearningRate 0.0003 Epoch: 18 Global Step: 315940 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:20:38,643-Speed 8916.60 samples/sec Loss 3.4309 LearningRate 0.0003 Epoch: 18 Global Step: 315950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:39,774-Speed 9060.65 samples/sec Loss 3.4561 LearningRate 0.0003 Epoch: 18 Global Step: 315960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:40,869-Speed 9352.45 samples/sec Loss 3.4152 LearningRate 0.0003 Epoch: 18 Global Step: 315970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:41,955-Speed 9441.69 samples/sec Loss 3.4492 LearningRate 0.0003 Epoch: 18 Global Step: 315980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:43,061-Speed 9256.42 samples/sec Loss 3.3403 LearningRate 0.0003 Epoch: 18 Global Step: 315990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:20:44,211-Speed 8916.44 samples/sec Loss 3.4354 LearningRate 0.0003 Epoch: 18 Global Step: 316000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:21:06,002-[lfw][316000]XNorm: 6.569137 Training: 2022-04-12 00:21:06,003-[lfw][316000]Accuracy-Flip: 0.99650+-0.00283 Training: 2022-04-12 00:21:06,003-[lfw][316000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:21:31,225-[cfp_fp][316000]XNorm: 5.736773 Training: 2022-04-12 00:21:31,226-[cfp_fp][316000]Accuracy-Flip: 0.97300+-0.00898 Training: 2022-04-12 00:21:31,226-[cfp_fp][316000]Accuracy-Highest: 0.97514 Training: 2022-04-12 00:21:52,961-[agedb_30][316000]XNorm: 6.391359 Training: 2022-04-12 00:21:52,961-[agedb_30][316000]Accuracy-Flip: 0.97333+-0.00756 Training: 2022-04-12 00:21:52,962-[agedb_30][316000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:21:54,088-Speed 146.54 samples/sec Loss 3.4275 LearningRate 0.0003 Epoch: 18 Global Step: 316010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:21:55,239-Speed 8903.48 samples/sec Loss 3.3612 LearningRate 0.0003 Epoch: 18 Global Step: 316020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-12 00:21:56,365-Speed 9101.75 samples/sec Loss 3.3582 LearningRate 0.0003 Epoch: 18 Global Step: 316030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:21:57,470-Speed 9269.61 samples/sec Loss 3.3458 LearningRate 0.0003 Epoch: 18 Global Step: 316040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:21:58,551-Speed 9477.77 samples/sec Loss 3.3445 LearningRate 0.0003 Epoch: 18 Global Step: 316050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:21:59,611-Speed 9668.78 samples/sec Loss 3.5072 LearningRate 0.0003 Epoch: 18 Global Step: 316060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:00,721-Speed 9230.71 samples/sec Loss 3.4644 LearningRate 0.0003 Epoch: 18 Global Step: 316070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:01,845-Speed 9119.17 samples/sec Loss 3.3726 LearningRate 0.0003 Epoch: 18 Global Step: 316080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:02,924-Speed 9493.98 samples/sec Loss 3.4156 LearningRate 0.0003 Epoch: 18 Global Step: 316090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:22:04,037-Speed 9204.45 samples/sec Loss 3.4640 LearningRate 0.0003 Epoch: 18 Global Step: 316100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:05,160-Speed 9126.58 samples/sec Loss 3.4257 LearningRate 0.0003 Epoch: 18 Global Step: 316110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:06,305-Speed 8945.71 samples/sec Loss 3.3787 LearningRate 0.0003 Epoch: 18 Global Step: 316120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:07,481-Speed 8714.66 samples/sec Loss 3.3499 LearningRate 0.0003 Epoch: 18 Global Step: 316130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:08,663-Speed 8664.86 samples/sec Loss 3.4128 LearningRate 0.0003 Epoch: 18 Global Step: 316140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:09,827-Speed 8798.81 
samples/sec Loss 3.3769 LearningRate 0.0003 Epoch: 18 Global Step: 316150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:10,951-Speed 9118.06 samples/sec Loss 3.4589 LearningRate 0.0003 Epoch: 18 Global Step: 316160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:12,044-Speed 9375.78 samples/sec Loss 3.4166 LearningRate 0.0003 Epoch: 18 Global Step: 316170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:13,133-Speed 9409.13 samples/sec Loss 3.5365 LearningRate 0.0003 Epoch: 18 Global Step: 316180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:14,207-Speed 9539.15 samples/sec Loss 3.3975 LearningRate 0.0003 Epoch: 18 Global Step: 316190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:15,302-Speed 9358.90 samples/sec Loss 3.3939 LearningRate 0.0003 Epoch: 18 Global Step: 316200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:22:16,468-Speed 8786.43 samples/sec Loss 3.4249 LearningRate 0.0003 Epoch: 18 Global Step: 316210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:22:17,567-Speed 9321.49 samples/sec Loss 3.3872 LearningRate 0.0003 Epoch: 18 Global Step: 316220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:18,759-Speed 8600.39 samples/sec Loss 3.2867 LearningRate 0.0003 Epoch: 18 Global Step: 316230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:19,932-Speed 8731.85 samples/sec Loss 3.3810 LearningRate 0.0003 Epoch: 18 Global Step: 316240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:21,007-Speed 9537.55 samples/sec Loss 3.3819 LearningRate 0.0003 Epoch: 18 Global Step: 316250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:22,164-Speed 8854.57 samples/sec Loss 3.3776 LearningRate 0.0003 Epoch: 18 Global Step: 316260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:23,282-Speed 9159.29 samples/sec Loss 3.3888 LearningRate 
0.0003 Epoch: 18 Global Step: 316270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:24,423-Speed 8985.30 samples/sec Loss 3.4223 LearningRate 0.0003 Epoch: 18 Global Step: 316280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:25,563-Speed 8982.15 samples/sec Loss 3.4182 LearningRate 0.0003 Epoch: 18 Global Step: 316290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:26,740-Speed 8704.54 samples/sec Loss 3.3031 LearningRate 0.0003 Epoch: 18 Global Step: 316300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:27,907-Speed 8780.76 samples/sec Loss 3.4302 LearningRate 0.0003 Epoch: 18 Global Step: 316310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:28,997-Speed 9398.91 samples/sec Loss 3.3285 LearningRate 0.0003 Epoch: 18 Global Step: 316320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:30,139-Speed 8969.99 samples/sec Loss 3.4373 LearningRate 0.0003 Epoch: 18 Global Step: 316330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:31,290-Speed 8905.93 samples/sec Loss 3.3158 LearningRate 0.0003 Epoch: 18 Global Step: 316340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:32,438-Speed 8919.86 samples/sec Loss 3.4013 LearningRate 0.0003 Epoch: 18 Global Step: 316350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:33,624-Speed 8643.08 samples/sec Loss 3.4145 LearningRate 0.0003 Epoch: 18 Global Step: 316360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:34,760-Speed 9015.90 samples/sec Loss 3.3525 LearningRate 0.0003 Epoch: 18 Global Step: 316370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:35,900-Speed 8991.35 samples/sec Loss 3.3894 LearningRate 0.0003 Epoch: 18 Global Step: 316380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:37,013-Speed 9213.61 samples/sec Loss 3.4016 LearningRate 0.0003 Epoch: 18 Global Step: 316390 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:38,091-Speed 9498.49 samples/sec Loss 3.4598 LearningRate 0.0003 Epoch: 18 Global Step: 316400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:39,226-Speed 9027.47 samples/sec Loss 3.4000 LearningRate 0.0003 Epoch: 18 Global Step: 316410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:40,361-Speed 9029.97 samples/sec Loss 3.4159 LearningRate 0.0003 Epoch: 18 Global Step: 316420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:41,526-Speed 8796.56 samples/sec Loss 3.3774 LearningRate 0.0003 Epoch: 18 Global Step: 316430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:42,655-Speed 9068.62 samples/sec Loss 3.4730 LearningRate 0.0003 Epoch: 18 Global Step: 316440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:43,774-Speed 9161.71 samples/sec Loss 3.4214 LearningRate 0.0003 Epoch: 18 Global Step: 316450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:44,943-Speed 8764.01 samples/sec Loss 3.3856 LearningRate 0.0003 Epoch: 18 Global Step: 316460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:46,072-Speed 9070.01 samples/sec Loss 3.4339 LearningRate 0.0003 Epoch: 18 Global Step: 316470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:47,228-Speed 8864.70 samples/sec Loss 3.4650 LearningRate 0.0003 Epoch: 18 Global Step: 316480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:48,349-Speed 9143.41 samples/sec Loss 3.4631 LearningRate 0.0003 Epoch: 18 Global Step: 316490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:49,449-Speed 9310.93 samples/sec Loss 3.4303 LearningRate 0.0003 Epoch: 18 Global Step: 316500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:50,548-Speed 9322.05 samples/sec Loss 3.4426 LearningRate 0.0003 Epoch: 18 Global Step: 316510 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-12 00:22:51,720-Speed 8744.86 samples/sec Loss 3.3744 LearningRate 0.0003 Epoch: 18 Global Step: 316520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:22:52,865-Speed 8948.26 samples/sec Loss 3.3992 LearningRate 0.0003 Epoch: 18 Global Step: 316530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:54,001-Speed 9019.84 samples/sec Loss 3.3806 LearningRate 0.0003 Epoch: 18 Global Step: 316540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:55,112-Speed 9225.68 samples/sec Loss 3.4421 LearningRate 0.0003 Epoch: 18 Global Step: 316550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:56,241-Speed 9072.11 samples/sec Loss 3.3433 LearningRate 0.0003 Epoch: 18 Global Step: 316560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:57,333-Speed 9382.93 samples/sec Loss 3.4300 LearningRate 0.0003 Epoch: 18 Global Step: 316570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:58,435-Speed 9300.43 samples/sec Loss 3.3842 LearningRate 0.0003 Epoch: 18 Global Step: 316580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:22:59,538-Speed 9286.41 samples/sec Loss 3.4616 LearningRate 0.0003 Epoch: 18 Global Step: 316590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:00,660-Speed 9134.33 samples/sec Loss 3.3543 LearningRate 0.0003 Epoch: 18 Global Step: 316600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:01,804-Speed 8954.39 samples/sec Loss 3.4538 LearningRate 0.0003 Epoch: 18 Global Step: 316610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:02,925-Speed 9137.26 samples/sec Loss 3.3887 LearningRate 0.0003 Epoch: 18 Global Step: 316620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:04,082-Speed 8855.14 samples/sec Loss 3.4913 LearningRate 0.0003 Epoch: 18 Global Step: 316630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:05,215-Speed 
9047.07 samples/sec Loss 3.4382 LearningRate 0.0003 Epoch: 18 Global Step: 316640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:06,368-Speed 8887.72 samples/sec Loss 3.4909 LearningRate 0.0003 Epoch: 18 Global Step: 316650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:07,529-Speed 8826.60 samples/sec Loss 3.3254 LearningRate 0.0003 Epoch: 18 Global Step: 316660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:08,657-Speed 9083.12 samples/sec Loss 3.3575 LearningRate 0.0003 Epoch: 18 Global Step: 316670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:09,824-Speed 8781.62 samples/sec Loss 3.3907 LearningRate 0.0003 Epoch: 18 Global Step: 316680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:10,939-Speed 9184.94 samples/sec Loss 3.3812 LearningRate 0.0003 Epoch: 18 Global Step: 316690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:12,085-Speed 8942.05 samples/sec Loss 3.4375 LearningRate 0.0003 Epoch: 18 Global Step: 316700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:13,243-Speed 8848.24 samples/sec Loss 3.3453 LearningRate 0.0003 Epoch: 18 Global Step: 316710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:14,383-Speed 8995.86 samples/sec Loss 3.3563 LearningRate 0.0003 Epoch: 18 Global Step: 316720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:15,508-Speed 9103.04 samples/sec Loss 3.4274 LearningRate 0.0003 Epoch: 18 Global Step: 316730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:23:16,617-Speed 9236.69 samples/sec Loss 3.4102 LearningRate 0.0003 Epoch: 18 Global Step: 316740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:17,728-Speed 9231.88 samples/sec Loss 3.3855 LearningRate 0.0003 Epoch: 18 Global Step: 316750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:18,873-Speed 8944.17 samples/sec Loss 3.3999 
LearningRate 0.0003 Epoch: 18 Global Step: 316760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:19,974-Speed 9311.54 samples/sec Loss 3.3935 LearningRate 0.0003 Epoch: 18 Global Step: 316770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:21,097-Speed 9122.76 samples/sec Loss 3.3968 LearningRate 0.0003 Epoch: 18 Global Step: 316780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:22,258-Speed 8825.11 samples/sec Loss 3.3198 LearningRate 0.0003 Epoch: 18 Global Step: 316790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:23,415-Speed 8855.52 samples/sec Loss 3.4892 LearningRate 0.0003 Epoch: 18 Global Step: 316800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:24,550-Speed 9024.76 samples/sec Loss 3.3436 LearningRate 0.0003 Epoch: 18 Global Step: 316810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:25,719-Speed 8767.16 samples/sec Loss 3.4188 LearningRate 0.0003 Epoch: 18 Global Step: 316820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:26,816-Speed 9342.32 samples/sec Loss 3.3812 LearningRate 0.0003 Epoch: 18 Global Step: 316830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:27,927-Speed 9220.75 samples/sec Loss 3.3727 LearningRate 0.0003 Epoch: 18 Global Step: 316840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:29,030-Speed 9293.27 samples/sec Loss 3.3707 LearningRate 0.0003 Epoch: 18 Global Step: 316850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:30,194-Speed 8801.88 samples/sec Loss 3.4525 LearningRate 0.0003 Epoch: 18 Global Step: 316860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:31,325-Speed 9058.63 samples/sec Loss 3.3791 LearningRate 0.0003 Epoch: 18 Global Step: 316870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:32,418-Speed 9375.30 samples/sec Loss 3.4597 LearningRate 0.0003 Epoch: 18 Global Step: 
316880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:33,553-Speed 9029.00 samples/sec Loss 3.4692 LearningRate 0.0003 Epoch: 18 Global Step: 316890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:34,738-Speed 8648.15 samples/sec Loss 3.4206 LearningRate 0.0003 Epoch: 18 Global Step: 316900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:35,900-Speed 8814.04 samples/sec Loss 3.5052 LearningRate 0.0003 Epoch: 18 Global Step: 316910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:37,034-Speed 9035.74 samples/sec Loss 3.3527 LearningRate 0.0003 Epoch: 18 Global Step: 316920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:38,162-Speed 9086.75 samples/sec Loss 3.4412 LearningRate 0.0003 Epoch: 18 Global Step: 316930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:39,288-Speed 9095.38 samples/sec Loss 3.3971 LearningRate 0.0003 Epoch: 18 Global Step: 316940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:40,443-Speed 8872.91 samples/sec Loss 3.3806 LearningRate 0.0003 Epoch: 18 Global Step: 316950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:41,596-Speed 8892.46 samples/sec Loss 3.4796 LearningRate 0.0003 Epoch: 18 Global Step: 316960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:42,721-Speed 9102.26 samples/sec Loss 3.2824 LearningRate 0.0003 Epoch: 18 Global Step: 316970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:43,850-Speed 9080.14 samples/sec Loss 3.3668 LearningRate 0.0003 Epoch: 18 Global Step: 316980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:44,935-Speed 9438.14 samples/sec Loss 3.3686 LearningRate 0.0003 Epoch: 18 Global Step: 316990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:46,033-Speed 9328.93 samples/sec Loss 3.4462 LearningRate 0.0003 Epoch: 18 Global Step: 317000 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-12 00:23:47,158-Speed 9112.86 samples/sec Loss 3.4151 LearningRate 0.0003 Epoch: 18 Global Step: 317010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:48,245-Speed 9424.42 samples/sec Loss 3.4155 LearningRate 0.0003 Epoch: 18 Global Step: 317020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:49,360-Speed 9182.76 samples/sec Loss 3.3248 LearningRate 0.0003 Epoch: 18 Global Step: 317030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:50,506-Speed 8944.03 samples/sec Loss 3.4052 LearningRate 0.0003 Epoch: 18 Global Step: 317040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:23:51,647-Speed 8986.51 samples/sec Loss 3.5034 LearningRate 0.0003 Epoch: 18 Global Step: 317050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:52,745-Speed 9332.10 samples/sec Loss 3.4058 LearningRate 0.0003 Epoch: 18 Global Step: 317060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:53,881-Speed 9018.48 samples/sec Loss 3.4223 LearningRate 0.0003 Epoch: 18 Global Step: 317070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:54,967-Speed 9435.90 samples/sec Loss 3.4754 LearningRate 0.0003 Epoch: 18 Global Step: 317080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:56,086-Speed 9155.16 samples/sec Loss 3.4039 LearningRate 0.0003 Epoch: 18 Global Step: 317090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:57,247-Speed 8822.76 samples/sec Loss 3.3941 LearningRate 0.0003 Epoch: 18 Global Step: 317100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:58,370-Speed 9122.56 samples/sec Loss 3.4047 LearningRate 0.0003 Epoch: 18 Global Step: 317110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:23:59,719-Speed 7596.82 samples/sec Loss 3.3648 LearningRate 0.0003 Epoch: 18 Global Step: 317120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:24:38,342-Speed 265.14 samples/sec Loss 3.3116 LearningRate 0.0002 Epoch: 19 Global Step: 317130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:40,360-Speed 5076.44 samples/sec Loss 3.3183 LearningRate 0.0002 Epoch: 19 Global Step: 317140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:41,735-Speed 7452.80 samples/sec Loss 3.3292 LearningRate 0.0002 Epoch: 19 Global Step: 317150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:43,120-Speed 7396.13 samples/sec Loss 3.3251 LearningRate 0.0002 Epoch: 19 Global Step: 317160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:44,439-Speed 7765.06 samples/sec Loss 3.2841 LearningRate 0.0002 Epoch: 19 Global Step: 317170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:45,555-Speed 9189.87 samples/sec Loss 3.3143 LearningRate 0.0002 Epoch: 19 Global Step: 317180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:47,042-Speed 6888.03 samples/sec Loss 3.2816 LearningRate 0.0002 Epoch: 19 Global Step: 317190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:48,148-Speed 9261.63 samples/sec Loss 3.2591 LearningRate 0.0002 Epoch: 19 Global Step: 317200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:49,316-Speed 8771.88 samples/sec Loss 3.2764 LearningRate 0.0002 Epoch: 19 Global Step: 317210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:50,403-Speed 9426.13 samples/sec Loss 3.3166 LearningRate 0.0002 Epoch: 19 Global Step: 317220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:24:51,504-Speed 9309.41 samples/sec Loss 3.2450 LearningRate 0.0002 Epoch: 19 Global Step: 317230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:24:52,829-Speed 7729.82 samples/sec Loss 3.2508 LearningRate 0.0002 Epoch: 19 Global Step: 317240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:24:53,987-Speed 8845.26 samples/sec Loss 
3.3051 LearningRate 0.0002 Epoch: 19 Global Step: 317250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:24:55,152-Speed 8795.50 samples/sec Loss 3.2975 LearningRate 0.0002 Epoch: 19 Global Step: 317260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:24:56,843-Speed 6061.27 samples/sec Loss 3.2642 LearningRate 0.0002 Epoch: 19 Global Step: 317270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:24:57,917-Speed 9541.40 samples/sec Loss 3.3307 LearningRate 0.0002 Epoch: 19 Global Step: 317280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:24:59,201-Speed 7979.40 samples/sec Loss 3.3193 LearningRate 0.0002 Epoch: 19 Global Step: 317290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:00,481-Speed 8004.16 samples/sec Loss 3.3147 LearningRate 0.0002 Epoch: 19 Global Step: 317300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:01,595-Speed 9196.74 samples/sec Loss 3.2980 LearningRate 0.0002 Epoch: 19 Global Step: 317310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:02,894-Speed 7884.48 samples/sec Loss 3.2303 LearningRate 0.0002 Epoch: 19 Global Step: 317320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:04,056-Speed 8816.05 samples/sec Loss 3.3658 LearningRate 0.0002 Epoch: 19 Global Step: 317330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:05,166-Speed 9235.37 samples/sec Loss 3.2934 LearningRate 0.0002 Epoch: 19 Global Step: 317340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:06,258-Speed 9377.82 samples/sec Loss 3.2920 LearningRate 0.0002 Epoch: 19 Global Step: 317350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:07,387-Speed 9077.27 samples/sec Loss 3.2542 LearningRate 0.0002 Epoch: 19 Global Step: 317360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:08,535-Speed 8928.13 samples/sec Loss 3.2752 LearningRate 0.0002 Epoch: 19 Global 
Step: 317370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:09,670-Speed 9019.59 samples/sec Loss 3.2056 LearningRate 0.0002 Epoch: 19 Global Step: 317380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:10,794-Speed 9120.87 samples/sec Loss 3.2339 LearningRate 0.0002 Epoch: 19 Global Step: 317390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:11,906-Speed 9208.76 samples/sec Loss 3.3333 LearningRate 0.0002 Epoch: 19 Global Step: 317400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:13,008-Speed 9304.91 samples/sec Loss 3.2998 LearningRate 0.0002 Epoch: 19 Global Step: 317410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:14,106-Speed 9327.96 samples/sec Loss 3.3003 LearningRate 0.0002 Epoch: 19 Global Step: 317420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:15,347-Speed 8256.94 samples/sec Loss 3.2246 LearningRate 0.0002 Epoch: 19 Global Step: 317430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:16,522-Speed 8719.76 samples/sec Loss 3.3192 LearningRate 0.0002 Epoch: 19 Global Step: 317440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:17,631-Speed 9238.87 samples/sec Loss 3.3138 LearningRate 0.0002 Epoch: 19 Global Step: 317450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:18,721-Speed 9402.06 samples/sec Loss 3.4065 LearningRate 0.0002 Epoch: 19 Global Step: 317460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:19,841-Speed 9149.20 samples/sec Loss 3.2989 LearningRate 0.0002 Epoch: 19 Global Step: 317470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:20,955-Speed 9201.33 samples/sec Loss 3.2828 LearningRate 0.0002 Epoch: 19 Global Step: 317480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:22,073-Speed 9166.70 samples/sec Loss 3.1985 LearningRate 0.0002 Epoch: 19 Global Step: 317490 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-12 00:25:23,196-Speed 9118.92 samples/sec Loss 3.2544 LearningRate 0.0002 Epoch: 19 Global Step: 317500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:24,317-Speed 9137.64 samples/sec Loss 3.3179 LearningRate 0.0002 Epoch: 19 Global Step: 317510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:25,453-Speed 9018.50 samples/sec Loss 3.3023 LearningRate 0.0002 Epoch: 19 Global Step: 317520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:26,584-Speed 9061.89 samples/sec Loss 3.2663 LearningRate 0.0002 Epoch: 19 Global Step: 317530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:27,696-Speed 9213.40 samples/sec Loss 3.2584 LearningRate 0.0002 Epoch: 19 Global Step: 317540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:28,831-Speed 9023.51 samples/sec Loss 3.3168 LearningRate 0.0002 Epoch: 19 Global Step: 317550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:29,948-Speed 9174.72 samples/sec Loss 3.2774 LearningRate 0.0002 Epoch: 19 Global Step: 317560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:31,035-Speed 9424.59 samples/sec Loss 3.3818 LearningRate 0.0002 Epoch: 19 Global Step: 317570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:32,139-Speed 9285.49 samples/sec Loss 3.4123 LearningRate 0.0002 Epoch: 19 Global Step: 317580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:33,275-Speed 9022.94 samples/sec Loss 3.2742 LearningRate 0.0002 Epoch: 19 Global Step: 317590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:34,355-Speed 9490.12 samples/sec Loss 3.2698 LearningRate 0.0002 Epoch: 19 Global Step: 317600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:35,459-Speed 9279.77 samples/sec Loss 3.3480 LearningRate 0.0002 Epoch: 19 Global Step: 317610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:25:36,553-Speed 9360.39 samples/sec Loss 3.2822 LearningRate 0.0002 Epoch: 19 Global Step: 317620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:37,652-Speed 9328.53 samples/sec Loss 3.2728 LearningRate 0.0002 Epoch: 19 Global Step: 317630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:38,787-Speed 9024.42 samples/sec Loss 3.2974 LearningRate 0.0002 Epoch: 19 Global Step: 317640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:39,901-Speed 9197.66 samples/sec Loss 3.3121 LearningRate 0.0002 Epoch: 19 Global Step: 317650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:40,976-Speed 9528.83 samples/sec Loss 3.2820 LearningRate 0.0002 Epoch: 19 Global Step: 317660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:42,070-Speed 9365.35 samples/sec Loss 3.2415 LearningRate 0.0002 Epoch: 19 Global Step: 317670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:43,184-Speed 9206.43 samples/sec Loss 3.2732 LearningRate 0.0002 Epoch: 19 Global Step: 317680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:44,274-Speed 9397.10 samples/sec Loss 3.3185 LearningRate 0.0002 Epoch: 19 Global Step: 317690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:45,358-Speed 9451.23 samples/sec Loss 3.2315 LearningRate 0.0002 Epoch: 19 Global Step: 317700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:46,456-Speed 9334.29 samples/sec Loss 3.2255 LearningRate 0.0002 Epoch: 19 Global Step: 317710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:25:47,588-Speed 9046.01 samples/sec Loss 3.2334 LearningRate 0.0002 Epoch: 19 Global Step: 317720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:48,684-Speed 9350.27 samples/sec Loss 3.3128 LearningRate 0.0002 Epoch: 19 Global Step: 317730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:49,753-Speed 9590.20 samples/sec Loss 
3.3549 LearningRate 0.0002 Epoch: 19 Global Step: 317740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:50,845-Speed 9377.62 samples/sec Loss 3.2477 LearningRate 0.0002 Epoch: 19 Global Step: 317750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:51,953-Speed 9253.82 samples/sec Loss 3.2723 LearningRate 0.0002 Epoch: 19 Global Step: 317760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:53,116-Speed 8812.59 samples/sec Loss 3.3954 LearningRate 0.0002 Epoch: 19 Global Step: 317770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:54,269-Speed 8879.39 samples/sec Loss 3.3220 LearningRate 0.0002 Epoch: 19 Global Step: 317780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:55,380-Speed 9225.33 samples/sec Loss 3.2928 LearningRate 0.0002 Epoch: 19 Global Step: 317790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:56,508-Speed 9082.30 samples/sec Loss 3.2633 LearningRate 0.0002 Epoch: 19 Global Step: 317800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:57,585-Speed 9513.09 samples/sec Loss 3.2712 LearningRate 0.0002 Epoch: 19 Global Step: 317810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:58,704-Speed 9154.46 samples/sec Loss 3.3367 LearningRate 0.0002 Epoch: 19 Global Step: 317820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:25:59,827-Speed 9127.10 samples/sec Loss 3.3369 LearningRate 0.0002 Epoch: 19 Global Step: 317830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:00,929-Speed 9294.90 samples/sec Loss 3.3017 LearningRate 0.0002 Epoch: 19 Global Step: 317840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:02,039-Speed 9235.10 samples/sec Loss 3.3225 LearningRate 0.0002 Epoch: 19 Global Step: 317850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:03,232-Speed 8590.06 samples/sec Loss 3.3843 LearningRate 0.0002 Epoch: 19 Global 
Step: 317860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:04,339-Speed 9253.44 samples/sec Loss 3.3375 LearningRate 0.0002 Epoch: 19 Global Step: 317870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:05,440-Speed 9308.49 samples/sec Loss 3.3192 LearningRate 0.0002 Epoch: 19 Global Step: 317880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:06,539-Speed 9323.99 samples/sec Loss 3.2817 LearningRate 0.0002 Epoch: 19 Global Step: 317890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:07,644-Speed 9268.21 samples/sec Loss 3.2649 LearningRate 0.0002 Epoch: 19 Global Step: 317900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:08,776-Speed 9054.91 samples/sec Loss 3.3126 LearningRate 0.0002 Epoch: 19 Global Step: 317910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:09,900-Speed 9117.15 samples/sec Loss 3.3104 LearningRate 0.0002 Epoch: 19 Global Step: 317920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:26:10,995-Speed 9357.64 samples/sec Loss 3.2639 LearningRate 0.0002 Epoch: 19 Global Step: 317930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:12,078-Speed 9456.58 samples/sec Loss 3.2083 LearningRate 0.0002 Epoch: 19 Global Step: 317940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:13,234-Speed 8868.03 samples/sec Loss 3.3240 LearningRate 0.0002 Epoch: 19 Global Step: 317950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:14,394-Speed 8832.23 samples/sec Loss 3.2824 LearningRate 0.0002 Epoch: 19 Global Step: 317960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:15,508-Speed 9196.46 samples/sec Loss 3.3118 LearningRate 0.0002 Epoch: 19 Global Step: 317970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:16,591-Speed 9458.50 samples/sec Loss 3.2740 LearningRate 0.0002 Epoch: 19 Global Step: 317980 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:26:17,757-Speed 8788.17 samples/sec Loss 3.2982 LearningRate 0.0002 Epoch: 19 Global Step: 317990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:18,839-Speed 9468.87 samples/sec Loss 3.2792 LearningRate 0.0002 Epoch: 19 Global Step: 318000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:26:40,593-[lfw][318000]XNorm: 6.554832 Training: 2022-04-12 00:26:40,593-[lfw][318000]Accuracy-Flip: 0.99633+-0.00287 Training: 2022-04-12 00:26:40,593-[lfw][318000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:27:05,756-[cfp_fp][318000]XNorm: 5.732424 Training: 2022-04-12 00:27:05,756-[cfp_fp][318000]Accuracy-Flip: 0.97443+-0.00683 Training: 2022-04-12 00:27:05,757-[cfp_fp][318000]Accuracy-Highest: 0.97514 Training: 2022-04-12 00:27:27,468-[agedb_30][318000]XNorm: 6.385354 Training: 2022-04-12 00:27:27,469-[agedb_30][318000]Accuracy-Flip: 0.97333+-0.00782 Training: 2022-04-12 00:27:27,469-[agedb_30][318000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:27:28,579-Speed 146.83 samples/sec Loss 3.2781 LearningRate 0.0002 Epoch: 19 Global Step: 318010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:29,722-Speed 8966.16 samples/sec Loss 3.3494 LearningRate 0.0002 Epoch: 19 Global Step: 318020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:30,782-Speed 9658.39 samples/sec Loss 3.3319 LearningRate 0.0002 Epoch: 19 Global Step: 318030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:31,859-Speed 9517.94 samples/sec Loss 3.3351 LearningRate 0.0002 Epoch: 19 Global Step: 318040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:33,000-Speed 8977.11 samples/sec Loss 3.3226 LearningRate 0.0002 Epoch: 19 Global Step: 318050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:34,115-Speed 9189.64 samples/sec Loss 3.2921 LearningRate 0.0002 Epoch: 19 Global Step: 318060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-12 00:27:35,197-Speed 9478.11 samples/sec Loss 3.3069 LearningRate 0.0002 Epoch: 19 Global Step: 318070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:36,351-Speed 8875.80 samples/sec Loss 3.2918 LearningRate 0.0002 Epoch: 19 Global Step: 318080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:37,466-Speed 9188.77 samples/sec Loss 3.3108 LearningRate 0.0002 Epoch: 19 Global Step: 318090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:38,593-Speed 9088.75 samples/sec Loss 3.2813 LearningRate 0.0002 Epoch: 19 Global Step: 318100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:39,676-Speed 9466.51 samples/sec Loss 3.2251 LearningRate 0.0002 Epoch: 19 Global Step: 318110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:40,723-Speed 9785.79 samples/sec Loss 3.2810 LearningRate 0.0002 Epoch: 19 Global Step: 318120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:41,812-Speed 9405.97 samples/sec Loss 3.2635 LearningRate 0.0002 Epoch: 19 Global Step: 318130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:42,906-Speed 9366.12 samples/sec Loss 3.2751 LearningRate 0.0002 Epoch: 19 Global Step: 318140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:44,020-Speed 9199.26 samples/sec Loss 3.2666 LearningRate 0.0002 Epoch: 19 Global Step: 318150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:45,109-Speed 9405.76 samples/sec Loss 3.3012 LearningRate 0.0002 Epoch: 19 Global Step: 318160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:46,214-Speed 9270.84 samples/sec Loss 3.2153 LearningRate 0.0002 Epoch: 19 Global Step: 318170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:47,341-Speed 9092.28 samples/sec Loss 3.3136 LearningRate 0.0002 Epoch: 19 Global Step: 318180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:48,475-Speed 9038.21 
samples/sec Loss 3.2216 LearningRate 0.0002 Epoch: 19 Global Step: 318190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:27:49,590-Speed 9186.13 samples/sec Loss 3.3190 LearningRate 0.0002 Epoch: 19 Global Step: 318200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:50,706-Speed 9180.71 samples/sec Loss 3.3507 LearningRate 0.0002 Epoch: 19 Global Step: 318210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:51,850-Speed 8958.87 samples/sec Loss 3.3027 LearningRate 0.0002 Epoch: 19 Global Step: 318220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:53,015-Speed 8795.95 samples/sec Loss 3.3177 LearningRate 0.0002 Epoch: 19 Global Step: 318230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:54,164-Speed 8919.06 samples/sec Loss 3.2759 LearningRate 0.0002 Epoch: 19 Global Step: 318240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:55,252-Speed 9415.66 samples/sec Loss 3.3099 LearningRate 0.0002 Epoch: 19 Global Step: 318250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:56,380-Speed 9084.92 samples/sec Loss 3.2997 LearningRate 0.0002 Epoch: 19 Global Step: 318260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:57,510-Speed 9063.80 samples/sec Loss 3.2673 LearningRate 0.0002 Epoch: 19 Global Step: 318270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:58,615-Speed 9279.32 samples/sec Loss 3.2839 LearningRate 0.0002 Epoch: 19 Global Step: 318280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:27:59,705-Speed 9394.51 samples/sec Loss 3.3259 LearningRate 0.0002 Epoch: 19 Global Step: 318290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:00,787-Speed 9471.99 samples/sec Loss 3.3043 LearningRate 0.0002 Epoch: 19 Global Step: 318300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:01,894-Speed 9259.95 samples/sec Loss 3.2299 LearningRate 0.0002 
Epoch: 19 Global Step: 318310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:02,967-Speed 9551.90 samples/sec Loss 3.3335 LearningRate 0.0002 Epoch: 19 Global Step: 318320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:04,088-Speed 9135.59 samples/sec Loss 3.3088 LearningRate 0.0002 Epoch: 19 Global Step: 318330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:05,213-Speed 9108.91 samples/sec Loss 3.2908 LearningRate 0.0002 Epoch: 19 Global Step: 318340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:06,320-Speed 9251.80 samples/sec Loss 3.3149 LearningRate 0.0002 Epoch: 19 Global Step: 318350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:07,448-Speed 9088.59 samples/sec Loss 3.3356 LearningRate 0.0002 Epoch: 19 Global Step: 318360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:08,564-Speed 9178.63 samples/sec Loss 3.3151 LearningRate 0.0002 Epoch: 19 Global Step: 318370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:09,688-Speed 9109.13 samples/sec Loss 3.3299 LearningRate 0.0002 Epoch: 19 Global Step: 318380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:10,772-Speed 9457.23 samples/sec Loss 3.3087 LearningRate 0.0002 Epoch: 19 Global Step: 318390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:11,889-Speed 9166.67 samples/sec Loss 3.2204 LearningRate 0.0002 Epoch: 19 Global Step: 318400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:13,046-Speed 8859.98 samples/sec Loss 3.3377 LearningRate 0.0002 Epoch: 19 Global Step: 318410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:28:14,187-Speed 8981.83 samples/sec Loss 3.2026 LearningRate 0.0002 Epoch: 19 Global Step: 318420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:15,358-Speed 8748.12 samples/sec Loss 3.2353 LearningRate 0.0002 Epoch: 19 Global Step: 318430 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:16,543-Speed 8645.80 samples/sec Loss 3.3209 LearningRate 0.0002 Epoch: 19 Global Step: 318440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:17,643-Speed 9318.22 samples/sec Loss 3.2991 LearningRate 0.0002 Epoch: 19 Global Step: 318450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:18,769-Speed 9097.70 samples/sec Loss 3.2825 LearningRate 0.0002 Epoch: 19 Global Step: 318460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:19,887-Speed 9170.44 samples/sec Loss 3.2647 LearningRate 0.0002 Epoch: 19 Global Step: 318470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:20,948-Speed 9650.09 samples/sec Loss 3.2050 LearningRate 0.0002 Epoch: 19 Global Step: 318480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:22,088-Speed 8993.58 samples/sec Loss 3.2801 LearningRate 0.0002 Epoch: 19 Global Step: 318490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:23,184-Speed 9343.05 samples/sec Loss 3.2967 LearningRate 0.0002 Epoch: 19 Global Step: 318500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:24,296-Speed 9212.76 samples/sec Loss 3.3087 LearningRate 0.0002 Epoch: 19 Global Step: 318510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:25,393-Speed 9337.85 samples/sec Loss 3.3484 LearningRate 0.0002 Epoch: 19 Global Step: 318520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:28:26,509-Speed 9183.29 samples/sec Loss 3.3263 LearningRate 0.0002 Epoch: 19 Global Step: 318530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:27,626-Speed 9170.69 samples/sec Loss 3.3466 LearningRate 0.0002 Epoch: 19 Global Step: 318540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:28,741-Speed 9191.12 samples/sec Loss 3.3448 LearningRate 0.0002 Epoch: 19 Global Step: 318550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-12 00:28:29,861-Speed 9147.34 samples/sec Loss 3.2663 LearningRate 0.0002 Epoch: 19 Global Step: 318560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:30,938-Speed 9510.30 samples/sec Loss 3.2284 LearningRate 0.0002 Epoch: 19 Global Step: 318570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:32,062-Speed 9118.21 samples/sec Loss 3.2699 LearningRate 0.0002 Epoch: 19 Global Step: 318580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:33,161-Speed 9327.51 samples/sec Loss 3.3049 LearningRate 0.0002 Epoch: 19 Global Step: 318590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:34,263-Speed 9293.34 samples/sec Loss 3.3364 LearningRate 0.0002 Epoch: 19 Global Step: 318600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:35,388-Speed 9112.60 samples/sec Loss 3.3681 LearningRate 0.0002 Epoch: 19 Global Step: 318610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:36,542-Speed 8876.16 samples/sec Loss 3.3093 LearningRate 0.0002 Epoch: 19 Global Step: 318620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:37,648-Speed 9264.95 samples/sec Loss 3.2880 LearningRate 0.0002 Epoch: 19 Global Step: 318630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:38,807-Speed 8842.69 samples/sec Loss 3.2162 LearningRate 0.0002 Epoch: 19 Global Step: 318640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:39,917-Speed 9231.81 samples/sec Loss 3.2815 LearningRate 0.0002 Epoch: 19 Global Step: 318650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:41,040-Speed 9121.61 samples/sec Loss 3.2884 LearningRate 0.0002 Epoch: 19 Global Step: 318660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:42,199-Speed 8836.76 samples/sec Loss 3.3045 LearningRate 0.0002 Epoch: 19 Global Step: 318670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:43,269-Speed 9586.61 
samples/sec Loss 3.3504 LearningRate 0.0002 Epoch: 19 Global Step: 318680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:44,396-Speed 9089.69 samples/sec Loss 3.2467 LearningRate 0.0002 Epoch: 19 Global Step: 318690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:45,507-Speed 9222.66 samples/sec Loss 3.2820 LearningRate 0.0002 Epoch: 19 Global Step: 318700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:46,596-Speed 9403.86 samples/sec Loss 3.2738 LearningRate 0.0002 Epoch: 19 Global Step: 318710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:47,730-Speed 9038.28 samples/sec Loss 3.3566 LearningRate 0.0002 Epoch: 19 Global Step: 318720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:48,846-Speed 9175.64 samples/sec Loss 3.2572 LearningRate 0.0002 Epoch: 19 Global Step: 318730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:28:49,963-Speed 9180.87 samples/sec Loss 3.3652 LearningRate 0.0002 Epoch: 19 Global Step: 318740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:51,136-Speed 8732.66 samples/sec Loss 3.3304 LearningRate 0.0002 Epoch: 19 Global Step: 318750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:52,246-Speed 9226.26 samples/sec Loss 3.2913 LearningRate 0.0002 Epoch: 19 Global Step: 318760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:53,376-Speed 9068.52 samples/sec Loss 3.2498 LearningRate 0.0002 Epoch: 19 Global Step: 318770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:54,481-Speed 9275.00 samples/sec Loss 3.2879 LearningRate 0.0002 Epoch: 19 Global Step: 318780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:55,597-Speed 9184.73 samples/sec Loss 3.3442 LearningRate 0.0002 Epoch: 19 Global Step: 318790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:56,709-Speed 9210.38 samples/sec Loss 3.3162 LearningRate 
0.0002 Epoch: 19 Global Step: 318800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:57,862-Speed 8883.06 samples/sec Loss 3.3550 LearningRate 0.0002 Epoch: 19 Global Step: 318810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:28:58,976-Speed 9203.05 samples/sec Loss 3.2615 LearningRate 0.0002 Epoch: 19 Global Step: 318820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:00,110-Speed 9028.13 samples/sec Loss 3.3304 LearningRate 0.0002 Epoch: 19 Global Step: 318830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:01,239-Speed 9075.24 samples/sec Loss 3.2753 LearningRate 0.0002 Epoch: 19 Global Step: 318840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:29:02,317-Speed 9513.50 samples/sec Loss 3.2840 LearningRate 0.0002 Epoch: 19 Global Step: 318850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:03,478-Speed 8822.02 samples/sec Loss 3.2169 LearningRate 0.0002 Epoch: 19 Global Step: 318860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:04,601-Speed 9123.02 samples/sec Loss 3.2727 LearningRate 0.0002 Epoch: 19 Global Step: 318870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:05,732-Speed 9059.58 samples/sec Loss 3.2669 LearningRate 0.0002 Epoch: 19 Global Step: 318880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:06,843-Speed 9220.33 samples/sec Loss 3.3208 LearningRate 0.0002 Epoch: 19 Global Step: 318890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:08,012-Speed 8767.01 samples/sec Loss 3.3140 LearningRate 0.0002 Epoch: 19 Global Step: 318900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:09,180-Speed 8771.43 samples/sec Loss 3.2755 LearningRate 0.0002 Epoch: 19 Global Step: 318910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:10,300-Speed 9152.10 samples/sec Loss 3.2637 LearningRate 0.0002 Epoch: 19 Global Step: 318920 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:11,439-Speed 8990.74 samples/sec Loss 3.2892 LearningRate 0.0002 Epoch: 19 Global Step: 318930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:12,578-Speed 8998.30 samples/sec Loss 3.3262 LearningRate 0.0002 Epoch: 19 Global Step: 318940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:13,681-Speed 9294.16 samples/sec Loss 3.2579 LearningRate 0.0002 Epoch: 19 Global Step: 318950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:14,791-Speed 9231.23 samples/sec Loss 3.2180 LearningRate 0.0002 Epoch: 19 Global Step: 318960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:15,881-Speed 9402.20 samples/sec Loss 3.2722 LearningRate 0.0002 Epoch: 19 Global Step: 318970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:16,970-Speed 9412.36 samples/sec Loss 3.3045 LearningRate 0.0002 Epoch: 19 Global Step: 318980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:18,145-Speed 8714.79 samples/sec Loss 3.3362 LearningRate 0.0002 Epoch: 19 Global Step: 318990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:19,228-Speed 9459.02 samples/sec Loss 3.2913 LearningRate 0.0002 Epoch: 19 Global Step: 319000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:20,344-Speed 9189.98 samples/sec Loss 3.2793 LearningRate 0.0002 Epoch: 19 Global Step: 319010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:21,483-Speed 8993.22 samples/sec Loss 3.2518 LearningRate 0.0002 Epoch: 19 Global Step: 319020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:22,571-Speed 9422.64 samples/sec Loss 3.3555 LearningRate 0.0002 Epoch: 19 Global Step: 319030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:23,754-Speed 8656.37 samples/sec Loss 3.2371 LearningRate 0.0002 Epoch: 19 Global Step: 319040 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-12 00:29:24,909-Speed 8873.56 samples/sec Loss 3.2288 LearningRate 0.0002 Epoch: 19 Global Step: 319050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:26,012-Speed 9292.31 samples/sec Loss 3.3014 LearningRate 0.0002 Epoch: 19 Global Step: 319060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:27,170-Speed 8848.12 samples/sec Loss 3.3026 LearningRate 0.0002 Epoch: 19 Global Step: 319070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:28,327-Speed 8850.94 samples/sec Loss 3.3150 LearningRate 0.0002 Epoch: 19 Global Step: 319080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:29,460-Speed 9044.01 samples/sec Loss 3.2938 LearningRate 0.0002 Epoch: 19 Global Step: 319090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:29:30,566-Speed 9265.78 samples/sec Loss 3.2702 LearningRate 0.0002 Epoch: 19 Global Step: 319100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:31,679-Speed 9202.07 samples/sec Loss 3.2661 LearningRate 0.0002 Epoch: 19 Global Step: 319110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:32,816-Speed 9020.55 samples/sec Loss 3.2761 LearningRate 0.0002 Epoch: 19 Global Step: 319120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:33,898-Speed 9466.58 samples/sec Loss 3.2352 LearningRate 0.0002 Epoch: 19 Global Step: 319130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:35,059-Speed 8822.95 samples/sec Loss 3.2120 LearningRate 0.0002 Epoch: 19 Global Step: 319140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:36,185-Speed 9100.95 samples/sec Loss 3.2815 LearningRate 0.0002 Epoch: 19 Global Step: 319150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:37,283-Speed 9334.40 samples/sec Loss 3.3072 LearningRate 0.0002 Epoch: 19 Global Step: 319160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:38,372-Speed 
9407.04 samples/sec Loss 3.3296 LearningRate 0.0002 Epoch: 19 Global Step: 319170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:39,568-Speed 8566.75 samples/sec Loss 3.3177 LearningRate 0.0002 Epoch: 19 Global Step: 319180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:40,689-Speed 9133.47 samples/sec Loss 3.2812 LearningRate 0.0002 Epoch: 19 Global Step: 319190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:41,806-Speed 9176.08 samples/sec Loss 3.2617 LearningRate 0.0002 Epoch: 19 Global Step: 319200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:29:42,898-Speed 9384.82 samples/sec Loss 3.3243 LearningRate 0.0002 Epoch: 19 Global Step: 319210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:29:43,992-Speed 9370.63 samples/sec Loss 3.2669 LearningRate 0.0002 Epoch: 19 Global Step: 319220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:45,066-Speed 9537.70 samples/sec Loss 3.2621 LearningRate 0.0002 Epoch: 19 Global Step: 319230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:46,200-Speed 9033.23 samples/sec Loss 3.2722 LearningRate 0.0002 Epoch: 19 Global Step: 319240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:47,336-Speed 9018.70 samples/sec Loss 3.2670 LearningRate 0.0002 Epoch: 19 Global Step: 319250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:48,472-Speed 9022.18 samples/sec Loss 3.4567 LearningRate 0.0002 Epoch: 19 Global Step: 319260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:49,561-Speed 9406.83 samples/sec Loss 3.2670 LearningRate 0.0002 Epoch: 19 Global Step: 319270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:50,687-Speed 9109.79 samples/sec Loss 3.1973 LearningRate 0.0002 Epoch: 19 Global Step: 319280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:51,832-Speed 8949.16 samples/sec Loss 3.3612 
LearningRate 0.0002 Epoch: 19 Global Step: 319290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:52,981-Speed 8917.96 samples/sec Loss 3.2570 LearningRate 0.0002 Epoch: 19 Global Step: 319300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:54,097-Speed 9175.97 samples/sec Loss 3.2899 LearningRate 0.0002 Epoch: 19 Global Step: 319310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:55,191-Speed 9368.81 samples/sec Loss 3.2641 LearningRate 0.0002 Epoch: 19 Global Step: 319320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:56,299-Speed 9244.16 samples/sec Loss 3.2853 LearningRate 0.0002 Epoch: 19 Global Step: 319330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:57,416-Speed 9173.05 samples/sec Loss 3.3420 LearningRate 0.0002 Epoch: 19 Global Step: 319340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:58,495-Speed 9495.61 samples/sec Loss 3.3213 LearningRate 0.0002 Epoch: 19 Global Step: 319350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:29:59,621-Speed 9103.22 samples/sec Loss 3.2400 LearningRate 0.0002 Epoch: 19 Global Step: 319360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:00,751-Speed 9067.13 samples/sec Loss 3.3236 LearningRate 0.0002 Epoch: 19 Global Step: 319370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:01,885-Speed 9033.00 samples/sec Loss 3.2752 LearningRate 0.0002 Epoch: 19 Global Step: 319380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:02,972-Speed 9428.01 samples/sec Loss 3.3034 LearningRate 0.0002 Epoch: 19 Global Step: 319390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:04,078-Speed 9265.49 samples/sec Loss 3.2999 LearningRate 0.0002 Epoch: 19 Global Step: 319400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:05,172-Speed 9360.39 samples/sec Loss 3.2349 LearningRate 0.0002 Epoch: 19 Global Step: 
319410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:06,297-Speed 9105.93 samples/sec Loss 3.3300 LearningRate 0.0002 Epoch: 19 Global Step: 319420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:30:07,401-Speed 9281.18 samples/sec Loss 3.3274 LearningRate 0.0002 Epoch: 19 Global Step: 319430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:08,522-Speed 9143.99 samples/sec Loss 3.2043 LearningRate 0.0002 Epoch: 19 Global Step: 319440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:09,650-Speed 9085.04 samples/sec Loss 3.2879 LearningRate 0.0002 Epoch: 19 Global Step: 319450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:10,718-Speed 9593.41 samples/sec Loss 3.2334 LearningRate 0.0002 Epoch: 19 Global Step: 319460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:11,818-Speed 9313.36 samples/sec Loss 3.2205 LearningRate 0.0002 Epoch: 19 Global Step: 319470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:12,898-Speed 9485.13 samples/sec Loss 3.4333 LearningRate 0.0002 Epoch: 19 Global Step: 319480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:14,037-Speed 9003.67 samples/sec Loss 3.3118 LearningRate 0.0002 Epoch: 19 Global Step: 319490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:15,151-Speed 9195.45 samples/sec Loss 3.2871 LearningRate 0.0002 Epoch: 19 Global Step: 319500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:16,240-Speed 9406.50 samples/sec Loss 3.3171 LearningRate 0.0002 Epoch: 19 Global Step: 319510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:17,363-Speed 9125.72 samples/sec Loss 3.2748 LearningRate 0.0002 Epoch: 19 Global Step: 319520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:18,484-Speed 9143.28 samples/sec Loss 3.3013 LearningRate 0.0002 Epoch: 19 Global Step: 319530 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-12 00:30:19,595-Speed 9216.66 samples/sec Loss 3.3007 LearningRate 0.0002 Epoch: 19 Global Step: 319540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:20,690-Speed 9364.64 samples/sec Loss 3.3312 LearningRate 0.0002 Epoch: 19 Global Step: 319550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:21,787-Speed 9338.99 samples/sec Loss 3.2544 LearningRate 0.0002 Epoch: 19 Global Step: 319560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:22,904-Speed 9167.12 samples/sec Loss 3.2428 LearningRate 0.0002 Epoch: 19 Global Step: 319570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:24,058-Speed 8883.16 samples/sec Loss 3.2834 LearningRate 0.0002 Epoch: 19 Global Step: 319580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:25,209-Speed 8899.96 samples/sec Loss 3.3148 LearningRate 0.0002 Epoch: 19 Global Step: 319590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:26,420-Speed 8461.06 samples/sec Loss 3.2345 LearningRate 0.0002 Epoch: 19 Global Step: 319600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:27,519-Speed 9322.97 samples/sec Loss 3.3434 LearningRate 0.0002 Epoch: 19 Global Step: 319610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:28,665-Speed 8939.89 samples/sec Loss 3.2827 LearningRate 0.0002 Epoch: 19 Global Step: 319620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:29,764-Speed 9325.54 samples/sec Loss 3.2370 LearningRate 0.0002 Epoch: 19 Global Step: 319630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:30,853-Speed 9405.96 samples/sec Loss 3.2358 LearningRate 0.0002 Epoch: 19 Global Step: 319640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:31,988-Speed 9028.12 samples/sec Loss 3.3381 LearningRate 0.0002 Epoch: 19 Global Step: 319650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:30:33,139-Speed 8908.12 samples/sec Loss 3.3109 LearningRate 0.0002 Epoch: 19 Global Step: 319660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:34,298-Speed 8835.25 samples/sec Loss 3.4082 LearningRate 0.0002 Epoch: 19 Global Step: 319670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:35,369-Speed 9568.34 samples/sec Loss 3.3358 LearningRate 0.0002 Epoch: 19 Global Step: 319680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:36,475-Speed 9262.40 samples/sec Loss 3.3657 LearningRate 0.0002 Epoch: 19 Global Step: 319690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:37,582-Speed 9254.26 samples/sec Loss 3.2937 LearningRate 0.0002 Epoch: 19 Global Step: 319700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:38,700-Speed 9164.11 samples/sec Loss 3.2996 LearningRate 0.0002 Epoch: 19 Global Step: 319710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:39,843-Speed 8967.98 samples/sec Loss 3.2590 LearningRate 0.0002 Epoch: 19 Global Step: 319720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:40,918-Speed 9523.85 samples/sec Loss 3.2713 LearningRate 0.0002 Epoch: 19 Global Step: 319730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:41,977-Speed 9681.40 samples/sec Loss 3.3489 LearningRate 0.0002 Epoch: 19 Global Step: 319740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:43,076-Speed 9322.27 samples/sec Loss 3.2554 LearningRate 0.0002 Epoch: 19 Global Step: 319750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:44,225-Speed 8920.35 samples/sec Loss 3.3251 LearningRate 0.0002 Epoch: 19 Global Step: 319760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:45,328-Speed 9288.73 samples/sec Loss 3.2077 LearningRate 0.0002 Epoch: 19 Global Step: 319770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:46,425-Speed 9343.54 samples/sec Loss 
3.2678 LearningRate 0.0002 Epoch: 19 Global Step: 319780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:47,551-Speed 9097.45 samples/sec Loss 3.3158 LearningRate 0.0002 Epoch: 19 Global Step: 319790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:48,644-Speed 9377.23 samples/sec Loss 3.2806 LearningRate 0.0002 Epoch: 19 Global Step: 319800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:49,772-Speed 9086.13 samples/sec Loss 3.2927 LearningRate 0.0002 Epoch: 19 Global Step: 319810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:50,856-Speed 9448.19 samples/sec Loss 3.3023 LearningRate 0.0002 Epoch: 19 Global Step: 319820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:51,981-Speed 9106.57 samples/sec Loss 3.3147 LearningRate 0.0002 Epoch: 19 Global Step: 319830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:53,102-Speed 9137.65 samples/sec Loss 3.2290 LearningRate 0.0002 Epoch: 19 Global Step: 319840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:54,228-Speed 9104.11 samples/sec Loss 3.2937 LearningRate 0.0002 Epoch: 19 Global Step: 319850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:55,311-Speed 9458.72 samples/sec Loss 3.3185 LearningRate 0.0002 Epoch: 19 Global Step: 319860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:56,401-Speed 9399.62 samples/sec Loss 3.2899 LearningRate 0.0002 Epoch: 19 Global Step: 319870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:57,575-Speed 8725.04 samples/sec Loss 3.3321 LearningRate 0.0002 Epoch: 19 Global Step: 319880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:58,713-Speed 9003.59 samples/sec Loss 3.2875 LearningRate 0.0002 Epoch: 19 Global Step: 319890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:30:59,838-Speed 9108.51 samples/sec Loss 3.2656 LearningRate 0.0002 Epoch: 19 Global 
Step: 319900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:00,930-Speed 9377.89 samples/sec Loss 3.3616 LearningRate 0.0002 Epoch: 19 Global Step: 319910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:02,018-Speed 9423.97 samples/sec Loss 3.2974 LearningRate 0.0002 Epoch: 19 Global Step: 319920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:03,117-Speed 9325.53 samples/sec Loss 3.2850 LearningRate 0.0002 Epoch: 19 Global Step: 319930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:31:04,246-Speed 9075.44 samples/sec Loss 3.2237 LearningRate 0.0002 Epoch: 19 Global Step: 319940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:05,323-Speed 9512.63 samples/sec Loss 3.2999 LearningRate 0.0002 Epoch: 19 Global Step: 319950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:06,471-Speed 8929.00 samples/sec Loss 3.3027 LearningRate 0.0002 Epoch: 19 Global Step: 319960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:07,592-Speed 9133.98 samples/sec Loss 3.3039 LearningRate 0.0002 Epoch: 19 Global Step: 319970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:08,728-Speed 9024.79 samples/sec Loss 3.2882 LearningRate 0.0002 Epoch: 19 Global Step: 319980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:09,821-Speed 9373.62 samples/sec Loss 3.2200 LearningRate 0.0002 Epoch: 19 Global Step: 319990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:10,901-Speed 9480.99 samples/sec Loss 3.3068 LearningRate 0.0002 Epoch: 19 Global Step: 320000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:31:32,812-[lfw][320000]XNorm: 6.561927 Training: 2022-04-12 00:31:32,813-[lfw][320000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-04-12 00:31:32,814-[lfw][320000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:31:58,171-[cfp_fp][320000]XNorm: 5.731094 Training: 2022-04-12 
00:31:58,172-[cfp_fp][320000]Accuracy-Flip: 0.97271+-0.00907 Training: 2022-04-12 00:31:58,172-[cfp_fp][320000]Accuracy-Highest: 0.97514 Training: 2022-04-12 00:32:20,120-[agedb_30][320000]XNorm: 6.391900 Training: 2022-04-12 00:32:20,121-[agedb_30][320000]Accuracy-Flip: 0.97233+-0.00782 Training: 2022-04-12 00:32:20,121-[agedb_30][320000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:32:21,219-Speed 145.63 samples/sec Loss 3.3279 LearningRate 0.0002 Epoch: 19 Global Step: 320010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:22,317-Speed 9325.98 samples/sec Loss 3.2269 LearningRate 0.0002 Epoch: 19 Global Step: 320020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:23,408-Speed 9398.27 samples/sec Loss 3.3593 LearningRate 0.0002 Epoch: 19 Global Step: 320030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:24,546-Speed 9000.02 samples/sec Loss 3.3558 LearningRate 0.0002 Epoch: 19 Global Step: 320040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:32:25,691-Speed 8948.09 samples/sec Loss 3.2497 LearningRate 0.0002 Epoch: 19 Global Step: 320050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:26,851-Speed 8835.65 samples/sec Loss 3.2922 LearningRate 0.0002 Epoch: 19 Global Step: 320060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:27,980-Speed 9076.36 samples/sec Loss 3.3138 LearningRate 0.0002 Epoch: 19 Global Step: 320070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:29,136-Speed 8863.23 samples/sec Loss 3.2817 LearningRate 0.0002 Epoch: 19 Global Step: 320080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:30,316-Speed 8684.42 samples/sec Loss 3.2876 LearningRate 0.0002 Epoch: 19 Global Step: 320090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:31,424-Speed 9242.85 samples/sec Loss 3.2883 LearningRate 0.0002 Epoch: 19 Global Step: 320100 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-12 00:32:32,547-Speed 9125.40 samples/sec Loss 3.2861 LearningRate 0.0002 Epoch: 19 Global Step: 320110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:33,687-Speed 8989.41 samples/sec Loss 3.3041 LearningRate 0.0002 Epoch: 19 Global Step: 320120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:34,787-Speed 9310.43 samples/sec Loss 3.2457 LearningRate 0.0002 Epoch: 19 Global Step: 320130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:35,912-Speed 9112.56 samples/sec Loss 3.1916 LearningRate 0.0002 Epoch: 19 Global Step: 320140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:37,074-Speed 8812.72 samples/sec Loss 3.3169 LearningRate 0.0002 Epoch: 19 Global Step: 320150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:38,186-Speed 9213.36 samples/sec Loss 3.2855 LearningRate 0.0002 Epoch: 19 Global Step: 320160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:39,293-Speed 9259.05 samples/sec Loss 3.3578 LearningRate 0.0002 Epoch: 19 Global Step: 320170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:40,431-Speed 9003.97 samples/sec Loss 3.2657 LearningRate 0.0002 Epoch: 19 Global Step: 320180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:41,607-Speed 8711.21 samples/sec Loss 3.2638 LearningRate 0.0002 Epoch: 19 Global Step: 320190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:42,693-Speed 9435.69 samples/sec Loss 3.3725 LearningRate 0.0002 Epoch: 19 Global Step: 320200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:43,784-Speed 9391.38 samples/sec Loss 3.3044 LearningRate 0.0002 Epoch: 19 Global Step: 320210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:44,883-Speed 9326.48 samples/sec Loss 3.2817 LearningRate 0.0002 Epoch: 19 Global Step: 320220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:45,984-Speed 
9299.33 samples/sec Loss 3.3368 LearningRate 0.0002 Epoch: 19 Global Step: 320230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:47,112-Speed 9090.55 samples/sec Loss 3.3897 LearningRate 0.0002 Epoch: 19 Global Step: 320240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:48,268-Speed 8858.83 samples/sec Loss 3.3356 LearningRate 0.0002 Epoch: 19 Global Step: 320250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:32:49,376-Speed 9251.12 samples/sec Loss 3.2361 LearningRate 0.0002 Epoch: 19 Global Step: 320260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:50,492-Speed 9183.03 samples/sec Loss 3.2841 LearningRate 0.0002 Epoch: 19 Global Step: 320270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:51,602-Speed 9235.40 samples/sec Loss 3.2781 LearningRate 0.0002 Epoch: 19 Global Step: 320280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:52,701-Speed 9316.76 samples/sec Loss 3.2840 LearningRate 0.0002 Epoch: 19 Global Step: 320290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:53,848-Speed 8932.10 samples/sec Loss 3.2553 LearningRate 0.0002 Epoch: 19 Global Step: 320300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:54,969-Speed 9144.10 samples/sec Loss 3.3252 LearningRate 0.0002 Epoch: 19 Global Step: 320310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:56,096-Speed 9090.07 samples/sec Loss 3.2480 LearningRate 0.0002 Epoch: 19 Global Step: 320320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:57,181-Speed 9440.88 samples/sec Loss 3.2731 LearningRate 0.0002 Epoch: 19 Global Step: 320330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:58,315-Speed 9040.40 samples/sec Loss 3.3126 LearningRate 0.0002 Epoch: 19 Global Step: 320340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:32:59,432-Speed 9170.89 samples/sec Loss 3.2677 
LearningRate 0.0002 Epoch: 19 Global Step: 320350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:00,554-Speed 9130.45 samples/sec Loss 3.3881 LearningRate 0.0002 Epoch: 19 Global Step: 320360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:33:01,677-Speed 9121.91 samples/sec Loss 3.2506 LearningRate 0.0002 Epoch: 19 Global Step: 320370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:33:02,778-Speed 9308.46 samples/sec Loss 3.2380 LearningRate 0.0002 Epoch: 19 Global Step: 320380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:03,901-Speed 9122.33 samples/sec Loss 3.2820 LearningRate 0.0002 Epoch: 19 Global Step: 320390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:05,013-Speed 9214.04 samples/sec Loss 3.2440 LearningRate 0.0002 Epoch: 19 Global Step: 320400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:06,121-Speed 9249.80 samples/sec Loss 3.2743 LearningRate 0.0002 Epoch: 19 Global Step: 320410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:07,216-Speed 9355.71 samples/sec Loss 3.2957 LearningRate 0.0002 Epoch: 19 Global Step: 320420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:08,367-Speed 8911.39 samples/sec Loss 3.2974 LearningRate 0.0002 Epoch: 19 Global Step: 320430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:09,473-Speed 9262.59 samples/sec Loss 3.1725 LearningRate 0.0002 Epoch: 19 Global Step: 320440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:10,639-Speed 8783.10 samples/sec Loss 3.3014 LearningRate 0.0002 Epoch: 19 Global Step: 320450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:11,770-Speed 9066.00 samples/sec Loss 3.2764 LearningRate 0.0002 Epoch: 19 Global Step: 320460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:12,916-Speed 8934.48 samples/sec Loss 3.3569 LearningRate 0.0002 Epoch: 19 Global 
Step: 320470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:14,015-Speed 9324.07 samples/sec Loss 3.3188 LearningRate 0.0002 Epoch: 19 Global Step: 320480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:33:15,161-Speed 8939.61 samples/sec Loss 3.3397 LearningRate 0.0002 Epoch: 19 Global Step: 320490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:16,242-Speed 9481.93 samples/sec Loss 3.2807 LearningRate 0.0002 Epoch: 19 Global Step: 320500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:17,356-Speed 9198.86 samples/sec Loss 3.2338 LearningRate 0.0002 Epoch: 19 Global Step: 320510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:18,451-Speed 9363.21 samples/sec Loss 3.2355 LearningRate 0.0002 Epoch: 19 Global Step: 320520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:19,607-Speed 8860.82 samples/sec Loss 3.3047 LearningRate 0.0002 Epoch: 19 Global Step: 320530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:20,720-Speed 9206.29 samples/sec Loss 3.2779 LearningRate 0.0002 Epoch: 19 Global Step: 320540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:21,809-Speed 9400.79 samples/sec Loss 3.1799 LearningRate 0.0002 Epoch: 19 Global Step: 320550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:22,926-Speed 9174.60 samples/sec Loss 3.3259 LearningRate 0.0002 Epoch: 19 Global Step: 320560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:24,006-Speed 9489.56 samples/sec Loss 3.2529 LearningRate 0.0002 Epoch: 19 Global Step: 320570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:25,109-Speed 9289.95 samples/sec Loss 3.2798 LearningRate 0.0002 Epoch: 19 Global Step: 320580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:26,243-Speed 9035.52 samples/sec Loss 3.2356 LearningRate 0.0002 Epoch: 19 Global Step: 320590 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:33:27,320-Speed 9517.40 samples/sec Loss 3.2422 LearningRate 0.0002 Epoch: 19 Global Step: 320600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:28,391-Speed 9560.78 samples/sec Loss 3.3147 LearningRate 0.0002 Epoch: 19 Global Step: 320610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:29,514-Speed 9123.95 samples/sec Loss 3.2936 LearningRate 0.0002 Epoch: 19 Global Step: 320620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:30,631-Speed 9176.41 samples/sec Loss 3.2921 LearningRate 0.0002 Epoch: 19 Global Step: 320630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:31,755-Speed 9116.07 samples/sec Loss 3.3507 LearningRate 0.0002 Epoch: 19 Global Step: 320640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:32,907-Speed 8895.07 samples/sec Loss 3.2424 LearningRate 0.0002 Epoch: 19 Global Step: 320650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:34,066-Speed 8836.38 samples/sec Loss 3.3308 LearningRate 0.0002 Epoch: 19 Global Step: 320660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:35,148-Speed 9469.23 samples/sec Loss 3.2982 LearningRate 0.0002 Epoch: 19 Global Step: 320670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:36,252-Speed 9280.44 samples/sec Loss 3.3001 LearningRate 0.0002 Epoch: 19 Global Step: 320680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:37,359-Speed 9259.45 samples/sec Loss 3.2439 LearningRate 0.0002 Epoch: 19 Global Step: 320690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:38,481-Speed 9130.08 samples/sec Loss 3.2703 LearningRate 0.0002 Epoch: 19 Global Step: 320700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:39,626-Speed 8946.19 samples/sec Loss 3.2535 LearningRate 0.0002 Epoch: 19 Global Step: 320710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:33:40,730-Speed 9286.38 samples/sec Loss 3.2749 LearningRate 0.0002 Epoch: 19 Global Step: 320720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:41,848-Speed 9162.98 samples/sec Loss 3.3069 LearningRate 0.0002 Epoch: 19 Global Step: 320730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:42,973-Speed 9107.71 samples/sec Loss 3.2507 LearningRate 0.0002 Epoch: 19 Global Step: 320740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:44,130-Speed 8852.20 samples/sec Loss 3.3136 LearningRate 0.0002 Epoch: 19 Global Step: 320750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:45,219-Speed 9416.78 samples/sec Loss 3.2403 LearningRate 0.0002 Epoch: 19 Global Step: 320760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:46,341-Speed 9129.86 samples/sec Loss 3.2601 LearningRate 0.0002 Epoch: 19 Global Step: 320770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:47,454-Speed 9204.45 samples/sec Loss 3.3285 LearningRate 0.0002 Epoch: 19 Global Step: 320780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:48,573-Speed 9159.85 samples/sec Loss 3.2744 LearningRate 0.0002 Epoch: 19 Global Step: 320790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:49,690-Speed 9172.80 samples/sec Loss 3.3256 LearningRate 0.0002 Epoch: 19 Global Step: 320800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:50,802-Speed 9214.28 samples/sec Loss 3.2500 LearningRate 0.0002 Epoch: 19 Global Step: 320810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:51,904-Speed 9300.31 samples/sec Loss 3.2563 LearningRate 0.0002 Epoch: 19 Global Step: 320820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:52,999-Speed 9360.75 samples/sec Loss 3.3143 LearningRate 0.0002 Epoch: 19 Global Step: 320830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:54,098-Speed 9318.44 samples/sec Loss 
3.2699 LearningRate 0.0002 Epoch: 19 Global Step: 320840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:55,165-Speed 9605.40 samples/sec Loss 3.1917 LearningRate 0.0002 Epoch: 19 Global Step: 320850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:56,294-Speed 9078.13 samples/sec Loss 3.3510 LearningRate 0.0002 Epoch: 19 Global Step: 320860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:57,417-Speed 9121.44 samples/sec Loss 3.2711 LearningRate 0.0002 Epoch: 19 Global Step: 320870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:58,502-Speed 9445.75 samples/sec Loss 3.2435 LearningRate 0.0002 Epoch: 19 Global Step: 320880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:33:59,634-Speed 9054.22 samples/sec Loss 3.2326 LearningRate 0.0002 Epoch: 19 Global Step: 320890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:34:00,754-Speed 9142.85 samples/sec Loss 3.2146 LearningRate 0.0001 Epoch: 19 Global Step: 320900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:01,875-Speed 9144.10 samples/sec Loss 3.3165 LearningRate 0.0001 Epoch: 19 Global Step: 320910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:02,995-Speed 9143.90 samples/sec Loss 3.2948 LearningRate 0.0001 Epoch: 19 Global Step: 320920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:04,133-Speed 9008.97 samples/sec Loss 3.3659 LearningRate 0.0001 Epoch: 19 Global Step: 320930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:05,221-Speed 9414.06 samples/sec Loss 3.3308 LearningRate 0.0001 Epoch: 19 Global Step: 320940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:06,346-Speed 9113.02 samples/sec Loss 3.2659 LearningRate 0.0001 Epoch: 19 Global Step: 320950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:07,457-Speed 9221.79 samples/sec Loss 3.3566 LearningRate 0.0001 Epoch: 19 
Global Step: 320960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:08,531-Speed 9541.93 samples/sec Loss 3.3212 LearningRate 0.0001 Epoch: 19 Global Step: 320970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:09,640-Speed 9239.19 samples/sec Loss 3.3185 LearningRate 0.0001 Epoch: 19 Global Step: 320980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:10,780-Speed 8987.13 samples/sec Loss 3.2561 LearningRate 0.0001 Epoch: 19 Global Step: 320990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:11,839-Speed 9677.29 samples/sec Loss 3.2835 LearningRate 0.0001 Epoch: 19 Global Step: 321000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:12,953-Speed 9192.04 samples/sec Loss 3.2503 LearningRate 0.0001 Epoch: 19 Global Step: 321010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:14,075-Speed 9136.12 samples/sec Loss 3.3236 LearningRate 0.0001 Epoch: 19 Global Step: 321020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:15,227-Speed 8890.52 samples/sec Loss 3.2700 LearningRate 0.0001 Epoch: 19 Global Step: 321030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:16,347-Speed 9151.58 samples/sec Loss 3.3645 LearningRate 0.0001 Epoch: 19 Global Step: 321040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:17,467-Speed 9146.93 samples/sec Loss 3.2768 LearningRate 0.0001 Epoch: 19 Global Step: 321050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:18,578-Speed 9217.14 samples/sec Loss 3.2763 LearningRate 0.0001 Epoch: 19 Global Step: 321060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:19,695-Speed 9178.66 samples/sec Loss 3.2753 LearningRate 0.0001 Epoch: 19 Global Step: 321070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:20,797-Speed 9294.02 samples/sec Loss 3.3344 LearningRate 0.0001 Epoch: 19 Global Step: 321080 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-12 00:34:21,905-Speed 9246.18 samples/sec Loss 3.3087 LearningRate 0.0001 Epoch: 19 Global Step: 321090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:23,037-Speed 9057.70 samples/sec Loss 3.3156 LearningRate 0.0001 Epoch: 19 Global Step: 321100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:24,140-Speed 9284.26 samples/sec Loss 3.2993 LearningRate 0.0001 Epoch: 19 Global Step: 321110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:25,293-Speed 8889.88 samples/sec Loss 3.2527 LearningRate 0.0001 Epoch: 19 Global Step: 321120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:26,459-Speed 8791.13 samples/sec Loss 3.3400 LearningRate 0.0001 Epoch: 19 Global Step: 321130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:27,547-Speed 9413.14 samples/sec Loss 3.2563 LearningRate 0.0001 Epoch: 19 Global Step: 321140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:28,646-Speed 9321.41 samples/sec Loss 3.2892 LearningRate 0.0001 Epoch: 19 Global Step: 321150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:29,746-Speed 9315.33 samples/sec Loss 3.2767 LearningRate 0.0001 Epoch: 19 Global Step: 321160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:30,867-Speed 9139.25 samples/sec Loss 3.2601 LearningRate 0.0001 Epoch: 19 Global Step: 321170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:31,973-Speed 9262.92 samples/sec Loss 3.2091 LearningRate 0.0001 Epoch: 19 Global Step: 321180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:34:33,105-Speed 9054.44 samples/sec Loss 3.2251 LearningRate 0.0001 Epoch: 19 Global Step: 321190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:34,245-Speed 8985.23 samples/sec Loss 3.2717 LearningRate 0.0001 Epoch: 19 Global Step: 321200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 
00:34:35,369-Speed 9118.68 samples/sec Loss 3.2628 LearningRate 0.0001 Epoch: 19 Global Step: 321210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:36,480-Speed 9222.05 samples/sec Loss 3.3687 LearningRate 0.0001 Epoch: 19 Global Step: 321220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:37,569-Speed 9404.08 samples/sec Loss 3.3409 LearningRate 0.0001 Epoch: 19 Global Step: 321230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:38,698-Speed 9074.14 samples/sec Loss 3.3058 LearningRate 0.0001 Epoch: 19 Global Step: 321240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:39,822-Speed 9119.18 samples/sec Loss 3.2701 LearningRate 0.0001 Epoch: 19 Global Step: 321250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:40,898-Speed 9524.47 samples/sec Loss 3.3210 LearningRate 0.0001 Epoch: 19 Global Step: 321260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:42,011-Speed 9209.50 samples/sec Loss 3.3125 LearningRate 0.0001 Epoch: 19 Global Step: 321270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:43,114-Speed 9282.61 samples/sec Loss 3.2610 LearningRate 0.0001 Epoch: 19 Global Step: 321280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:44,225-Speed 9230.77 samples/sec Loss 3.2697 LearningRate 0.0001 Epoch: 19 Global Step: 321290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-12 00:34:45,329-Speed 9278.92 samples/sec Loss 3.2845 LearningRate 0.0001 Epoch: 19 Global Step: 321300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:46,458-Speed 9076.96 samples/sec Loss 3.3650 LearningRate 0.0001 Epoch: 19 Global Step: 321310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:47,564-Speed 9258.38 samples/sec Loss 3.2403 LearningRate 0.0001 Epoch: 19 Global Step: 321320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:48,736-Speed 8749.05 samples/sec 
Loss 3.2538 LearningRate 0.0001 Epoch: 19 Global Step: 321330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:49,867-Speed 9060.50 samples/sec Loss 3.2385 LearningRate 0.0001 Epoch: 19 Global Step: 321340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:50,996-Speed 9071.61 samples/sec Loss 3.2678 LearningRate 0.0001 Epoch: 19 Global Step: 321350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:52,091-Speed 9359.77 samples/sec Loss 3.3463 LearningRate 0.0001 Epoch: 19 Global Step: 321360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:53,197-Speed 9266.19 samples/sec Loss 3.3549 LearningRate 0.0001 Epoch: 19 Global Step: 321370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:54,352-Speed 8872.96 samples/sec Loss 3.2402 LearningRate 0.0001 Epoch: 19 Global Step: 321380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:55,453-Speed 9307.58 samples/sec Loss 3.1853 LearningRate 0.0001 Epoch: 19 Global Step: 321390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:56,558-Speed 9273.89 samples/sec Loss 3.2339 LearningRate 0.0001 Epoch: 19 Global Step: 321400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:57,640-Speed 9463.24 samples/sec Loss 3.2656 LearningRate 0.0001 Epoch: 19 Global Step: 321410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:58,758-Speed 9167.60 samples/sec Loss 3.3638 LearningRate 0.0001 Epoch: 19 Global Step: 321420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-12 00:34:59,819-Speed 9657.87 samples/sec Loss 3.2403 LearningRate 0.0001 Epoch: 19 Global Step: 321430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:35:00,899-Speed 9489.51 samples/sec Loss 3.2712 LearningRate 0.0001 Epoch: 19 Global Step: 321440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:35:02,000-Speed 9302.50 samples/sec Loss 3.2753 LearningRate 0.0001 Epoch: 19 
Global Step: 321450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:35:03,097-Speed 9347.66 samples/sec Loss 3.3187 LearningRate 0.0001 Epoch: 19 Global Step: 321460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:35:04,220-Speed 9118.00 samples/sec Loss 3.2522 LearningRate 0.0001 Epoch: 19 Global Step: 321470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:35:05,360-Speed 8991.26 samples/sec Loss 3.3121 LearningRate 0.0001 Epoch: 19 Global Step: 321480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-12 00:35:06,491-Speed 9061.69 samples/sec Loss 3.2779 LearningRate 0.0001 Epoch: 19 Global Step: 321490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:07,633-Speed 8974.11 samples/sec Loss 3.2830 LearningRate 0.0001 Epoch: 19 Global Step: 321500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:08,763-Speed 9067.05 samples/sec Loss 3.2727 LearningRate 0.0001 Epoch: 19 Global Step: 321510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:09,878-Speed 9188.92 samples/sec Loss 3.2489 LearningRate 0.0001 Epoch: 19 Global Step: 321520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:10,998-Speed 9145.73 samples/sec Loss 3.3294 LearningRate 0.0001 Epoch: 19 Global Step: 321530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:12,123-Speed 9105.60 samples/sec Loss 3.3146 LearningRate 0.0001 Epoch: 19 Global Step: 321540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:13,196-Speed 9555.01 samples/sec Loss 3.2540 LearningRate 0.0001 Epoch: 19 Global Step: 321550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:14,302-Speed 9256.90 samples/sec Loss 3.2814 LearningRate 0.0001 Epoch: 19 Global Step: 321560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:15,467-Speed 8796.66 samples/sec Loss 3.2695 LearningRate 0.0001 Epoch: 19 Global Step: 321570 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-12 00:35:16,591-Speed 9112.33 samples/sec Loss 3.2584 LearningRate 0.0001 Epoch: 19 Global Step: 321580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:17,733-Speed 8973.36 samples/sec Loss 3.3285 LearningRate 0.0001 Epoch: 19 Global Step: 321590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:18,871-Speed 9009.53 samples/sec Loss 3.2851 LearningRate 0.0001 Epoch: 19 Global Step: 321600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:19,987-Speed 9185.12 samples/sec Loss 3.3065 LearningRate 0.0001 Epoch: 19 Global Step: 321610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:21,119-Speed 9052.17 samples/sec Loss 3.2249 LearningRate 0.0001 Epoch: 19 Global Step: 321620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:22,222-Speed 9284.30 samples/sec Loss 3.2411 LearningRate 0.0001 Epoch: 19 Global Step: 321630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:23,330-Speed 9244.86 samples/sec Loss 3.2592 LearningRate 0.0001 Epoch: 19 Global Step: 321640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:24,473-Speed 8968.46 samples/sec Loss 3.2642 LearningRate 0.0001 Epoch: 19 Global Step: 321650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:25,647-Speed 8723.98 samples/sec Loss 3.3576 LearningRate 0.0001 Epoch: 19 Global Step: 321660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:26,766-Speed 9157.28 samples/sec Loss 3.2491 LearningRate 0.0001 Epoch: 19 Global Step: 321670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:27,879-Speed 9208.35 samples/sec Loss 3.3127 LearningRate 0.0001 Epoch: 19 Global Step: 321680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:29,042-Speed 8808.11 samples/sec Loss 3.2484 LearningRate 0.0001 Epoch: 19 Global Step: 321690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:35:30,162-Speed 9144.07 samples/sec Loss 3.2776 LearningRate 0.0001 Epoch: 19 Global Step: 321700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:31,291-Speed 9081.03 samples/sec Loss 3.2287 LearningRate 0.0001 Epoch: 19 Global Step: 321710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:32,390-Speed 9321.96 samples/sec Loss 3.2275 LearningRate 0.0001 Epoch: 19 Global Step: 321720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:33,491-Speed 9303.20 samples/sec Loss 3.2775 LearningRate 0.0001 Epoch: 19 Global Step: 321730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:34,663-Speed 8742.22 samples/sec Loss 3.2127 LearningRate 0.0001 Epoch: 19 Global Step: 321740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:35,761-Speed 9327.98 samples/sec Loss 3.3498 LearningRate 0.0001 Epoch: 19 Global Step: 321750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:36,869-Speed 9252.69 samples/sec Loss 3.2869 LearningRate 0.0001 Epoch: 19 Global Step: 321760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:37,976-Speed 9259.76 samples/sec Loss 3.2925 LearningRate 0.0001 Epoch: 19 Global Step: 321770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:39,185-Speed 8476.10 samples/sec Loss 3.3151 LearningRate 0.0001 Epoch: 19 Global Step: 321780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:40,290-Speed 9271.71 samples/sec Loss 3.3052 LearningRate 0.0001 Epoch: 19 Global Step: 321790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:41,382-Speed 9384.49 samples/sec Loss 3.2745 LearningRate 0.0001 Epoch: 19 Global Step: 321800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:42,471-Speed 9406.19 samples/sec Loss 3.3598 LearningRate 0.0001 Epoch: 19 Global Step: 321810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:43,567-Speed 9342.76 samples/sec Loss 
3.3210 LearningRate 0.0001 Epoch: 19 Global Step: 321820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:44,680-Speed 9207.39 samples/sec Loss 3.3233 LearningRate 0.0001 Epoch: 19 Global Step: 321830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:45,770-Speed 9401.36 samples/sec Loss 3.2727 LearningRate 0.0001 Epoch: 19 Global Step: 321840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:46,892-Speed 9132.59 samples/sec Loss 3.2500 LearningRate 0.0001 Epoch: 19 Global Step: 321850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:48,012-Speed 9146.05 samples/sec Loss 3.3289 LearningRate 0.0001 Epoch: 19 Global Step: 321860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:35:49,124-Speed 9212.39 samples/sec Loss 3.2610 LearningRate 0.0001 Epoch: 19 Global Step: 321870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:50,213-Speed 9408.31 samples/sec Loss 3.1802 LearningRate 0.0001 Epoch: 19 Global Step: 321880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:51,327-Speed 9201.01 samples/sec Loss 3.3898 LearningRate 0.0001 Epoch: 19 Global Step: 321890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:52,403-Speed 9522.90 samples/sec Loss 3.3513 LearningRate 0.0001 Epoch: 19 Global Step: 321900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:53,550-Speed 8933.21 samples/sec Loss 3.3016 LearningRate 0.0001 Epoch: 19 Global Step: 321910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:54,737-Speed 8629.12 samples/sec Loss 3.3098 LearningRate 0.0001 Epoch: 19 Global Step: 321920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:55,878-Speed 8986.02 samples/sec Loss 3.2840 LearningRate 0.0001 Epoch: 19 Global Step: 321930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:56,977-Speed 9328.05 samples/sec Loss 3.2535 LearningRate 0.0001 Epoch: 19 Global 
Step: 321940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:58,069-Speed 9376.14 samples/sec Loss 3.2846 LearningRate 0.0001 Epoch: 19 Global Step: 321950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:35:59,223-Speed 8878.24 samples/sec Loss 3.2390 LearningRate 0.0001 Epoch: 19 Global Step: 321960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:36:00,377-Speed 8881.55 samples/sec Loss 3.2178 LearningRate 0.0001 Epoch: 19 Global Step: 321970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:36:01,485-Speed 9244.93 samples/sec Loss 3.3057 LearningRate 0.0001 Epoch: 19 Global Step: 321980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:36:02,630-Speed 8949.37 samples/sec Loss 3.2830 LearningRate 0.0001 Epoch: 19 Global Step: 321990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:36:03,732-Speed 9295.79 samples/sec Loss 3.3657 LearningRate 0.0001 Epoch: 19 Global Step: 322000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:36:25,860-[lfw][322000]XNorm: 6.541839 Training: 2022-04-12 00:36:25,861-[lfw][322000]Accuracy-Flip: 0.99667+-0.00269 Training: 2022-04-12 00:36:25,861-[lfw][322000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:36:51,110-[cfp_fp][322000]XNorm: 5.712872 Training: 2022-04-12 00:36:51,110-[cfp_fp][322000]Accuracy-Flip: 0.97371+-0.00767 Training: 2022-04-12 00:36:51,111-[cfp_fp][322000]Accuracy-Highest: 0.97514 Training: 2022-04-12 00:37:12,947-[agedb_30][322000]XNorm: 6.374946 Training: 2022-04-12 00:37:12,947-[agedb_30][322000]Accuracy-Flip: 0.97150+-0.00783 Training: 2022-04-12 00:37:12,947-[agedb_30][322000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:37:14,056-Speed 145.61 samples/sec Loss 3.3088 LearningRate 0.0001 Epoch: 19 Global Step: 322010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:15,162-Speed 9264.82 samples/sec Loss 3.3498 LearningRate 0.0001 Epoch: 19 Global Step: 322020 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:16,301-Speed 8995.57 samples/sec Loss 3.3168 LearningRate 0.0001 Epoch: 19 Global Step: 322030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:17,448-Speed 8932.26 samples/sec Loss 3.2901 LearningRate 0.0001 Epoch: 19 Global Step: 322040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:18,564-Speed 9180.66 samples/sec Loss 3.2890 LearningRate 0.0001 Epoch: 19 Global Step: 322050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:19,672-Speed 9252.48 samples/sec Loss 3.3044 LearningRate 0.0001 Epoch: 19 Global Step: 322060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:20,795-Speed 9117.51 samples/sec Loss 3.2958 LearningRate 0.0001 Epoch: 19 Global Step: 322070 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:37:21,871-Speed 9526.44 samples/sec Loss 3.3874 LearningRate 0.0001 Epoch: 19 Global Step: 322080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:22,970-Speed 9319.93 samples/sec Loss 3.3205 LearningRate 0.0001 Epoch: 19 Global Step: 322090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:24,100-Speed 9067.20 samples/sec Loss 3.2691 LearningRate 0.0001 Epoch: 19 Global Step: 322100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:25,193-Speed 9375.58 samples/sec Loss 3.2962 LearningRate 0.0001 Epoch: 19 Global Step: 322110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:26,291-Speed 9377.65 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 19 Global Step: 322120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:27,380-Speed 9404.18 samples/sec Loss 3.2585 LearningRate 0.0001 Epoch: 19 Global Step: 322130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:28,560-Speed 8681.54 samples/sec Loss 3.3603 LearningRate 0.0001 Epoch: 19 Global Step: 322140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-12 00:37:29,702-Speed 8977.40 samples/sec Loss 3.2468 LearningRate 0.0001 Epoch: 19 Global Step: 322150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:30,825-Speed 9119.88 samples/sec Loss 3.2416 LearningRate 0.0001 Epoch: 19 Global Step: 322160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:31,948-Speed 9118.57 samples/sec Loss 3.2643 LearningRate 0.0001 Epoch: 19 Global Step: 322170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:33,061-Speed 9217.17 samples/sec Loss 3.3292 LearningRate 0.0001 Epoch: 19 Global Step: 322180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:37:34,147-Speed 9432.71 samples/sec Loss 3.2828 LearningRate 0.0001 Epoch: 19 Global Step: 322190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:35,270-Speed 9118.94 samples/sec Loss 3.3006 LearningRate 0.0001 Epoch: 19 Global Step: 322200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:36,373-Speed 9293.85 samples/sec Loss 3.2689 LearningRate 0.0001 Epoch: 19 Global Step: 322210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:37,447-Speed 9532.84 samples/sec Loss 3.3104 LearningRate 0.0001 Epoch: 19 Global Step: 322220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:38,621-Speed 8733.59 samples/sec Loss 3.3083 LearningRate 0.0001 Epoch: 19 Global Step: 322230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:39,796-Speed 8717.98 samples/sec Loss 3.3431 LearningRate 0.0001 Epoch: 19 Global Step: 322240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:40,876-Speed 9484.62 samples/sec Loss 3.2441 LearningRate 0.0001 Epoch: 19 Global Step: 322250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:42,016-Speed 8987.80 samples/sec Loss 3.3739 LearningRate 0.0001 Epoch: 19 Global Step: 322260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:43,183-Speed 8785.87 
samples/sec Loss 3.3148 LearningRate 0.0001 Epoch: 19 Global Step: 322270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:44,275-Speed 9382.99 samples/sec Loss 3.3038 LearningRate 0.0001 Epoch: 19 Global Step: 322280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:45,372-Speed 9337.36 samples/sec Loss 3.2580 LearningRate 0.0001 Epoch: 19 Global Step: 322290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:46,446-Speed 9541.52 samples/sec Loss 3.2782 LearningRate 0.0001 Epoch: 19 Global Step: 322300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:47,553-Speed 9260.09 samples/sec Loss 3.3089 LearningRate 0.0001 Epoch: 19 Global Step: 322310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:48,652-Speed 9319.38 samples/sec Loss 3.3193 LearningRate 0.0001 Epoch: 19 Global Step: 322320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:37:49,726-Speed 9542.66 samples/sec Loss 3.3051 LearningRate 0.0001 Epoch: 19 Global Step: 322330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:50,825-Speed 9323.77 samples/sec Loss 3.2515 LearningRate 0.0001 Epoch: 19 Global Step: 322340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:51,938-Speed 9204.97 samples/sec Loss 3.2403 LearningRate 0.0001 Epoch: 19 Global Step: 322350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:53,058-Speed 9146.79 samples/sec Loss 3.1940 LearningRate 0.0001 Epoch: 19 Global Step: 322360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:54,225-Speed 8777.40 samples/sec Loss 3.2729 LearningRate 0.0001 Epoch: 19 Global Step: 322370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:55,351-Speed 9105.84 samples/sec Loss 3.3278 LearningRate 0.0001 Epoch: 19 Global Step: 322380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:56,440-Speed 9410.28 samples/sec Loss 3.2850 LearningRate 0.0001 
Epoch: 19 Global Step: 322390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:57,539-Speed 9322.92 samples/sec Loss 3.2197 LearningRate 0.0001 Epoch: 19 Global Step: 322400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:58,657-Speed 9159.28 samples/sec Loss 3.2937 LearningRate 0.0001 Epoch: 19 Global Step: 322410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:37:59,767-Speed 9238.41 samples/sec Loss 3.2613 LearningRate 0.0001 Epoch: 19 Global Step: 322420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:00,858-Speed 9387.91 samples/sec Loss 3.2712 LearningRate 0.0001 Epoch: 19 Global Step: 322430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:38:01,965-Speed 9260.95 samples/sec Loss 3.3508 LearningRate 0.0001 Epoch: 19 Global Step: 322440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:03,082-Speed 9172.97 samples/sec Loss 3.2871 LearningRate 0.0001 Epoch: 19 Global Step: 322450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:04,213-Speed 9063.56 samples/sec Loss 3.2644 LearningRate 0.0001 Epoch: 19 Global Step: 322460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:05,334-Speed 9136.85 samples/sec Loss 3.2713 LearningRate 0.0001 Epoch: 19 Global Step: 322470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:06,436-Speed 9296.11 samples/sec Loss 3.3064 LearningRate 0.0001 Epoch: 19 Global Step: 322480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:07,564-Speed 9086.45 samples/sec Loss 3.2877 LearningRate 0.0001 Epoch: 19 Global Step: 322490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:08,633-Speed 9583.80 samples/sec Loss 3.3341 LearningRate 0.0001 Epoch: 19 Global Step: 322500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:09,743-Speed 9223.15 samples/sec Loss 3.2687 LearningRate 0.0001 Epoch: 19 Global Step: 322510 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:10,866-Speed 9123.47 samples/sec Loss 3.2817 LearningRate 0.0001 Epoch: 19 Global Step: 322520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:11,994-Speed 9082.89 samples/sec Loss 3.3254 LearningRate 0.0001 Epoch: 19 Global Step: 322530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:13,069-Speed 9533.63 samples/sec Loss 3.2454 LearningRate 0.0001 Epoch: 19 Global Step: 322540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:14,240-Speed 8752.38 samples/sec Loss 3.3008 LearningRate 0.0001 Epoch: 19 Global Step: 322550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:15,389-Speed 8918.80 samples/sec Loss 3.2223 LearningRate 0.0001 Epoch: 19 Global Step: 322560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:16,513-Speed 9115.74 samples/sec Loss 3.2258 LearningRate 0.0001 Epoch: 19 Global Step: 322570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:17,611-Speed 9331.84 samples/sec Loss 3.2718 LearningRate 0.0001 Epoch: 19 Global Step: 322580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:18,703-Speed 9382.17 samples/sec Loss 3.3592 LearningRate 0.0001 Epoch: 19 Global Step: 322590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:19,808-Speed 9276.16 samples/sec Loss 3.3263 LearningRate 0.0001 Epoch: 19 Global Step: 322600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:20,890-Speed 9469.27 samples/sec Loss 3.3512 LearningRate 0.0001 Epoch: 19 Global Step: 322610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:21,992-Speed 9295.24 samples/sec Loss 3.3564 LearningRate 0.0001 Epoch: 19 Global Step: 322620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:23,085-Speed 9376.02 samples/sec Loss 3.3507 LearningRate 0.0001 Epoch: 19 Global Step: 322630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-12 00:38:24,170-Speed 9447.09 samples/sec Loss 3.2478 LearningRate 0.0001 Epoch: 19 Global Step: 322640 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:38:25,245-Speed 9531.78 samples/sec Loss 3.2460 LearningRate 0.0001 Epoch: 19 Global Step: 322650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:26,347-Speed 9297.70 samples/sec Loss 3.2836 LearningRate 0.0001 Epoch: 19 Global Step: 322660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:27,479-Speed 9049.50 samples/sec Loss 3.2502 LearningRate 0.0001 Epoch: 19 Global Step: 322670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:28,604-Speed 9105.57 samples/sec Loss 3.2823 LearningRate 0.0001 Epoch: 19 Global Step: 322680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:29,751-Speed 8935.99 samples/sec Loss 3.2164 LearningRate 0.0001 Epoch: 19 Global Step: 322690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:30,888-Speed 9013.43 samples/sec Loss 3.2961 LearningRate 0.0001 Epoch: 19 Global Step: 322700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:32,014-Speed 9100.25 samples/sec Loss 3.2764 LearningRate 0.0001 Epoch: 19 Global Step: 322710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:33,164-Speed 8912.77 samples/sec Loss 3.2993 LearningRate 0.0001 Epoch: 19 Global Step: 322720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:34,348-Speed 8654.45 samples/sec Loss 3.3004 LearningRate 0.0001 Epoch: 19 Global Step: 322730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:35,494-Speed 8941.71 samples/sec Loss 3.3582 LearningRate 0.0001 Epoch: 19 Global Step: 322740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:36,619-Speed 9105.67 samples/sec Loss 3.2950 LearningRate 0.0001 Epoch: 19 Global Step: 322750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:37,756-Speed 9013.11 
samples/sec Loss 3.3308 LearningRate 0.0001 Epoch: 19 Global Step: 322760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:38,899-Speed 8964.04 samples/sec Loss 3.3376 LearningRate 0.0001 Epoch: 19 Global Step: 322770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:40,024-Speed 9111.34 samples/sec Loss 3.2495 LearningRate 0.0001 Epoch: 19 Global Step: 322780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:41,113-Speed 9406.54 samples/sec Loss 3.3258 LearningRate 0.0001 Epoch: 19 Global Step: 322790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:42,259-Speed 8937.79 samples/sec Loss 3.2226 LearningRate 0.0001 Epoch: 19 Global Step: 322800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:43,380-Speed 9139.66 samples/sec Loss 3.2727 LearningRate 0.0001 Epoch: 19 Global Step: 322810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:44,490-Speed 9233.82 samples/sec Loss 3.3897 LearningRate 0.0001 Epoch: 19 Global Step: 322820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:45,637-Speed 8936.53 samples/sec Loss 3.3401 LearningRate 0.0001 Epoch: 19 Global Step: 322830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:46,763-Speed 9098.03 samples/sec Loss 3.3494 LearningRate 0.0001 Epoch: 19 Global Step: 322840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:47,896-Speed 9039.12 samples/sec Loss 3.3083 LearningRate 0.0001 Epoch: 19 Global Step: 322850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:38:49,025-Speed 9077.55 samples/sec Loss 3.2341 LearningRate 0.0001 Epoch: 19 Global Step: 322860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:50,170-Speed 8953.84 samples/sec Loss 3.3733 LearningRate 0.0001 Epoch: 19 Global Step: 322870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:51,363-Speed 8587.82 samples/sec Loss 3.2757 LearningRate 
0.0001 Epoch: 19 Global Step: 322880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:52,453-Speed 9395.98 samples/sec Loss 3.3076 LearningRate 0.0001 Epoch: 19 Global Step: 322890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:53,568-Speed 9191.02 samples/sec Loss 3.2984 LearningRate 0.0001 Epoch: 19 Global Step: 322900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:54,701-Speed 9039.24 samples/sec Loss 3.2536 LearningRate 0.0001 Epoch: 19 Global Step: 322910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:55,815-Speed 9205.17 samples/sec Loss 3.2544 LearningRate 0.0001 Epoch: 19 Global Step: 322920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:56,951-Speed 9025.57 samples/sec Loss 3.2776 LearningRate 0.0001 Epoch: 19 Global Step: 322930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:58,102-Speed 8901.88 samples/sec Loss 3.3080 LearningRate 0.0001 Epoch: 19 Global Step: 322940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:38:59,246-Speed 8954.61 samples/sec Loss 3.2566 LearningRate 0.0001 Epoch: 19 Global Step: 322950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:00,388-Speed 8977.50 samples/sec Loss 3.3219 LearningRate 0.0001 Epoch: 19 Global Step: 322960 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:39:01,499-Speed 9216.12 samples/sec Loss 3.3083 LearningRate 0.0001 Epoch: 19 Global Step: 322970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:02,650-Speed 8902.90 samples/sec Loss 3.3182 LearningRate 0.0001 Epoch: 19 Global Step: 322980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:03,780-Speed 9071.90 samples/sec Loss 3.2753 LearningRate 0.0001 Epoch: 19 Global Step: 322990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:04,917-Speed 9008.38 samples/sec Loss 3.3985 LearningRate 0.0001 Epoch: 19 Global Step: 323000 Fp16 
Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:06,040-Speed 9124.28 samples/sec Loss 3.2556 LearningRate 0.0001 Epoch: 19 Global Step: 323010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:07,176-Speed 9023.03 samples/sec Loss 3.2370 LearningRate 0.0001 Epoch: 19 Global Step: 323020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:08,297-Speed 9133.27 samples/sec Loss 3.2030 LearningRate 0.0001 Epoch: 19 Global Step: 323030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:09,465-Speed 8777.86 samples/sec Loss 3.2931 LearningRate 0.0001 Epoch: 19 Global Step: 323040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:10,588-Speed 9123.03 samples/sec Loss 3.3000 LearningRate 0.0001 Epoch: 19 Global Step: 323050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:11,703-Speed 9184.56 samples/sec Loss 3.2391 LearningRate 0.0001 Epoch: 19 Global Step: 323060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:12,841-Speed 9004.27 samples/sec Loss 3.1880 LearningRate 0.0001 Epoch: 19 Global Step: 323070 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:39:13,948-Speed 9257.53 samples/sec Loss 3.3273 LearningRate 0.0001 Epoch: 19 Global Step: 323080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:15,048-Speed 9320.57 samples/sec Loss 3.3016 LearningRate 0.0001 Epoch: 19 Global Step: 323090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:16,115-Speed 9598.55 samples/sec Loss 3.2375 LearningRate 0.0001 Epoch: 19 Global Step: 323100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:17,221-Speed 9268.06 samples/sec Loss 3.3375 LearningRate 0.0001 Epoch: 19 Global Step: 323110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:18,300-Speed 9493.43 samples/sec Loss 3.2770 LearningRate 0.0001 Epoch: 19 Global Step: 323120 Fp16 Grad Scale: 65536 Required: 0 hours 
Training: 2022-04-12 00:39:19,468-Speed 8769.75 samples/sec Loss 3.3205 LearningRate 0.0001 Epoch: 19 Global Step: 323130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:20,623-Speed 8873.20 samples/sec Loss 3.2254 LearningRate 0.0001 Epoch: 19 Global Step: 323140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:21,732-Speed 9236.40 samples/sec Loss 3.3172 LearningRate 0.0001 Epoch: 19 Global Step: 323150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:22,863-Speed 9058.35 samples/sec Loss 3.3429 LearningRate 0.0001 Epoch: 19 Global Step: 323160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:24,033-Speed 8762.35 samples/sec Loss 3.3191 LearningRate 0.0001 Epoch: 19 Global Step: 323170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:25,109-Speed 9520.77 samples/sec Loss 3.2763 LearningRate 0.0001 Epoch: 19 Global Step: 323180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:26,254-Speed 8947.09 samples/sec Loss 3.3762 LearningRate 0.0001 Epoch: 19 Global Step: 323190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:27,400-Speed 8943.36 samples/sec Loss 3.2274 LearningRate 0.0001 Epoch: 19 Global Step: 323200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:28,531-Speed 9060.52 samples/sec Loss 3.2605 LearningRate 0.0001 Epoch: 19 Global Step: 323210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:29,682-Speed 8894.92 samples/sec Loss 3.2897 LearningRate 0.0001 Epoch: 19 Global Step: 323220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:30,795-Speed 9207.43 samples/sec Loss 3.2695 LearningRate 0.0001 Epoch: 19 Global Step: 323230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:31,931-Speed 9020.79 samples/sec Loss 3.3754 LearningRate 0.0001 Epoch: 19 Global Step: 323240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:33,006-Speed 
9529.63 samples/sec Loss 3.2612 LearningRate 0.0001 Epoch: 19 Global Step: 323250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:34,123-Speed 9175.04 samples/sec Loss 3.3196 LearningRate 0.0001 Epoch: 19 Global Step: 323260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:35,241-Speed 9167.16 samples/sec Loss 3.3419 LearningRate 0.0001 Epoch: 19 Global Step: 323270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:36,369-Speed 9083.38 samples/sec Loss 3.2263 LearningRate 0.0001 Epoch: 19 Global Step: 323280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:39:37,484-Speed 9185.52 samples/sec Loss 3.3299 LearningRate 0.0001 Epoch: 19 Global Step: 323290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:38,593-Speed 9242.13 samples/sec Loss 3.2694 LearningRate 0.0001 Epoch: 19 Global Step: 323300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:39,737-Speed 8952.73 samples/sec Loss 3.3088 LearningRate 0.0001 Epoch: 19 Global Step: 323310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:40,859-Speed 9136.95 samples/sec Loss 3.3146 LearningRate 0.0001 Epoch: 19 Global Step: 323320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:41,997-Speed 9000.92 samples/sec Loss 3.2748 LearningRate 0.0001 Epoch: 19 Global Step: 323330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:43,142-Speed 8948.05 samples/sec Loss 3.2579 LearningRate 0.0001 Epoch: 19 Global Step: 323340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:44,321-Speed 8689.49 samples/sec Loss 3.2847 LearningRate 0.0001 Epoch: 19 Global Step: 323350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:45,402-Speed 9482.64 samples/sec Loss 3.2723 LearningRate 0.0001 Epoch: 19 Global Step: 323360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:46,534-Speed 9049.45 samples/sec Loss 3.2892 
LearningRate 0.0001 Epoch: 19 Global Step: 323370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:47,660-Speed 9102.04 samples/sec Loss 3.2944 LearningRate 0.0001 Epoch: 19 Global Step: 323380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:48,809-Speed 8920.59 samples/sec Loss 3.2737 LearningRate 0.0001 Epoch: 19 Global Step: 323390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:49,959-Speed 8908.96 samples/sec Loss 3.2614 LearningRate 0.0001 Epoch: 19 Global Step: 323400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:51,095-Speed 9017.99 samples/sec Loss 3.2783 LearningRate 0.0001 Epoch: 19 Global Step: 323410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:52,222-Speed 9087.27 samples/sec Loss 3.3263 LearningRate 0.0001 Epoch: 19 Global Step: 323420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:53,367-Speed 8953.89 samples/sec Loss 3.2363 LearningRate 0.0001 Epoch: 19 Global Step: 323430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:54,487-Speed 9145.68 samples/sec Loss 3.2769 LearningRate 0.0001 Epoch: 19 Global Step: 323440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:55,583-Speed 9355.14 samples/sec Loss 3.2200 LearningRate 0.0001 Epoch: 19 Global Step: 323450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:56,706-Speed 9120.82 samples/sec Loss 3.2876 LearningRate 0.0001 Epoch: 19 Global Step: 323460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:57,806-Speed 9308.45 samples/sec Loss 3.3051 LearningRate 0.0001 Epoch: 19 Global Step: 323470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:39:58,931-Speed 9112.01 samples/sec Loss 3.2631 LearningRate 0.0001 Epoch: 19 Global Step: 323480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:00,007-Speed 9518.41 samples/sec Loss 3.3065 LearningRate 0.0001 Epoch: 19 Global Step: 
323490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:40:01,136-Speed 9075.41 samples/sec Loss 3.3229 LearningRate 0.0001 Epoch: 19 Global Step: 323500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:02,266-Speed 9069.87 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 19 Global Step: 323510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:03,399-Speed 9046.24 samples/sec Loss 3.2431 LearningRate 0.0001 Epoch: 19 Global Step: 323520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:04,496-Speed 9339.71 samples/sec Loss 3.2991 LearningRate 0.0001 Epoch: 19 Global Step: 323530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:05,656-Speed 8829.81 samples/sec Loss 3.2884 LearningRate 0.0001 Epoch: 19 Global Step: 323540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:06,749-Speed 9372.91 samples/sec Loss 3.2963 LearningRate 0.0001 Epoch: 19 Global Step: 323550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:07,885-Speed 9018.22 samples/sec Loss 3.2396 LearningRate 0.0001 Epoch: 19 Global Step: 323560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:09,009-Speed 9120.12 samples/sec Loss 3.3066 LearningRate 0.0001 Epoch: 19 Global Step: 323570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:10,154-Speed 8947.05 samples/sec Loss 3.3182 LearningRate 0.0001 Epoch: 19 Global Step: 323580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:11,307-Speed 8885.37 samples/sec Loss 3.2386 LearningRate 0.0001 Epoch: 19 Global Step: 323590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:12,425-Speed 9168.31 samples/sec Loss 3.2685 LearningRate 0.0001 Epoch: 19 Global Step: 323600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:13,520-Speed 9353.34 samples/sec Loss 3.2808 LearningRate 0.0001 Epoch: 19 Global Step: 323610 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-12 00:40:14,652-Speed 9066.86 samples/sec Loss 3.3221 LearningRate 0.0001 Epoch: 19 Global Step: 323620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:15,811-Speed 8834.75 samples/sec Loss 3.3542 LearningRate 0.0001 Epoch: 19 Global Step: 323630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:16,943-Speed 9053.10 samples/sec Loss 3.3109 LearningRate 0.0001 Epoch: 19 Global Step: 323640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:18,055-Speed 9212.05 samples/sec Loss 3.3699 LearningRate 0.0001 Epoch: 19 Global Step: 323650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:19,159-Speed 9280.90 samples/sec Loss 3.2407 LearningRate 0.0001 Epoch: 19 Global Step: 323660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:20,243-Speed 9455.52 samples/sec Loss 3.3151 LearningRate 0.0001 Epoch: 19 Global Step: 323670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:21,323-Speed 9488.30 samples/sec Loss 3.3191 LearningRate 0.0001 Epoch: 19 Global Step: 323680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:22,442-Speed 9151.97 samples/sec Loss 3.2816 LearningRate 0.0001 Epoch: 19 Global Step: 323690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:23,577-Speed 9029.33 samples/sec Loss 3.3001 LearningRate 0.0001 Epoch: 19 Global Step: 323700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:24,647-Speed 9578.92 samples/sec Loss 3.2657 LearningRate 0.0001 Epoch: 19 Global Step: 323710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:25,785-Speed 9008.70 samples/sec Loss 3.3152 LearningRate 0.0001 Epoch: 19 Global Step: 323720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:26,911-Speed 9098.88 samples/sec Loss 3.2771 LearningRate 0.0001 Epoch: 19 Global Step: 323730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:40:28,054-Speed 8961.22 samples/sec Loss 3.3798 LearningRate 0.0001 Epoch: 19 Global Step: 323740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:29,164-Speed 9226.12 samples/sec Loss 3.2792 LearningRate 0.0001 Epoch: 19 Global Step: 323750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:30,300-Speed 9027.89 samples/sec Loss 3.2598 LearningRate 0.0001 Epoch: 19 Global Step: 323760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:31,413-Speed 9204.31 samples/sec Loss 3.2004 LearningRate 0.0001 Epoch: 19 Global Step: 323770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:32,548-Speed 9031.01 samples/sec Loss 3.2795 LearningRate 0.0001 Epoch: 19 Global Step: 323780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:33,689-Speed 8979.58 samples/sec Loss 3.2156 LearningRate 0.0001 Epoch: 19 Global Step: 323790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:34,808-Speed 9150.76 samples/sec Loss 3.3054 LearningRate 0.0001 Epoch: 19 Global Step: 323800 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:40:35,921-Speed 9203.84 samples/sec Loss 3.2863 LearningRate 0.0001 Epoch: 19 Global Step: 323810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:37,019-Speed 9337.76 samples/sec Loss 3.2517 LearningRate 0.0001 Epoch: 19 Global Step: 323820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:38,122-Speed 9285.80 samples/sec Loss 3.3516 LearningRate 0.0001 Epoch: 19 Global Step: 323830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:39,255-Speed 9045.94 samples/sec Loss 3.2803 LearningRate 0.0001 Epoch: 19 Global Step: 323840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:40,364-Speed 9236.78 samples/sec Loss 3.3275 LearningRate 0.0001 Epoch: 19 Global Step: 323850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:41,472-Speed 9246.64 samples/sec 
Loss 3.2818 LearningRate 0.0001 Epoch: 19 Global Step: 323860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:42,575-Speed 9287.34 samples/sec Loss 3.3227 LearningRate 0.0001 Epoch: 19 Global Step: 323870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:43,685-Speed 9227.83 samples/sec Loss 3.3007 LearningRate 0.0001 Epoch: 19 Global Step: 323880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:44,775-Speed 9410.86 samples/sec Loss 3.3180 LearningRate 0.0001 Epoch: 19 Global Step: 323890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:45,896-Speed 9136.26 samples/sec Loss 3.2101 LearningRate 0.0001 Epoch: 19 Global Step: 323900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:46,965-Speed 9582.20 samples/sec Loss 3.3143 LearningRate 0.0001 Epoch: 19 Global Step: 323910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:48,105-Speed 8992.95 samples/sec Loss 3.3258 LearningRate 0.0001 Epoch: 19 Global Step: 323920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:49,229-Speed 9113.49 samples/sec Loss 3.2313 LearningRate 0.0001 Epoch: 19 Global Step: 323930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:50,364-Speed 9026.03 samples/sec Loss 3.3142 LearningRate 0.0001 Epoch: 19 Global Step: 323940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:51,470-Speed 9268.19 samples/sec Loss 3.2604 LearningRate 0.0001 Epoch: 19 Global Step: 323950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:52,583-Speed 9205.82 samples/sec Loss 3.3480 LearningRate 0.0001 Epoch: 19 Global Step: 323960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:53,710-Speed 9091.36 samples/sec Loss 3.3315 LearningRate 0.0001 Epoch: 19 Global Step: 323970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:54,792-Speed 9464.72 samples/sec Loss 3.1542 LearningRate 0.0001 Epoch: 19 
Global Step: 323980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:55,886-Speed 9371.81 samples/sec Loss 3.3075 LearningRate 0.0001 Epoch: 19 Global Step: 323990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:40:56,980-Speed 9363.93 samples/sec Loss 3.2660 LearningRate 0.0001 Epoch: 19 Global Step: 324000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:41:19,091-[lfw][324000]XNorm: 6.549769 Training: 2022-04-12 00:41:19,092-[lfw][324000]Accuracy-Flip: 0.99700+-0.00287 Training: 2022-04-12 00:41:19,092-[lfw][324000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:41:44,565-[cfp_fp][324000]XNorm: 5.724918 Training: 2022-04-12 00:41:44,566-[cfp_fp][324000]Accuracy-Flip: 0.97543+-0.00823 Training: 2022-04-12 00:41:44,566-[cfp_fp][324000]Accuracy-Highest: 0.97543 Training: 2022-04-12 00:42:06,544-[agedb_30][324000]XNorm: 6.387645 Training: 2022-04-12 00:42:06,545-[agedb_30][324000]Accuracy-Flip: 0.97383+-0.00827 Training: 2022-04-12 00:42:06,545-[agedb_30][324000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:42:07,639-Speed 144.92 samples/sec Loss 3.3260 LearningRate 0.0001 Epoch: 19 Global Step: 324010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:08,739-Speed 9318.14 samples/sec Loss 3.2844 LearningRate 0.0001 Epoch: 19 Global Step: 324020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:09,890-Speed 8899.50 samples/sec Loss 3.3204 LearningRate 0.0001 Epoch: 19 Global Step: 324030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:11,035-Speed 8950.57 samples/sec Loss 3.3096 LearningRate 0.0001 Epoch: 19 Global Step: 324040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:12,124-Speed 9403.75 samples/sec Loss 3.2987 LearningRate 0.0001 Epoch: 19 Global Step: 324050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:13,260-Speed 9021.60 samples/sec Loss 3.3154 LearningRate 0.0001 Epoch: 19 Global Step: 324060 Fp16 
Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:14,449-Speed 8617.30 samples/sec Loss 3.2771 LearningRate 0.0001 Epoch: 19 Global Step: 324070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:15,538-Speed 9408.87 samples/sec Loss 3.3463 LearningRate 0.0001 Epoch: 19 Global Step: 324080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:16,692-Speed 8878.86 samples/sec Loss 3.2888 LearningRate 0.0001 Epoch: 19 Global Step: 324090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:17,831-Speed 8998.08 samples/sec Loss 3.3399 LearningRate 0.0001 Epoch: 19 Global Step: 324100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:18,964-Speed 9039.69 samples/sec Loss 3.2761 LearningRate 0.0001 Epoch: 19 Global Step: 324110 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:42:20,072-Speed 9251.21 samples/sec Loss 3.3122 LearningRate 0.0001 Epoch: 19 Global Step: 324120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:42:21,199-Speed 9087.59 samples/sec Loss 3.2988 LearningRate 0.0001 Epoch: 19 Global Step: 324130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:22,296-Speed 9338.36 samples/sec Loss 3.2481 LearningRate 0.0001 Epoch: 19 Global Step: 324140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:23,442-Speed 8948.29 samples/sec Loss 3.3271 LearningRate 0.0001 Epoch: 19 Global Step: 324150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:24,595-Speed 8889.20 samples/sec Loss 3.3862 LearningRate 0.0001 Epoch: 19 Global Step: 324160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:25,700-Speed 9272.35 samples/sec Loss 3.2853 LearningRate 0.0001 Epoch: 19 Global Step: 324170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:26,793-Speed 9379.10 samples/sec Loss 3.3132 LearningRate 0.0001 Epoch: 19 Global Step: 324180 Fp16 Grad Scale: 65536 Required: 0 hours 
Training: 2022-04-12 00:42:27,896-Speed 9285.40 samples/sec Loss 3.2689 LearningRate 0.0001 Epoch: 19 Global Step: 324190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:28,998-Speed 9294.30 samples/sec Loss 3.3350 LearningRate 0.0001 Epoch: 19 Global Step: 324200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:30,072-Speed 9541.52 samples/sec Loss 3.2631 LearningRate 0.0001 Epoch: 19 Global Step: 324210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:31,180-Speed 9254.70 samples/sec Loss 3.3726 LearningRate 0.0001 Epoch: 19 Global Step: 324220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:32,292-Speed 9211.21 samples/sec Loss 3.2960 LearningRate 0.0001 Epoch: 19 Global Step: 324230 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:42:33,393-Speed 9309.15 samples/sec Loss 3.2590 LearningRate 0.0001 Epoch: 19 Global Step: 324240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:34,463-Speed 9572.94 samples/sec Loss 3.2995 LearningRate 0.0001 Epoch: 19 Global Step: 324250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:35,566-Speed 9285.90 samples/sec Loss 3.2970 LearningRate 0.0001 Epoch: 19 Global Step: 324260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:36,697-Speed 9063.08 samples/sec Loss 3.3211 LearningRate 0.0001 Epoch: 19 Global Step: 324270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:37,860-Speed 8814.52 samples/sec Loss 3.3223 LearningRate 0.0001 Epoch: 19 Global Step: 324280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:38,991-Speed 9054.62 samples/sec Loss 3.2754 LearningRate 0.0001 Epoch: 19 Global Step: 324290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:40,142-Speed 8900.61 samples/sec Loss 3.2611 LearningRate 0.0001 Epoch: 19 Global Step: 324300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:41,297-Speed 
8873.33 samples/sec Loss 3.3245 LearningRate 0.0001 Epoch: 19 Global Step: 324310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:42,403-Speed 9265.83 samples/sec Loss 3.2835 LearningRate 0.0001 Epoch: 19 Global Step: 324320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:43,526-Speed 9122.05 samples/sec Loss 3.4040 LearningRate 0.0001 Epoch: 19 Global Step: 324330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:44,646-Speed 9148.13 samples/sec Loss 3.2292 LearningRate 0.0001 Epoch: 19 Global Step: 324340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:45,737-Speed 9395.14 samples/sec Loss 3.3474 LearningRate 0.0001 Epoch: 19 Global Step: 324350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:46,839-Speed 9297.66 samples/sec Loss 3.2206 LearningRate 0.0001 Epoch: 19 Global Step: 324360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:42:47,963-Speed 9112.86 samples/sec Loss 3.3201 LearningRate 0.0001 Epoch: 19 Global Step: 324370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:49,027-Speed 9628.82 samples/sec Loss 3.3366 LearningRate 0.0001 Epoch: 19 Global Step: 324380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:50,140-Speed 9209.49 samples/sec Loss 3.3273 LearningRate 0.0001 Epoch: 19 Global Step: 324390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:51,264-Speed 9116.80 samples/sec Loss 3.2057 LearningRate 0.0001 Epoch: 19 Global Step: 324400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:52,394-Speed 9062.70 samples/sec Loss 3.2384 LearningRate 0.0001 Epoch: 19 Global Step: 324410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:53,541-Speed 8936.82 samples/sec Loss 3.2283 LearningRate 0.0001 Epoch: 19 Global Step: 324420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:54,661-Speed 9145.62 samples/sec Loss 3.3702 
LearningRate 0.0001 Epoch: 19 Global Step: 324430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:55,783-Speed 9130.39 samples/sec Loss 3.2481 LearningRate 0.0001 Epoch: 19 Global Step: 324440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:56,930-Speed 8937.74 samples/sec Loss 3.2504 LearningRate 0.0001 Epoch: 19 Global Step: 324450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:58,080-Speed 8909.63 samples/sec Loss 3.4196 LearningRate 0.0001 Epoch: 19 Global Step: 324460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:42:59,209-Speed 9072.35 samples/sec Loss 3.2646 LearningRate 0.0001 Epoch: 19 Global Step: 324470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:00,311-Speed 9295.99 samples/sec Loss 3.3151 LearningRate 0.0001 Epoch: 19 Global Step: 324480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:01,409-Speed 9339.27 samples/sec Loss 3.3683 LearningRate 0.0001 Epoch: 19 Global Step: 324490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:02,513-Speed 9286.06 samples/sec Loss 3.2714 LearningRate 0.0001 Epoch: 19 Global Step: 324500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:03,618-Speed 9271.72 samples/sec Loss 3.2404 LearningRate 0.0001 Epoch: 19 Global Step: 324510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:04,762-Speed 8955.00 samples/sec Loss 3.3727 LearningRate 0.0001 Epoch: 19 Global Step: 324520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:05,872-Speed 9225.70 samples/sec Loss 3.2122 LearningRate 0.0001 Epoch: 19 Global Step: 324530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:06,979-Speed 9256.40 samples/sec Loss 3.3018 LearningRate 0.0001 Epoch: 19 Global Step: 324540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:08,138-Speed 8840.16 samples/sec Loss 3.2332 LearningRate 0.0001 Epoch: 19 Global Step: 
324550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:09,262-Speed 9115.57 samples/sec Loss 3.2508 LearningRate 0.0001 Epoch: 19 Global Step: 324560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:10,371-Speed 9242.18 samples/sec Loss 3.2952 LearningRate 0.0001 Epoch: 19 Global Step: 324570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:11,435-Speed 9626.41 samples/sec Loss 3.3079 LearningRate 0.0001 Epoch: 19 Global Step: 324580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:12,522-Speed 9427.57 samples/sec Loss 3.2657 LearningRate 0.0001 Epoch: 19 Global Step: 324590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:13,597-Speed 9525.46 samples/sec Loss 3.3018 LearningRate 0.0001 Epoch: 19 Global Step: 324600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:14,726-Speed 9075.23 samples/sec Loss 3.4165 LearningRate 0.0001 Epoch: 19 Global Step: 324610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:15,881-Speed 8878.79 samples/sec Loss 3.4163 LearningRate 0.0001 Epoch: 19 Global Step: 324620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:16,971-Speed 9393.99 samples/sec Loss 3.2949 LearningRate 0.0001 Epoch: 19 Global Step: 324630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:18,104-Speed 9045.59 samples/sec Loss 3.2564 LearningRate 0.0001 Epoch: 19 Global Step: 324640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:19,187-Speed 9458.27 samples/sec Loss 3.2261 LearningRate 0.0001 Epoch: 19 Global Step: 324650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:20,302-Speed 9196.06 samples/sec Loss 3.3138 LearningRate 0.0001 Epoch: 19 Global Step: 324660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:21,420-Speed 9168.69 samples/sec Loss 3.2774 LearningRate 0.0001 Epoch: 19 Global Step: 324670 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-12 00:43:22,537-Speed 9174.83 samples/sec Loss 3.3036 LearningRate 0.0001 Epoch: 19 Global Step: 324680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:23,610-Speed 9542.47 samples/sec Loss 3.3418 LearningRate 0.0001 Epoch: 19 Global Step: 324690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:24,748-Speed 9004.54 samples/sec Loss 3.3089 LearningRate 0.0001 Epoch: 19 Global Step: 324700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:25,845-Speed 9343.24 samples/sec Loss 3.2599 LearningRate 0.0001 Epoch: 19 Global Step: 324710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:26,952-Speed 9253.26 samples/sec Loss 3.3375 LearningRate 0.0001 Epoch: 19 Global Step: 324720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:28,137-Speed 8646.00 samples/sec Loss 3.2895 LearningRate 0.0001 Epoch: 19 Global Step: 324730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:29,279-Speed 8970.01 samples/sec Loss 3.2916 LearningRate 0.0001 Epoch: 19 Global Step: 324740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:30,378-Speed 9324.70 samples/sec Loss 3.3360 LearningRate 0.0001 Epoch: 19 Global Step: 324750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:31,500-Speed 9137.29 samples/sec Loss 3.3169 LearningRate 0.0001 Epoch: 19 Global Step: 324760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:32,645-Speed 8954.02 samples/sec Loss 3.3042 LearningRate 0.0001 Epoch: 19 Global Step: 324770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:33,761-Speed 9176.18 samples/sec Loss 3.2066 LearningRate 0.0001 Epoch: 19 Global Step: 324780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:34,873-Speed 9220.01 samples/sec Loss 3.2321 LearningRate 0.0001 Epoch: 19 Global Step: 324790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:43:36,013-Speed 8986.61 samples/sec Loss 3.3691 LearningRate 0.0001 Epoch: 19 Global Step: 324800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:37,124-Speed 9221.36 samples/sec Loss 3.3168 LearningRate 0.0001 Epoch: 19 Global Step: 324810 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:43:38,225-Speed 9312.55 samples/sec Loss 3.2723 LearningRate 0.0001 Epoch: 19 Global Step: 324820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:39,321-Speed 9343.60 samples/sec Loss 3.2173 LearningRate 0.0001 Epoch: 19 Global Step: 324830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:40,404-Speed 9459.68 samples/sec Loss 3.3357 LearningRate 0.0001 Epoch: 19 Global Step: 324840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:41,496-Speed 9384.70 samples/sec Loss 3.3326 LearningRate 0.0001 Epoch: 19 Global Step: 324850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:42,635-Speed 8993.19 samples/sec Loss 3.2834 LearningRate 0.0001 Epoch: 19 Global Step: 324860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:43,730-Speed 9359.33 samples/sec Loss 3.2778 LearningRate 0.0001 Epoch: 19 Global Step: 324870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:44,848-Speed 9166.12 samples/sec Loss 3.3209 LearningRate 0.0001 Epoch: 19 Global Step: 324880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:45,940-Speed 9377.22 samples/sec Loss 3.2272 LearningRate 0.0001 Epoch: 19 Global Step: 324890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:47,073-Speed 9050.50 samples/sec Loss 3.3483 LearningRate 0.0001 Epoch: 19 Global Step: 324900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:48,200-Speed 9084.45 samples/sec Loss 3.3614 LearningRate 0.0001 Epoch: 19 Global Step: 324910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:49,293-Speed 9379.38 samples/sec 
Loss 3.3065 LearningRate 0.0001 Epoch: 19 Global Step: 324920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:50,415-Speed 9132.21 samples/sec Loss 3.3242 LearningRate 0.0001 Epoch: 19 Global Step: 324930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:51,565-Speed 8909.64 samples/sec Loss 3.2792 LearningRate 0.0001 Epoch: 19 Global Step: 324940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:52,695-Speed 9075.37 samples/sec Loss 3.3032 LearningRate 0.0001 Epoch: 19 Global Step: 324950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:53,794-Speed 9315.81 samples/sec Loss 3.3133 LearningRate 0.0001 Epoch: 19 Global Step: 324960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:54,944-Speed 8916.61 samples/sec Loss 3.3177 LearningRate 0.0001 Epoch: 19 Global Step: 324970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:56,037-Speed 9366.24 samples/sec Loss 3.2848 LearningRate 0.0001 Epoch: 19 Global Step: 324980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:43:57,118-Speed 9482.57 samples/sec Loss 3.3160 LearningRate 0.0001 Epoch: 19 Global Step: 324990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:58,263-Speed 8948.59 samples/sec Loss 3.3187 LearningRate 0.0001 Epoch: 19 Global Step: 325000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:43:59,398-Speed 9026.99 samples/sec Loss 3.1996 LearningRate 0.0001 Epoch: 19 Global Step: 325010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:00,501-Speed 9285.73 samples/sec Loss 3.3523 LearningRate 0.0001 Epoch: 19 Global Step: 325020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:01,569-Speed 9601.43 samples/sec Loss 3.3090 LearningRate 0.0001 Epoch: 19 Global Step: 325030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:02,672-Speed 9281.61 samples/sec Loss 3.2754 LearningRate 0.0001 Epoch: 19 
Global Step: 325040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:03,791-Speed 9162.49 samples/sec Loss 3.3418 LearningRate 0.0001 Epoch: 19 Global Step: 325050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:04,921-Speed 9062.01 samples/sec Loss 3.3341 LearningRate 0.0001 Epoch: 19 Global Step: 325060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:06,056-Speed 9033.39 samples/sec Loss 3.3181 LearningRate 0.0001 Epoch: 19 Global Step: 325070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:07,188-Speed 9049.81 samples/sec Loss 3.3979 LearningRate 0.0001 Epoch: 19 Global Step: 325080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:08,309-Speed 9141.98 samples/sec Loss 3.2821 LearningRate 0.0001 Epoch: 19 Global Step: 325090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:09,429-Speed 9144.44 samples/sec Loss 3.2372 LearningRate 0.0001 Epoch: 19 Global Step: 325100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:10,546-Speed 9175.62 samples/sec Loss 3.2500 LearningRate 0.0001 Epoch: 19 Global Step: 325110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:11,661-Speed 9194.22 samples/sec Loss 3.3262 LearningRate 0.0001 Epoch: 19 Global Step: 325120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:12,778-Speed 9171.97 samples/sec Loss 3.3195 LearningRate 0.0001 Epoch: 19 Global Step: 325130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:13,880-Speed 9297.76 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 19 Global Step: 325140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:14,989-Speed 9234.76 samples/sec Loss 3.2602 LearningRate 0.0001 Epoch: 19 Global Step: 325150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:16,110-Speed 9143.98 samples/sec Loss 3.2814 LearningRate 0.0001 Epoch: 19 Global Step: 325160 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-12 00:44:17,252-Speed 8967.13 samples/sec Loss 3.2937 LearningRate 0.0001 Epoch: 19 Global Step: 325170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:18,395-Speed 8964.86 samples/sec Loss 3.2198 LearningRate 0.0001 Epoch: 19 Global Step: 325180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:19,514-Speed 9154.06 samples/sec Loss 3.3291 LearningRate 0.0001 Epoch: 19 Global Step: 325190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:44:20,576-Speed 9651.93 samples/sec Loss 3.3247 LearningRate 0.0001 Epoch: 19 Global Step: 325200 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:44:21,693-Speed 9172.37 samples/sec Loss 3.2661 LearningRate 0.0001 Epoch: 19 Global Step: 325210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:22,793-Speed 9317.10 samples/sec Loss 3.3444 LearningRate 0.0001 Epoch: 19 Global Step: 325220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:23,911-Speed 9161.75 samples/sec Loss 3.3288 LearningRate 0.0001 Epoch: 19 Global Step: 325230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:25,075-Speed 8801.32 samples/sec Loss 3.2691 LearningRate 0.0001 Epoch: 19 Global Step: 325240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:26,202-Speed 9091.70 samples/sec Loss 3.3175 LearningRate 0.0001 Epoch: 19 Global Step: 325250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:27,320-Speed 9162.49 samples/sec Loss 3.3212 LearningRate 0.0001 Epoch: 19 Global Step: 325260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:28,410-Speed 9403.68 samples/sec Loss 3.2981 LearningRate 0.0001 Epoch: 19 Global Step: 325270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:29,507-Speed 9337.51 samples/sec Loss 3.3124 LearningRate 0.0001 Epoch: 19 Global Step: 325280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:44:30,613-Speed 9270.13 samples/sec Loss 3.2639 LearningRate 0.0001 Epoch: 19 Global Step: 325290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:31,751-Speed 9001.94 samples/sec Loss 3.3000 LearningRate 0.0001 Epoch: 19 Global Step: 325300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:32,866-Speed 9193.22 samples/sec Loss 3.3427 LearningRate 0.0001 Epoch: 19 Global Step: 325310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:44:34,003-Speed 9008.62 samples/sec Loss 3.2074 LearningRate 0.0001 Epoch: 19 Global Step: 325320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:35,126-Speed 9124.92 samples/sec Loss 3.3028 LearningRate 0.0001 Epoch: 19 Global Step: 325330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:36,242-Speed 9177.06 samples/sec Loss 3.3211 LearningRate 0.0001 Epoch: 19 Global Step: 325340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:37,388-Speed 8938.44 samples/sec Loss 3.2736 LearningRate 0.0001 Epoch: 19 Global Step: 325350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:38,546-Speed 8850.31 samples/sec Loss 3.2910 LearningRate 0.0001 Epoch: 19 Global Step: 325360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:39,671-Speed 9107.44 samples/sec Loss 3.3589 LearningRate 0.0001 Epoch: 19 Global Step: 325370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:40,770-Speed 9325.73 samples/sec Loss 3.3174 LearningRate 0.0001 Epoch: 19 Global Step: 325380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:41,854-Speed 9451.72 samples/sec Loss 3.2764 LearningRate 0.0001 Epoch: 19 Global Step: 325390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:42,943-Speed 9406.52 samples/sec Loss 3.3195 LearningRate 0.0001 Epoch: 19 Global Step: 325400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:44,079-Speed 9019.92 samples/sec 
Loss 3.3037 LearningRate 0.0001 Epoch: 19 Global Step: 325410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:45,171-Speed 9384.80 samples/sec Loss 3.1650 LearningRate 0.0001 Epoch: 19 Global Step: 325420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:46,298-Speed 9087.32 samples/sec Loss 3.3419 LearningRate 0.0001 Epoch: 19 Global Step: 325430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:47,428-Speed 9073.86 samples/sec Loss 3.2574 LearningRate 0.0001 Epoch: 19 Global Step: 325440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:48,501-Speed 9546.89 samples/sec Loss 3.1733 LearningRate 0.0001 Epoch: 19 Global Step: 325450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:49,655-Speed 8878.41 samples/sec Loss 3.2757 LearningRate 0.0001 Epoch: 19 Global Step: 325460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:50,796-Speed 8978.24 samples/sec Loss 3.2626 LearningRate 0.0001 Epoch: 19 Global Step: 325470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:51,939-Speed 8965.60 samples/sec Loss 3.3154 LearningRate 0.0001 Epoch: 19 Global Step: 325480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:53,059-Speed 9151.88 samples/sec Loss 3.3010 LearningRate 0.0001 Epoch: 19 Global Step: 325490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:54,197-Speed 9003.70 samples/sec Loss 3.4011 LearningRate 0.0001 Epoch: 19 Global Step: 325500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:55,332-Speed 9025.19 samples/sec Loss 3.2629 LearningRate 0.0001 Epoch: 19 Global Step: 325510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:56,422-Speed 9397.17 samples/sec Loss 3.2368 LearningRate 0.0001 Epoch: 19 Global Step: 325520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:57,531-Speed 9244.44 samples/sec Loss 3.2604 LearningRate 0.0001 Epoch: 19 
Global Step: 325530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:58,632-Speed 9303.18 samples/sec Loss 3.3042 LearningRate 0.0001 Epoch: 19 Global Step: 325540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:44:59,757-Speed 9104.78 samples/sec Loss 3.2837 LearningRate 0.0001 Epoch: 19 Global Step: 325550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:00,857-Speed 9318.32 samples/sec Loss 3.2557 LearningRate 0.0001 Epoch: 19 Global Step: 325560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:02,020-Speed 8814.54 samples/sec Loss 3.3666 LearningRate 0.0001 Epoch: 19 Global Step: 325570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:03,127-Speed 9255.94 samples/sec Loss 3.2768 LearningRate 0.0001 Epoch: 19 Global Step: 325580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:04,299-Speed 8736.52 samples/sec Loss 3.3257 LearningRate 0.0001 Epoch: 19 Global Step: 325590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:05,391-Speed 9380.82 samples/sec Loss 3.2880 LearningRate 0.0001 Epoch: 19 Global Step: 325600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:06,535-Speed 8959.35 samples/sec Loss 3.3260 LearningRate 0.0001 Epoch: 19 Global Step: 325610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:07,631-Speed 9349.36 samples/sec Loss 3.2568 LearningRate 0.0001 Epoch: 19 Global Step: 325620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:08,718-Speed 9437.03 samples/sec Loss 3.3001 LearningRate 0.0001 Epoch: 19 Global Step: 325630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:09,844-Speed 9094.41 samples/sec Loss 3.2430 LearningRate 0.0001 Epoch: 19 Global Step: 325640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:10,948-Speed 9278.01 samples/sec Loss 3.2583 LearningRate 0.0001 Epoch: 19 Global Step: 325650 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-12 00:45:12,053-Speed 9271.58 samples/sec Loss 3.2744 LearningRate 0.0001 Epoch: 19 Global Step: 325660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:13,140-Speed 9426.64 samples/sec Loss 3.2601 LearningRate 0.0001 Epoch: 19 Global Step: 325670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:14,254-Speed 9202.46 samples/sec Loss 3.3385 LearningRate 0.0001 Epoch: 19 Global Step: 325680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:15,327-Speed 9543.13 samples/sec Loss 3.3576 LearningRate 0.0001 Epoch: 19 Global Step: 325690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:16,445-Speed 9165.22 samples/sec Loss 3.2756 LearningRate 0.0001 Epoch: 19 Global Step: 325700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:17,559-Speed 9195.83 samples/sec Loss 3.2308 LearningRate 0.0001 Epoch: 19 Global Step: 325710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:18,676-Speed 9171.27 samples/sec Loss 3.2788 LearningRate 0.0001 Epoch: 19 Global Step: 325720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:45:19,839-Speed 8812.77 samples/sec Loss 3.3015 LearningRate 0.0001 Epoch: 19 Global Step: 325730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:20,987-Speed 8927.47 samples/sec Loss 3.2632 LearningRate 0.0001 Epoch: 19 Global Step: 325740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:22,083-Speed 9345.42 samples/sec Loss 3.3280 LearningRate 0.0001 Epoch: 19 Global Step: 325750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:23,176-Speed 9380.04 samples/sec Loss 3.3299 LearningRate 0.0001 Epoch: 19 Global Step: 325760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:24,298-Speed 9125.62 samples/sec Loss 3.2110 LearningRate 0.0001 Epoch: 19 Global Step: 325770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:45:25,444-Speed 8948.90 samples/sec Loss 3.2982 LearningRate 0.0001 Epoch: 19 Global Step: 325780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:26,572-Speed 9082.31 samples/sec Loss 3.3482 LearningRate 0.0001 Epoch: 19 Global Step: 325790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:27,680-Speed 9246.96 samples/sec Loss 3.2445 LearningRate 0.0001 Epoch: 19 Global Step: 325800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:28,757-Speed 9517.34 samples/sec Loss 3.2720 LearningRate 0.0001 Epoch: 19 Global Step: 325810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:29,797-Speed 9850.59 samples/sec Loss 3.2820 LearningRate 0.0001 Epoch: 19 Global Step: 325820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:30,932-Speed 9023.75 samples/sec Loss 3.2605 LearningRate 0.0001 Epoch: 19 Global Step: 325830 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:45:32,059-Speed 9094.66 samples/sec Loss 3.3067 LearningRate 0.0001 Epoch: 19 Global Step: 325840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:33,216-Speed 8860.31 samples/sec Loss 3.3499 LearningRate 0.0001 Epoch: 19 Global Step: 325850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:34,291-Speed 9528.74 samples/sec Loss 3.3028 LearningRate 0.0001 Epoch: 19 Global Step: 325860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:35,425-Speed 9033.41 samples/sec Loss 3.2716 LearningRate 0.0001 Epoch: 19 Global Step: 325870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:36,585-Speed 8832.11 samples/sec Loss 3.1905 LearningRate 0.0001 Epoch: 19 Global Step: 325880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:37,728-Speed 8960.51 samples/sec Loss 3.3331 LearningRate 0.0001 Epoch: 19 Global Step: 325890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:38,864-Speed 9021.52 samples/sec 
Loss 3.2583 LearningRate 0.0001 Epoch: 19 Global Step: 325900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:40,003-Speed 8994.40 samples/sec Loss 3.3374 LearningRate 0.0001 Epoch: 19 Global Step: 325910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:41,082-Speed 9499.80 samples/sec Loss 3.3054 LearningRate 0.0001 Epoch: 19 Global Step: 325920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:45:42,196-Speed 9194.03 samples/sec Loss 3.3065 LearningRate 0.0001 Epoch: 19 Global Step: 325930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:45:43,343-Speed 8936.42 samples/sec Loss 3.2540 LearningRate 0.0001 Epoch: 19 Global Step: 325940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:45:44,448-Speed 9272.57 samples/sec Loss 3.3115 LearningRate 0.0001 Epoch: 19 Global Step: 325950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:45:45,521-Speed 9545.34 samples/sec Loss 3.2588 LearningRate 0.0001 Epoch: 19 Global Step: 325960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:45:46,641-Speed 9149.10 samples/sec Loss 3.3507 LearningRate 0.0001 Epoch: 19 Global Step: 325970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:45:47,717-Speed 9528.25 samples/sec Loss 3.3268 LearningRate 0.0001 Epoch: 19 Global Step: 325980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:45:48,801-Speed 9451.37 samples/sec Loss 3.2652 LearningRate 0.0001 Epoch: 19 Global Step: 325990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:45:49,871-Speed 9570.21 samples/sec Loss 3.3027 LearningRate 0.0001 Epoch: 19 Global Step: 326000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:46:11,846-[lfw][326000]XNorm: 6.548740 Training: 2022-04-12 00:46:11,847-[lfw][326000]Accuracy-Flip: 0.99650+-0.00263 Training: 2022-04-12 00:46:11,847-[lfw][326000]Accuracy-Highest: 0.99750 Training: 2022-04-12 
00:46:37,252-[cfp_fp][326000]XNorm: 5.721344 Training: 2022-04-12 00:46:37,253-[cfp_fp][326000]Accuracy-Flip: 0.97186+-0.00892 Training: 2022-04-12 00:46:37,253-[cfp_fp][326000]Accuracy-Highest: 0.97543 Training: 2022-04-12 00:46:59,122-[agedb_30][326000]XNorm: 6.378732 Training: 2022-04-12 00:46:59,122-[agedb_30][326000]Accuracy-Flip: 0.97217+-0.00803 Training: 2022-04-12 00:46:59,122-[agedb_30][326000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:47:00,237-Speed 145.53 samples/sec Loss 3.2839 LearningRate 0.0001 Epoch: 19 Global Step: 326010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:01,352-Speed 9220.53 samples/sec Loss 3.2458 LearningRate 0.0001 Epoch: 19 Global Step: 326020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:02,511-Speed 8839.45 samples/sec Loss 3.2810 LearningRate 0.0001 Epoch: 19 Global Step: 326030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:03,634-Speed 9122.17 samples/sec Loss 3.3563 LearningRate 0.0001 Epoch: 19 Global Step: 326040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:04,744-Speed 9236.98 samples/sec Loss 3.4477 LearningRate 0.0001 Epoch: 19 Global Step: 326050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:05,849-Speed 9264.94 samples/sec Loss 3.2926 LearningRate 0.0001 Epoch: 19 Global Step: 326060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:06,979-Speed 9070.33 samples/sec Loss 3.3483 LearningRate 0.0001 Epoch: 19 Global Step: 326070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:08,104-Speed 9106.60 samples/sec Loss 3.2510 LearningRate 0.0001 Epoch: 19 Global Step: 326080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:09,227-Speed 9130.55 samples/sec Loss 3.3628 LearningRate 0.0001 Epoch: 19 Global Step: 326090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:10,380-Speed 8883.81 samples/sec Loss 3.3451 LearningRate 0.0001 Epoch: 
19 Global Step: 326100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:11,530-Speed 8906.10 samples/sec Loss 3.3114 LearningRate 0.0001 Epoch: 19 Global Step: 326110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:12,688-Speed 8845.55 samples/sec Loss 3.2867 LearningRate 0.0001 Epoch: 19 Global Step: 326120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:13,811-Speed 9124.16 samples/sec Loss 3.3004 LearningRate 0.0001 Epoch: 19 Global Step: 326130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:47:14,964-Speed 8892.45 samples/sec Loss 3.3422 LearningRate 0.0001 Epoch: 19 Global Step: 326140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:16,080-Speed 9178.42 samples/sec Loss 3.1896 LearningRate 0.0001 Epoch: 19 Global Step: 326150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:17,213-Speed 9043.57 samples/sec Loss 3.3454 LearningRate 0.0001 Epoch: 19 Global Step: 326160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:18,360-Speed 8933.04 samples/sec Loss 3.3264 LearningRate 0.0001 Epoch: 19 Global Step: 326170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:19,486-Speed 9095.81 samples/sec Loss 3.2478 LearningRate 0.0001 Epoch: 19 Global Step: 326180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:20,578-Speed 9393.03 samples/sec Loss 3.1820 LearningRate 0.0001 Epoch: 19 Global Step: 326190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:21,676-Speed 9324.03 samples/sec Loss 3.3316 LearningRate 0.0001 Epoch: 19 Global Step: 326200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:22,749-Speed 9548.80 samples/sec Loss 3.2772 LearningRate 0.0001 Epoch: 19 Global Step: 326210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:23,885-Speed 9022.30 samples/sec Loss 3.2812 LearningRate 0.0001 Epoch: 19 Global Step: 326220 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-04-12 00:47:25,020-Speed 9028.35 samples/sec Loss 3.3419 LearningRate 0.0001 Epoch: 19 Global Step: 326230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:26,097-Speed 9509.23 samples/sec Loss 3.2233 LearningRate 0.0001 Epoch: 19 Global Step: 326240 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:47:27,163-Speed 9616.08 samples/sec Loss 3.3207 LearningRate 0.0001 Epoch: 19 Global Step: 326250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:28,325-Speed 8814.15 samples/sec Loss 3.3306 LearningRate 0.0001 Epoch: 19 Global Step: 326260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:29,441-Speed 9179.33 samples/sec Loss 3.2828 LearningRate 0.0001 Epoch: 19 Global Step: 326270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:30,568-Speed 9092.45 samples/sec Loss 3.2780 LearningRate 0.0001 Epoch: 19 Global Step: 326280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:31,726-Speed 8853.86 samples/sec Loss 3.3743 LearningRate 0.0001 Epoch: 19 Global Step: 326290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:32,853-Speed 9088.89 samples/sec Loss 3.2611 LearningRate 0.0001 Epoch: 19 Global Step: 326300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:33,999-Speed 8939.97 samples/sec Loss 3.2179 LearningRate 0.0001 Epoch: 19 Global Step: 326310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:35,095-Speed 9348.67 samples/sec Loss 3.3304 LearningRate 0.0001 Epoch: 19 Global Step: 326320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:36,238-Speed 8970.02 samples/sec Loss 3.2101 LearningRate 0.0001 Epoch: 19 Global Step: 326330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:47:37,355-Speed 9172.65 samples/sec Loss 3.2954 LearningRate 0.0001 Epoch: 19 Global Step: 326340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-12 00:47:38,528-Speed 8738.92 samples/sec Loss 3.2995 LearningRate 0.0001 Epoch: 19 Global Step: 326350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:39,631-Speed 9291.11 samples/sec Loss 3.2515 LearningRate 0.0000 Epoch: 19 Global Step: 326360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:40,768-Speed 9007.75 samples/sec Loss 3.3195 LearningRate 0.0000 Epoch: 19 Global Step: 326370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:41,908-Speed 8985.31 samples/sec Loss 3.3163 LearningRate 0.0000 Epoch: 19 Global Step: 326380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:42,996-Speed 9421.18 samples/sec Loss 3.2688 LearningRate 0.0000 Epoch: 19 Global Step: 326390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:44,120-Speed 9114.58 samples/sec Loss 3.2936 LearningRate 0.0000 Epoch: 19 Global Step: 326400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:45,249-Speed 9072.15 samples/sec Loss 3.3066 LearningRate 0.0000 Epoch: 19 Global Step: 326410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:46,345-Speed 9352.59 samples/sec Loss 3.3359 LearningRate 0.0000 Epoch: 19 Global Step: 326420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:47,482-Speed 9012.05 samples/sec Loss 3.3015 LearningRate 0.0000 Epoch: 19 Global Step: 326430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:48,650-Speed 8766.61 samples/sec Loss 3.2636 LearningRate 0.0000 Epoch: 19 Global Step: 326440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:49,781-Speed 9063.17 samples/sec Loss 3.3815 LearningRate 0.0000 Epoch: 19 Global Step: 326450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:50,853-Speed 9562.40 samples/sec Loss 3.2186 LearningRate 0.0000 Epoch: 19 Global Step: 326460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:51,937-Speed 9457.30 
samples/sec Loss 3.1690 LearningRate 0.0000 Epoch: 19 Global Step: 326470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:53,002-Speed 9615.54 samples/sec Loss 3.1867 LearningRate 0.0000 Epoch: 19 Global Step: 326480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:54,123-Speed 9145.92 samples/sec Loss 3.2924 LearningRate 0.0000 Epoch: 19 Global Step: 326490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:55,272-Speed 8909.89 samples/sec Loss 3.2996 LearningRate 0.0000 Epoch: 19 Global Step: 326500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:56,381-Speed 9238.13 samples/sec Loss 3.2641 LearningRate 0.0000 Epoch: 19 Global Step: 326510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:57,485-Speed 9283.42 samples/sec Loss 3.2923 LearningRate 0.0000 Epoch: 19 Global Step: 326520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:58,554-Speed 9585.33 samples/sec Loss 3.2988 LearningRate 0.0000 Epoch: 19 Global Step: 326530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:47:59,647-Speed 9373.06 samples/sec Loss 3.2999 LearningRate 0.0000 Epoch: 19 Global Step: 326540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:00,765-Speed 9165.19 samples/sec Loss 3.2551 LearningRate 0.0000 Epoch: 19 Global Step: 326550 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:48:01,905-Speed 8994.32 samples/sec Loss 3.2734 LearningRate 0.0000 Epoch: 19 Global Step: 326560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:48:03,053-Speed 8923.07 samples/sec Loss 3.3214 LearningRate 0.0000 Epoch: 19 Global Step: 326570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:04,222-Speed 8759.01 samples/sec Loss 3.2818 LearningRate 0.0000 Epoch: 19 Global Step: 326580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:05,400-Speed 8697.12 samples/sec Loss 3.2905 LearningRate 
0.0000 Epoch: 19 Global Step: 326590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:06,536-Speed 9021.96 samples/sec Loss 3.3272 LearningRate 0.0000 Epoch: 19 Global Step: 326600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:07,632-Speed 9350.01 samples/sec Loss 3.3358 LearningRate 0.0000 Epoch: 19 Global Step: 326610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:08,741-Speed 9244.68 samples/sec Loss 3.2833 LearningRate 0.0000 Epoch: 19 Global Step: 326620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:09,872-Speed 9056.29 samples/sec Loss 3.2886 LearningRate 0.0000 Epoch: 19 Global Step: 326630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:10,990-Speed 9164.51 samples/sec Loss 3.3369 LearningRate 0.0000 Epoch: 19 Global Step: 326640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:12,077-Speed 9426.76 samples/sec Loss 3.2580 LearningRate 0.0000 Epoch: 19 Global Step: 326650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:13,237-Speed 8832.80 samples/sec Loss 3.2795 LearningRate 0.0000 Epoch: 19 Global Step: 326660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:14,358-Speed 9135.93 samples/sec Loss 3.3099 LearningRate 0.0000 Epoch: 19 Global Step: 326670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:15,483-Speed 9106.96 samples/sec Loss 3.2896 LearningRate 0.0000 Epoch: 19 Global Step: 326680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:16,581-Speed 9336.68 samples/sec Loss 3.2137 LearningRate 0.0000 Epoch: 19 Global Step: 326690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:17,712-Speed 9051.92 samples/sec Loss 3.2575 LearningRate 0.0000 Epoch: 19 Global Step: 326700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:18,832-Speed 9153.14 samples/sec Loss 3.2540 LearningRate 0.0000 Epoch: 19 Global Step: 326710 Fp16 
Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:19,962-Speed 9062.03 samples/sec Loss 3.2809 LearningRate 0.0000 Epoch: 19 Global Step: 326720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:21,078-Speed 9184.36 samples/sec Loss 3.3219 LearningRate 0.0000 Epoch: 19 Global Step: 326730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:22,214-Speed 9020.87 samples/sec Loss 3.3005 LearningRate 0.0000 Epoch: 19 Global Step: 326740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:23,339-Speed 9108.88 samples/sec Loss 3.3068 LearningRate 0.0000 Epoch: 19 Global Step: 326750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:24,452-Speed 9200.23 samples/sec Loss 3.3291 LearningRate 0.0000 Epoch: 19 Global Step: 326760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:48:25,595-Speed 8964.04 samples/sec Loss 3.3306 LearningRate 0.0000 Epoch: 19 Global Step: 326770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:26,730-Speed 9032.93 samples/sec Loss 3.3169 LearningRate 0.0000 Epoch: 19 Global Step: 326780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:27,819-Speed 9406.24 samples/sec Loss 3.3193 LearningRate 0.0000 Epoch: 19 Global Step: 326790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:28,964-Speed 8951.73 samples/sec Loss 3.2497 LearningRate 0.0000 Epoch: 19 Global Step: 326800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:30,050-Speed 9433.89 samples/sec Loss 3.3952 LearningRate 0.0000 Epoch: 19 Global Step: 326810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:31,130-Speed 9482.38 samples/sec Loss 3.1759 LearningRate 0.0000 Epoch: 19 Global Step: 326820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:32,199-Speed 9590.01 samples/sec Loss 3.3502 LearningRate 0.0000 Epoch: 19 Global Step: 326830 Fp16 Grad Scale: 65536 Required: 0 hours 
Training: 2022-04-12 00:48:33,291-Speed 9380.48 samples/sec Loss 3.2518 LearningRate 0.0000 Epoch: 19 Global Step: 326840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:34,424-Speed 9045.78 samples/sec Loss 3.2996 LearningRate 0.0000 Epoch: 19 Global Step: 326850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:35,560-Speed 9020.70 samples/sec Loss 3.3188 LearningRate 0.0000 Epoch: 19 Global Step: 326860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:36,663-Speed 9282.42 samples/sec Loss 3.3034 LearningRate 0.0000 Epoch: 19 Global Step: 326870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:48:37,755-Speed 9389.43 samples/sec Loss 3.2646 LearningRate 0.0000 Epoch: 19 Global Step: 326880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:48:38,875-Speed 9148.03 samples/sec Loss 3.2217 LearningRate 0.0000 Epoch: 19 Global Step: 326890 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:48:39,955-Speed 9490.68 samples/sec Loss 3.2851 LearningRate 0.0000 Epoch: 19 Global Step: 326900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:41,018-Speed 9635.96 samples/sec Loss 3.2445 LearningRate 0.0000 Epoch: 19 Global Step: 326910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:42,099-Speed 9474.02 samples/sec Loss 3.3624 LearningRate 0.0000 Epoch: 19 Global Step: 326920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:43,194-Speed 9360.64 samples/sec Loss 3.3253 LearningRate 0.0000 Epoch: 19 Global Step: 326930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:44,335-Speed 8980.36 samples/sec Loss 3.3294 LearningRate 0.0000 Epoch: 19 Global Step: 326940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:45,483-Speed 8927.81 samples/sec Loss 3.2485 LearningRate 0.0000 Epoch: 19 Global Step: 326950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:46,580-Speed 
9341.71 samples/sec Loss 3.2982 LearningRate 0.0000 Epoch: 19 Global Step: 326960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:47,703-Speed 9123.97 samples/sec Loss 3.2550 LearningRate 0.0000 Epoch: 19 Global Step: 326970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:48,818-Speed 9185.64 samples/sec Loss 3.2051 LearningRate 0.0000 Epoch: 19 Global Step: 326980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:49,912-Speed 9362.84 samples/sec Loss 3.3226 LearningRate 0.0000 Epoch: 19 Global Step: 326990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:51,035-Speed 9129.17 samples/sec Loss 3.3780 LearningRate 0.0000 Epoch: 19 Global Step: 327000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:52,122-Speed 9425.17 samples/sec Loss 3.3074 LearningRate 0.0000 Epoch: 19 Global Step: 327010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:53,271-Speed 8912.75 samples/sec Loss 3.2745 LearningRate 0.0000 Epoch: 19 Global Step: 327020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:54,400-Speed 9081.52 samples/sec Loss 3.2708 LearningRate 0.0000 Epoch: 19 Global Step: 327030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:55,487-Speed 9425.05 samples/sec Loss 3.2661 LearningRate 0.0000 Epoch: 19 Global Step: 327040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:56,572-Speed 9436.61 samples/sec Loss 3.2818 LearningRate 0.0000 Epoch: 19 Global Step: 327050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:57,681-Speed 9242.68 samples/sec Loss 3.2287 LearningRate 0.0000 Epoch: 19 Global Step: 327060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:48:58,874-Speed 8586.39 samples/sec Loss 3.3258 LearningRate 0.0000 Epoch: 19 Global Step: 327070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:00,048-Speed 8727.51 samples/sec Loss 3.2791 
LearningRate 0.0000 Epoch: 19 Global Step: 327080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:01,200-Speed 8890.35 samples/sec Loss 3.2854 LearningRate 0.0000 Epoch: 19 Global Step: 327090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:02,294-Speed 9376.29 samples/sec Loss 3.2403 LearningRate 0.0000 Epoch: 19 Global Step: 327100 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:49:03,397-Speed 9290.07 samples/sec Loss 3.3357 LearningRate 0.0000 Epoch: 19 Global Step: 327110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:04,538-Speed 8982.72 samples/sec Loss 3.3074 LearningRate 0.0000 Epoch: 19 Global Step: 327120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:05,690-Speed 8889.49 samples/sec Loss 3.3415 LearningRate 0.0000 Epoch: 19 Global Step: 327130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:06,891-Speed 8532.98 samples/sec Loss 3.3176 LearningRate 0.0000 Epoch: 19 Global Step: 327140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:07,979-Speed 9418.59 samples/sec Loss 3.3326 LearningRate 0.0000 Epoch: 19 Global Step: 327150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:09,100-Speed 9138.69 samples/sec Loss 3.4121 LearningRate 0.0000 Epoch: 19 Global Step: 327160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:10,219-Speed 9157.78 samples/sec Loss 3.2928 LearningRate 0.0000 Epoch: 19 Global Step: 327170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:11,307-Speed 9420.42 samples/sec Loss 3.3422 LearningRate 0.0000 Epoch: 19 Global Step: 327180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:12,408-Speed 9300.71 samples/sec Loss 3.3213 LearningRate 0.0000 Epoch: 19 Global Step: 327190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:13,543-Speed 9031.08 samples/sec Loss 3.3378 LearningRate 0.0000 Epoch: 19 Global Step: 
327200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:14,668-Speed 9107.38 samples/sec Loss 3.3905 LearningRate 0.0000 Epoch: 19 Global Step: 327210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:15,811-Speed 8963.08 samples/sec Loss 3.3082 LearningRate 0.0000 Epoch: 19 Global Step: 327220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:16,896-Speed 9442.73 samples/sec Loss 3.2565 LearningRate 0.0000 Epoch: 19 Global Step: 327230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:18,091-Speed 8569.87 samples/sec Loss 3.1994 LearningRate 0.0000 Epoch: 19 Global Step: 327240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:19,206-Speed 9188.05 samples/sec Loss 3.3039 LearningRate 0.0000 Epoch: 19 Global Step: 327250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:20,316-Speed 9236.66 samples/sec Loss 3.2520 LearningRate 0.0000 Epoch: 19 Global Step: 327260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:21,382-Speed 9611.40 samples/sec Loss 3.3031 LearningRate 0.0000 Epoch: 19 Global Step: 327270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:22,498-Speed 9181.78 samples/sec Loss 3.2374 LearningRate 0.0000 Epoch: 19 Global Step: 327280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:23,635-Speed 9015.60 samples/sec Loss 3.2713 LearningRate 0.0000 Epoch: 19 Global Step: 327290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:24,761-Speed 9092.77 samples/sec Loss 3.3457 LearningRate 0.0000 Epoch: 19 Global Step: 327300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:25,897-Speed 9017.64 samples/sec Loss 3.3030 LearningRate 0.0000 Epoch: 19 Global Step: 327310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:26,972-Speed 9538.76 samples/sec Loss 3.3262 LearningRate 0.0000 Epoch: 19 Global Step: 327320 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-12 00:49:28,071-Speed 9314.93 samples/sec Loss 3.3015 LearningRate 0.0000 Epoch: 19 Global Step: 327330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:29,171-Speed 9318.33 samples/sec Loss 3.3059 LearningRate 0.0000 Epoch: 19 Global Step: 327340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:30,301-Speed 9066.42 samples/sec Loss 3.3230 LearningRate 0.0000 Epoch: 19 Global Step: 327350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:31,448-Speed 8928.23 samples/sec Loss 3.2025 LearningRate 0.0000 Epoch: 19 Global Step: 327360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:32,590-Speed 8979.32 samples/sec Loss 3.2741 LearningRate 0.0000 Epoch: 19 Global Step: 327370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:33,786-Speed 8563.34 samples/sec Loss 3.3491 LearningRate 0.0000 Epoch: 19 Global Step: 327380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:34,879-Speed 9380.79 samples/sec Loss 3.2333 LearningRate 0.0000 Epoch: 19 Global Step: 327390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:35,933-Speed 9714.58 samples/sec Loss 3.3523 LearningRate 0.0000 Epoch: 19 Global Step: 327400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:37,050-Speed 9176.31 samples/sec Loss 3.3625 LearningRate 0.0000 Epoch: 19 Global Step: 327410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:49:38,161-Speed 9223.55 samples/sec Loss 3.3378 LearningRate 0.0000 Epoch: 19 Global Step: 327420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:49:39,261-Speed 9313.94 samples/sec Loss 3.3039 LearningRate 0.0000 Epoch: 19 Global Step: 327430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:40,392-Speed 9060.80 samples/sec Loss 3.2682 LearningRate 0.0000 Epoch: 19 Global Step: 327440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:49:41,494-Speed 9305.93 samples/sec Loss 3.3048 LearningRate 0.0000 Epoch: 19 Global Step: 327450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:42,608-Speed 9188.63 samples/sec Loss 3.2406 LearningRate 0.0000 Epoch: 19 Global Step: 327460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:43,769-Speed 8830.03 samples/sec Loss 3.3312 LearningRate 0.0000 Epoch: 19 Global Step: 327470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:44,869-Speed 9318.73 samples/sec Loss 3.2678 LearningRate 0.0000 Epoch: 19 Global Step: 327480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:45,931-Speed 9639.92 samples/sec Loss 3.3646 LearningRate 0.0000 Epoch: 19 Global Step: 327490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:47,070-Speed 8996.17 samples/sec Loss 3.3111 LearningRate 0.0000 Epoch: 19 Global Step: 327500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:48,182-Speed 9216.38 samples/sec Loss 3.2829 LearningRate 0.0000 Epoch: 19 Global Step: 327510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:49,282-Speed 9315.53 samples/sec Loss 3.2258 LearningRate 0.0000 Epoch: 19 Global Step: 327520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:50,402-Speed 9144.16 samples/sec Loss 3.2929 LearningRate 0.0000 Epoch: 19 Global Step: 327530 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:49:51,509-Speed 9256.96 samples/sec Loss 3.3057 LearningRate 0.0000 Epoch: 19 Global Step: 327540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:52,692-Speed 8662.58 samples/sec Loss 3.3750 LearningRate 0.0000 Epoch: 19 Global Step: 327550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:53,774-Speed 9468.91 samples/sec Loss 3.3580 LearningRate 0.0000 Epoch: 19 Global Step: 327560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:54,886-Speed 9211.10 samples/sec 
Loss 3.3504 LearningRate 0.0000 Epoch: 19 Global Step: 327570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:55,998-Speed 9212.31 samples/sec Loss 3.3037 LearningRate 0.0000 Epoch: 19 Global Step: 327580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:57,074-Speed 9527.58 samples/sec Loss 3.3058 LearningRate 0.0000 Epoch: 19 Global Step: 327590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:58,163-Speed 9407.63 samples/sec Loss 3.2805 LearningRate 0.0000 Epoch: 19 Global Step: 327600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:49:59,287-Speed 9114.47 samples/sec Loss 3.2598 LearningRate 0.0000 Epoch: 19 Global Step: 327610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:00,419-Speed 9057.85 samples/sec Loss 3.3773 LearningRate 0.0000 Epoch: 19 Global Step: 327620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:01,497-Speed 9500.79 samples/sec Loss 3.2701 LearningRate 0.0000 Epoch: 19 Global Step: 327630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:02,575-Speed 9506.07 samples/sec Loss 3.2661 LearningRate 0.0000 Epoch: 19 Global Step: 327640 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:50:03,674-Speed 9323.38 samples/sec Loss 3.2902 LearningRate 0.0000 Epoch: 19 Global Step: 327650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:04,767-Speed 9370.15 samples/sec Loss 3.4144 LearningRate 0.0000 Epoch: 19 Global Step: 327660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:05,884-Speed 9174.04 samples/sec Loss 3.2606 LearningRate 0.0000 Epoch: 19 Global Step: 327670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:06,988-Speed 9279.49 samples/sec Loss 3.2693 LearningRate 0.0000 Epoch: 19 Global Step: 327680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:08,090-Speed 9304.73 samples/sec Loss 3.3332 LearningRate 0.0000 Epoch: 19 
Global Step: 327690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:09,233-Speed 8961.16 samples/sec Loss 3.2522 LearningRate 0.0000 Epoch: 19 Global Step: 327700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:10,295-Speed 9647.32 samples/sec Loss 3.3685 LearningRate 0.0000 Epoch: 19 Global Step: 327710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:11,377-Speed 9465.49 samples/sec Loss 3.2273 LearningRate 0.0000 Epoch: 19 Global Step: 327720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:12,470-Speed 9381.53 samples/sec Loss 3.2775 LearningRate 0.0000 Epoch: 19 Global Step: 327730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:13,544-Speed 9543.60 samples/sec Loss 3.2419 LearningRate 0.0000 Epoch: 19 Global Step: 327740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:14,699-Speed 8869.80 samples/sec Loss 3.3591 LearningRate 0.0000 Epoch: 19 Global Step: 327750 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:50:15,814-Speed 9191.24 samples/sec Loss 3.3101 LearningRate 0.0000 Epoch: 19 Global Step: 327760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:50:16,900-Speed 9431.19 samples/sec Loss 3.3368 LearningRate 0.0000 Epoch: 19 Global Step: 327770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:50:18,014-Speed 9203.15 samples/sec Loss 3.2687 LearningRate 0.0000 Epoch: 19 Global Step: 327780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:19,159-Speed 8942.90 samples/sec Loss 3.3467 LearningRate 0.0000 Epoch: 19 Global Step: 327790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:20,234-Speed 9536.05 samples/sec Loss 3.3167 LearningRate 0.0000 Epoch: 19 Global Step: 327800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:21,356-Speed 9127.66 samples/sec Loss 3.3482 LearningRate 0.0000 Epoch: 19 Global Step: 327810 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-04-12 00:50:22,425-Speed 9579.98 samples/sec Loss 3.2923 LearningRate 0.0000 Epoch: 19 Global Step: 327820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:23,553-Speed 9088.27 samples/sec Loss 3.2361 LearningRate 0.0000 Epoch: 19 Global Step: 327830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:24,664-Speed 9220.49 samples/sec Loss 3.2546 LearningRate 0.0000 Epoch: 19 Global Step: 327840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:25,831-Speed 8780.16 samples/sec Loss 3.2733 LearningRate 0.0000 Epoch: 19 Global Step: 327850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:26,990-Speed 8841.43 samples/sec Loss 3.2349 LearningRate 0.0000 Epoch: 19 Global Step: 327860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:28,098-Speed 9245.93 samples/sec Loss 3.2601 LearningRate 0.0000 Epoch: 19 Global Step: 327870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:29,221-Speed 9130.09 samples/sec Loss 3.3170 LearningRate 0.0000 Epoch: 19 Global Step: 327880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:50:30,395-Speed 8724.05 samples/sec Loss 3.3538 LearningRate 0.0000 Epoch: 19 Global Step: 327890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:31,494-Speed 9322.12 samples/sec Loss 3.3786 LearningRate 0.0000 Epoch: 19 Global Step: 327900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:32,642-Speed 8926.56 samples/sec Loss 3.3242 LearningRate 0.0000 Epoch: 19 Global Step: 327910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:33,820-Speed 8701.46 samples/sec Loss 3.2828 LearningRate 0.0000 Epoch: 19 Global Step: 327920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:34,906-Speed 9433.58 samples/sec Loss 3.2791 LearningRate 0.0000 Epoch: 19 Global Step: 327930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-12 00:50:36,012-Speed 9266.39 samples/sec Loss 3.2910 LearningRate 0.0000 Epoch: 19 Global Step: 327940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:37,106-Speed 9370.72 samples/sec Loss 3.1996 LearningRate 0.0000 Epoch: 19 Global Step: 327950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:38,206-Speed 9310.70 samples/sec Loss 3.2699 LearningRate 0.0000 Epoch: 19 Global Step: 327960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:39,284-Speed 9503.26 samples/sec Loss 3.3356 LearningRate 0.0000 Epoch: 19 Global Step: 327970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:40,402-Speed 9166.05 samples/sec Loss 3.2941 LearningRate 0.0000 Epoch: 19 Global Step: 327980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:41,531-Speed 9069.24 samples/sec Loss 3.3323 LearningRate 0.0000 Epoch: 19 Global Step: 327990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:50:42,634-Speed 9290.21 samples/sec Loss 3.3207 LearningRate 0.0000 Epoch: 19 Global Step: 328000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:51:04,508-[lfw][328000]XNorm: 6.515726 Training: 2022-04-12 00:51:04,508-[lfw][328000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-04-12 00:51:04,508-[lfw][328000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:51:29,940-[cfp_fp][328000]XNorm: 5.690911 Training: 2022-04-12 00:51:29,941-[cfp_fp][328000]Accuracy-Flip: 0.97286+-0.00838 Training: 2022-04-12 00:51:29,941-[cfp_fp][328000]Accuracy-Highest: 0.97543 Training: 2022-04-12 00:51:51,814-[agedb_30][328000]XNorm: 6.350155 Training: 2022-04-12 00:51:51,815-[agedb_30][328000]Accuracy-Flip: 0.97217+-0.00806 Training: 2022-04-12 00:51:51,815-[agedb_30][328000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:51:52,916-Speed 145.70 samples/sec Loss 3.2589 LearningRate 0.0000 Epoch: 19 Global Step: 328010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:51:53,993-Speed 9520.11 samples/sec Loss 3.3285 LearningRate 0.0000 Epoch: 19 Global Step: 328020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:51:55,067-Speed 9532.41 samples/sec Loss 3.2296 LearningRate 0.0000 Epoch: 19 Global Step: 328030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:51:56,162-Speed 9365.11 samples/sec Loss 3.3119 LearningRate 0.0000 Epoch: 19 Global Step: 328040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:51:57,266-Speed 9277.50 samples/sec Loss 3.3819 LearningRate 0.0000 Epoch: 19 Global Step: 328050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:51:58,374-Speed 9245.98 samples/sec Loss 3.2495 LearningRate 0.0000 Epoch: 19 Global Step: 328060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:51:59,456-Speed 9470.82 samples/sec Loss 3.3074 LearningRate 0.0000 Epoch: 19 Global Step: 328070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:52:00,583-Speed 9095.89 samples/sec Loss 3.2388 LearningRate 0.0000 Epoch: 19 Global Step: 328080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:52:01,697-Speed 9197.05 samples/sec Loss 3.3069 LearningRate 0.0000 Epoch: 19 Global Step: 328090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:52:02,797-Speed 9311.07 samples/sec Loss 3.2792 LearningRate 0.0000 Epoch: 19 Global Step: 328100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:52:03,894-Speed 9341.13 samples/sec Loss 3.2507 LearningRate 0.0000 Epoch: 19 Global Step: 328110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:52:05,009-Speed 9186.04 samples/sec Loss 3.2416 LearningRate 0.0000 Epoch: 19 Global Step: 328120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:52:06,104-Speed 9360.04 samples/sec Loss 3.2700 LearningRate 0.0000 Epoch: 19 Global Step: 328130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:07,211-Speed 9255.69 samples/sec Loss 
3.2851 LearningRate 0.0000 Epoch: 19 Global Step: 328140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:08,303-Speed 9382.08 samples/sec Loss 3.3012 LearningRate 0.0000 Epoch: 19 Global Step: 328150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:09,441-Speed 9006.41 samples/sec Loss 3.3038 LearningRate 0.0000 Epoch: 19 Global Step: 328160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:10,566-Speed 9101.71 samples/sec Loss 3.2703 LearningRate 0.0000 Epoch: 19 Global Step: 328170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:11,694-Speed 9081.28 samples/sec Loss 3.2706 LearningRate 0.0000 Epoch: 19 Global Step: 328180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:12,798-Speed 9286.41 samples/sec Loss 3.2410 LearningRate 0.0000 Epoch: 19 Global Step: 328190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:13,921-Speed 9120.56 samples/sec Loss 3.2355 LearningRate 0.0000 Epoch: 19 Global Step: 328200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:15,003-Speed 9470.85 samples/sec Loss 3.3016 LearningRate 0.0000 Epoch: 19 Global Step: 328210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:16,084-Speed 9483.18 samples/sec Loss 3.2798 LearningRate 0.0000 Epoch: 19 Global Step: 328220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:17,271-Speed 8629.84 samples/sec Loss 3.2373 LearningRate 0.0000 Epoch: 19 Global Step: 328230 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:52:18,414-Speed 8960.28 samples/sec Loss 3.2319 LearningRate 0.0000 Epoch: 19 Global Step: 328240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:19,529-Speed 9193.17 samples/sec Loss 3.2814 LearningRate 0.0000 Epoch: 19 Global Step: 328250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:20,649-Speed 9152.19 samples/sec Loss 3.2395 LearningRate 0.0000 Epoch: 19 
Global Step: 328260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:21,747-Speed 9327.53 samples/sec Loss 3.2947 LearningRate 0.0000 Epoch: 19 Global Step: 328270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:22,836-Speed 9411.96 samples/sec Loss 3.2910 LearningRate 0.0000 Epoch: 19 Global Step: 328280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:23,966-Speed 9064.08 samples/sec Loss 3.2800 LearningRate 0.0000 Epoch: 19 Global Step: 328290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:25,101-Speed 9028.59 samples/sec Loss 3.2914 LearningRate 0.0000 Epoch: 19 Global Step: 328300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:26,239-Speed 9006.05 samples/sec Loss 3.3100 LearningRate 0.0000 Epoch: 19 Global Step: 328310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:27,354-Speed 9189.38 samples/sec Loss 3.2968 LearningRate 0.0000 Epoch: 19 Global Step: 328320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:28,505-Speed 8902.39 samples/sec Loss 3.3155 LearningRate 0.0000 Epoch: 19 Global Step: 328330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:29,620-Speed 9188.85 samples/sec Loss 3.3403 LearningRate 0.0000 Epoch: 19 Global Step: 328340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:52:30,743-Speed 9124.62 samples/sec Loss 3.3664 LearningRate 0.0000 Epoch: 19 Global Step: 328350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:31,819-Speed 9534.49 samples/sec Loss 3.2683 LearningRate 0.0000 Epoch: 19 Global Step: 328360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:32,929-Speed 9226.49 samples/sec Loss 3.3566 LearningRate 0.0000 Epoch: 19 Global Step: 328370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:34,112-Speed 8662.43 samples/sec Loss 3.2484 LearningRate 0.0000 Epoch: 19 Global Step: 328380 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-12 00:52:35,266-Speed 8877.10 samples/sec Loss 3.3180 LearningRate 0.0000 Epoch: 19 Global Step: 328390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:36,358-Speed 9380.69 samples/sec Loss 3.3400 LearningRate 0.0000 Epoch: 19 Global Step: 328400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:37,530-Speed 8745.59 samples/sec Loss 3.3036 LearningRate 0.0000 Epoch: 19 Global Step: 328410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:38,639-Speed 9237.87 samples/sec Loss 3.2205 LearningRate 0.0000 Epoch: 19 Global Step: 328420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:39,741-Speed 9302.83 samples/sec Loss 3.2752 LearningRate 0.0000 Epoch: 19 Global Step: 328430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:40,848-Speed 9256.68 samples/sec Loss 3.3304 LearningRate 0.0000 Epoch: 19 Global Step: 328440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:41,975-Speed 9091.78 samples/sec Loss 3.2941 LearningRate 0.0000 Epoch: 19 Global Step: 328450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:52:43,064-Speed 9414.09 samples/sec Loss 3.2425 LearningRate 0.0000 Epoch: 19 Global Step: 328460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:44,163-Speed 9333.35 samples/sec Loss 3.3563 LearningRate 0.0000 Epoch: 19 Global Step: 328470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:45,254-Speed 9390.73 samples/sec Loss 3.2950 LearningRate 0.0000 Epoch: 19 Global Step: 328480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:46,398-Speed 8956.14 samples/sec Loss 3.2981 LearningRate 0.0000 Epoch: 19 Global Step: 328490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:47,540-Speed 8978.79 samples/sec Loss 3.2364 LearningRate 0.0000 Epoch: 19 Global Step: 328500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:52:48,667-Speed 9091.02 samples/sec Loss 3.2952 LearningRate 0.0000 Epoch: 19 Global Step: 328510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:49,817-Speed 8904.92 samples/sec Loss 3.2780 LearningRate 0.0000 Epoch: 19 Global Step: 328520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:50,869-Speed 9742.41 samples/sec Loss 3.2942 LearningRate 0.0000 Epoch: 19 Global Step: 328530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:51,955-Speed 9434.17 samples/sec Loss 3.3186 LearningRate 0.0000 Epoch: 19 Global Step: 328540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:53,033-Speed 9499.50 samples/sec Loss 3.3352 LearningRate 0.0000 Epoch: 19 Global Step: 328550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:54,116-Speed 9460.52 samples/sec Loss 3.3127 LearningRate 0.0000 Epoch: 19 Global Step: 328560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:55,176-Speed 9669.18 samples/sec Loss 3.3663 LearningRate 0.0000 Epoch: 19 Global Step: 328570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:56,304-Speed 9087.81 samples/sec Loss 3.2808 LearningRate 0.0000 Epoch: 19 Global Step: 328580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:57,416-Speed 9216.19 samples/sec Loss 3.2613 LearningRate 0.0000 Epoch: 19 Global Step: 328590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:58,502-Speed 9433.52 samples/sec Loss 3.2797 LearningRate 0.0000 Epoch: 19 Global Step: 328600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:52:59,621-Speed 9152.51 samples/sec Loss 3.2823 LearningRate 0.0000 Epoch: 19 Global Step: 328610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:00,704-Speed 9463.57 samples/sec Loss 3.3145 LearningRate 0.0000 Epoch: 19 Global Step: 328620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:01,824-Speed 9156.85 samples/sec Loss 
3.2744 LearningRate 0.0000 Epoch: 19 Global Step: 328630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:02,934-Speed 9224.32 samples/sec Loss 3.2955 LearningRate 0.0000 Epoch: 19 Global Step: 328640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:04,120-Speed 8638.01 samples/sec Loss 3.3091 LearningRate 0.0000 Epoch: 19 Global Step: 328650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:05,255-Speed 9028.16 samples/sec Loss 3.2956 LearningRate 0.0000 Epoch: 19 Global Step: 328660 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:53:06,311-Speed 9699.07 samples/sec Loss 3.3571 LearningRate 0.0000 Epoch: 19 Global Step: 328670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:07,432-Speed 9146.90 samples/sec Loss 3.2360 LearningRate 0.0000 Epoch: 19 Global Step: 328680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:08,558-Speed 9102.83 samples/sec Loss 3.2665 LearningRate 0.0000 Epoch: 19 Global Step: 328690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:09,701-Speed 8957.21 samples/sec Loss 3.3069 LearningRate 0.0000 Epoch: 19 Global Step: 328700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:10,832-Speed 9060.46 samples/sec Loss 3.2288 LearningRate 0.0000 Epoch: 19 Global Step: 328710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:11,950-Speed 9165.46 samples/sec Loss 3.3762 LearningRate 0.0000 Epoch: 19 Global Step: 328720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:13,057-Speed 9255.90 samples/sec Loss 3.2885 LearningRate 0.0000 Epoch: 19 Global Step: 328730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:14,146-Speed 9415.50 samples/sec Loss 3.3275 LearningRate 0.0000 Epoch: 19 Global Step: 328740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:15,265-Speed 9152.07 samples/sec Loss 3.2788 LearningRate 0.0000 Epoch: 19 
Global Step: 328750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:16,352-Speed 9429.21 samples/sec Loss 3.3118 LearningRate 0.0000 Epoch: 19 Global Step: 328760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:17,440-Speed 9420.55 samples/sec Loss 3.2958 LearningRate 0.0000 Epoch: 19 Global Step: 328770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:53:18,569-Speed 9077.43 samples/sec Loss 3.2656 LearningRate 0.0000 Epoch: 19 Global Step: 328780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:19,692-Speed 9119.04 samples/sec Loss 3.3003 LearningRate 0.0000 Epoch: 19 Global Step: 328790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:20,797-Speed 9270.39 samples/sec Loss 3.2379 LearningRate 0.0000 Epoch: 19 Global Step: 328800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:21,900-Speed 9293.09 samples/sec Loss 3.2667 LearningRate 0.0000 Epoch: 19 Global Step: 328810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:22,983-Speed 9459.16 samples/sec Loss 3.3400 LearningRate 0.0000 Epoch: 19 Global Step: 328820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:24,112-Speed 9078.50 samples/sec Loss 3.2581 LearningRate 0.0000 Epoch: 19 Global Step: 328830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:25,222-Speed 9223.70 samples/sec Loss 3.1944 LearningRate 0.0000 Epoch: 19 Global Step: 328840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:26,361-Speed 9001.37 samples/sec Loss 3.3453 LearningRate 0.0000 Epoch: 19 Global Step: 328850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:27,481-Speed 9147.12 samples/sec Loss 3.2595 LearningRate 0.0000 Epoch: 19 Global Step: 328860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:28,625-Speed 8957.32 samples/sec Loss 3.3015 LearningRate 0.0000 Epoch: 19 Global Step: 328870 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-12 00:53:29,811-Speed 8642.65 samples/sec Loss 3.3100 LearningRate 0.0000 Epoch: 19 Global Step: 328880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:30,886-Speed 9525.16 samples/sec Loss 3.2864 LearningRate 0.0000 Epoch: 19 Global Step: 328890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:31,992-Speed 9266.78 samples/sec Loss 3.2485 LearningRate 0.0000 Epoch: 19 Global Step: 328900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:33,089-Speed 9337.03 samples/sec Loss 3.2455 LearningRate 0.0000 Epoch: 19 Global Step: 328910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:34,193-Speed 9278.73 samples/sec Loss 3.2826 LearningRate 0.0000 Epoch: 19 Global Step: 328920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:35,271-Speed 9507.43 samples/sec Loss 3.2173 LearningRate 0.0000 Epoch: 19 Global Step: 328930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:36,330-Speed 9679.72 samples/sec Loss 3.3264 LearningRate 0.0000 Epoch: 19 Global Step: 328940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:37,495-Speed 8794.24 samples/sec Loss 3.2533 LearningRate 0.0000 Epoch: 19 Global Step: 328950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:38,693-Speed 8557.92 samples/sec Loss 3.2241 LearningRate 0.0000 Epoch: 19 Global Step: 328960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:39,775-Speed 9469.36 samples/sec Loss 3.3053 LearningRate 0.0000 Epoch: 19 Global Step: 328970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:40,854-Speed 9491.61 samples/sec Loss 3.3901 LearningRate 0.0000 Epoch: 19 Global Step: 328980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:41,947-Speed 9376.15 samples/sec Loss 3.2493 LearningRate 0.0000 Epoch: 19 Global Step: 328990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:53:43,041-Speed 9364.31 samples/sec Loss 3.2474 LearningRate 0.0000 Epoch: 19 Global Step: 329000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:44,134-Speed 9376.13 samples/sec Loss 3.2547 LearningRate 0.0000 Epoch: 19 Global Step: 329010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:45,211-Speed 9512.81 samples/sec Loss 3.3523 LearningRate 0.0000 Epoch: 19 Global Step: 329020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:46,296-Speed 9442.10 samples/sec Loss 3.2900 LearningRate 0.0000 Epoch: 19 Global Step: 329030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:47,432-Speed 9021.98 samples/sec Loss 3.3140 LearningRate 0.0000 Epoch: 19 Global Step: 329040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:48,588-Speed 8860.02 samples/sec Loss 3.1978 LearningRate 0.0000 Epoch: 19 Global Step: 329050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:49,750-Speed 8823.31 samples/sec Loss 3.3456 LearningRate 0.0000 Epoch: 19 Global Step: 329060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:53:50,853-Speed 9287.92 samples/sec Loss 3.2928 LearningRate 0.0000 Epoch: 19 Global Step: 329070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:51,956-Speed 9290.30 samples/sec Loss 3.2672 LearningRate 0.0000 Epoch: 19 Global Step: 329080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:53,094-Speed 9005.84 samples/sec Loss 3.2793 LearningRate 0.0000 Epoch: 19 Global Step: 329090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:54,247-Speed 8882.13 samples/sec Loss 3.2605 LearningRate 0.0000 Epoch: 19 Global Step: 329100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:55,378-Speed 9061.18 samples/sec Loss 3.2170 LearningRate 0.0000 Epoch: 19 Global Step: 329110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:56,489-Speed 9230.16 samples/sec Loss 
3.2043 LearningRate 0.0000 Epoch: 19 Global Step: 329120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:57,651-Speed 8820.41 samples/sec Loss 3.3169 LearningRate 0.0000 Epoch: 19 Global Step: 329130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:58,776-Speed 9105.83 samples/sec Loss 3.3133 LearningRate 0.0000 Epoch: 19 Global Step: 329140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:53:59,899-Speed 9123.47 samples/sec Loss 3.2984 LearningRate 0.0000 Epoch: 19 Global Step: 329150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:54:01,002-Speed 9292.65 samples/sec Loss 3.2151 LearningRate 0.0000 Epoch: 19 Global Step: 329160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:54:02,108-Speed 9259.30 samples/sec Loss 3.3207 LearningRate 0.0000 Epoch: 19 Global Step: 329170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:03,235-Speed 9091.99 samples/sec Loss 3.3186 LearningRate 0.0000 Epoch: 19 Global Step: 329180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:04,308-Speed 9546.12 samples/sec Loss 3.2480 LearningRate 0.0000 Epoch: 19 Global Step: 329190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:05,370-Speed 9653.24 samples/sec Loss 3.2825 LearningRate 0.0000 Epoch: 19 Global Step: 329200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:06,501-Speed 9052.68 samples/sec Loss 3.3136 LearningRate 0.0000 Epoch: 19 Global Step: 329210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:07,674-Speed 8740.80 samples/sec Loss 3.3281 LearningRate 0.0000 Epoch: 19 Global Step: 329220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:08,814-Speed 8987.88 samples/sec Loss 3.3387 LearningRate 0.0000 Epoch: 19 Global Step: 329230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:09,944-Speed 9068.30 samples/sec Loss 3.2860 LearningRate 0.0000 Epoch: 19 Global 
Step: 329240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:11,075-Speed 9062.23 samples/sec Loss 3.1992 LearningRate 0.0000 Epoch: 19 Global Step: 329250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:12,212-Speed 9009.99 samples/sec Loss 3.3118 LearningRate 0.0000 Epoch: 19 Global Step: 329260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:13,310-Speed 9332.19 samples/sec Loss 3.3476 LearningRate 0.0000 Epoch: 19 Global Step: 329270 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:54:14,424-Speed 9206.87 samples/sec Loss 3.3431 LearningRate 0.0000 Epoch: 19 Global Step: 329280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:15,505-Speed 9472.55 samples/sec Loss 3.2647 LearningRate 0.0000 Epoch: 19 Global Step: 329290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:16,638-Speed 9047.40 samples/sec Loss 3.2940 LearningRate 0.0000 Epoch: 19 Global Step: 329300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:17,772-Speed 9030.26 samples/sec Loss 3.2992 LearningRate 0.0000 Epoch: 19 Global Step: 329310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:18,903-Speed 9058.90 samples/sec Loss 3.2807 LearningRate 0.0000 Epoch: 19 Global Step: 329320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:19,998-Speed 9361.10 samples/sec Loss 3.2212 LearningRate 0.0000 Epoch: 19 Global Step: 329330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:21,102-Speed 9275.85 samples/sec Loss 3.3307 LearningRate 0.0000 Epoch: 19 Global Step: 329340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:22,219-Speed 9171.96 samples/sec Loss 3.1885 LearningRate 0.0000 Epoch: 19 Global Step: 329350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:23,391-Speed 8745.67 samples/sec Loss 3.3664 LearningRate 0.0000 Epoch: 19 Global Step: 329360 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-12 00:54:24,567-Speed 8713.29 samples/sec Loss 3.2767 LearningRate 0.0000 Epoch: 19 Global Step: 329370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:25,713-Speed 8937.97 samples/sec Loss 3.2599 LearningRate 0.0000 Epoch: 19 Global Step: 329380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:54:26,831-Speed 9164.76 samples/sec Loss 3.2421 LearningRate 0.0000 Epoch: 19 Global Step: 329390 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:54:27,992-Speed 8830.20 samples/sec Loss 3.3041 LearningRate 0.0000 Epoch: 19 Global Step: 329400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:29,109-Speed 9170.07 samples/sec Loss 3.2415 LearningRate 0.0000 Epoch: 19 Global Step: 329410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:30,259-Speed 8912.14 samples/sec Loss 3.2955 LearningRate 0.0000 Epoch: 19 Global Step: 329420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:31,364-Speed 9272.52 samples/sec Loss 3.3137 LearningRate 0.0000 Epoch: 19 Global Step: 329430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:32,467-Speed 9287.14 samples/sec Loss 3.4026 LearningRate 0.0000 Epoch: 19 Global Step: 329440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:33,533-Speed 9611.37 samples/sec Loss 3.3001 LearningRate 0.0000 Epoch: 19 Global Step: 329450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:34,637-Speed 9278.16 samples/sec Loss 3.2805 LearningRate 0.0000 Epoch: 19 Global Step: 329460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:35,749-Speed 9214.32 samples/sec Loss 3.2846 LearningRate 0.0000 Epoch: 19 Global Step: 329470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:36,840-Speed 9393.80 samples/sec Loss 3.2225 LearningRate 0.0000 Epoch: 19 Global Step: 329480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:54:37,917-Speed 9521.18 samples/sec Loss 3.3243 LearningRate 0.0000 Epoch: 19 Global Step: 329490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:39,044-Speed 9090.26 samples/sec Loss 3.3229 LearningRate 0.0000 Epoch: 19 Global Step: 329500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:40,153-Speed 9232.15 samples/sec Loss 3.2927 LearningRate 0.0000 Epoch: 19 Global Step: 329510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:41,306-Speed 8889.61 samples/sec Loss 3.3232 LearningRate 0.0000 Epoch: 19 Global Step: 329520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:42,419-Speed 9202.97 samples/sec Loss 3.2266 LearningRate 0.0000 Epoch: 19 Global Step: 329530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:43,526-Speed 9258.50 samples/sec Loss 3.3476 LearningRate 0.0000 Epoch: 19 Global Step: 329540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:44,651-Speed 9108.97 samples/sec Loss 3.2380 LearningRate 0.0000 Epoch: 19 Global Step: 329550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:45,728-Speed 9516.19 samples/sec Loss 3.3595 LearningRate 0.0000 Epoch: 19 Global Step: 329560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:46,862-Speed 9036.62 samples/sec Loss 3.2453 LearningRate 0.0000 Epoch: 19 Global Step: 329570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:47,974-Speed 9211.97 samples/sec Loss 3.2315 LearningRate 0.0000 Epoch: 19 Global Step: 329580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:49,128-Speed 8877.11 samples/sec Loss 3.3477 LearningRate 0.0000 Epoch: 19 Global Step: 329590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:50,224-Speed 9353.25 samples/sec Loss 3.2861 LearningRate 0.0000 Epoch: 19 Global Step: 329600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:51,337-Speed 9207.14 samples/sec Loss 
3.2969 LearningRate 0.0000 Epoch: 19 Global Step: 329610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:52,451-Speed 9197.63 samples/sec Loss 3.2492 LearningRate 0.0000 Epoch: 19 Global Step: 329620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:53,596-Speed 8944.61 samples/sec Loss 3.3100 LearningRate 0.0000 Epoch: 19 Global Step: 329630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:54,743-Speed 8931.53 samples/sec Loss 3.2675 LearningRate 0.0000 Epoch: 19 Global Step: 329640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:55,868-Speed 9111.30 samples/sec Loss 3.3054 LearningRate 0.0000 Epoch: 19 Global Step: 329650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:56,949-Speed 9482.59 samples/sec Loss 3.3186 LearningRate 0.0000 Epoch: 19 Global Step: 329660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:58,019-Speed 9573.50 samples/sec Loss 3.2413 LearningRate 0.0000 Epoch: 19 Global Step: 329670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:54:59,128-Speed 9234.26 samples/sec Loss 3.3360 LearningRate 0.0000 Epoch: 19 Global Step: 329680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:00,306-Speed 8701.65 samples/sec Loss 3.3115 LearningRate 0.0000 Epoch: 19 Global Step: 329690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:01,432-Speed 9097.20 samples/sec Loss 3.2021 LearningRate 0.0000 Epoch: 19 Global Step: 329700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:02,526-Speed 9362.54 samples/sec Loss 3.3088 LearningRate 0.0000 Epoch: 19 Global Step: 329710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:03,628-Speed 9299.16 samples/sec Loss 3.2987 LearningRate 0.0000 Epoch: 19 Global Step: 329720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:04,809-Speed 8679.74 samples/sec Loss 3.2750 LearningRate 0.0000 Epoch: 19 Global 
Step: 329730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:05,925-Speed 9175.58 samples/sec Loss 3.3146 LearningRate 0.0000 Epoch: 19 Global Step: 329740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:07,048-Speed 9134.62 samples/sec Loss 3.2776 LearningRate 0.0000 Epoch: 19 Global Step: 329750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:08,173-Speed 9108.14 samples/sec Loss 3.2980 LearningRate 0.0000 Epoch: 19 Global Step: 329760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:09,322-Speed 8918.39 samples/sec Loss 3.2910 LearningRate 0.0000 Epoch: 19 Global Step: 329770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:10,426-Speed 9284.70 samples/sec Loss 3.2964 LearningRate 0.0000 Epoch: 19 Global Step: 329780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:11,506-Speed 9484.78 samples/sec Loss 3.2640 LearningRate 0.0000 Epoch: 19 Global Step: 329790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:12,600-Speed 9362.37 samples/sec Loss 3.2601 LearningRate 0.0000 Epoch: 19 Global Step: 329800 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:55:13,782-Speed 8673.05 samples/sec Loss 3.2125 LearningRate 0.0000 Epoch: 19 Global Step: 329810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:14,895-Speed 9202.02 samples/sec Loss 3.3274 LearningRate 0.0000 Epoch: 19 Global Step: 329820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:15,985-Speed 9406.76 samples/sec Loss 3.3011 LearningRate 0.0000 Epoch: 19 Global Step: 329830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:17,086-Speed 9299.17 samples/sec Loss 3.2071 LearningRate 0.0000 Epoch: 19 Global Step: 329840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:18,177-Speed 9395.10 samples/sec Loss 3.2528 LearningRate 0.0000 Epoch: 19 Global Step: 329850 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-12 00:55:19,256-Speed 9493.72 samples/sec Loss 3.2770 LearningRate 0.0000 Epoch: 19 Global Step: 329860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:20,330-Speed 9540.61 samples/sec Loss 3.3131 LearningRate 0.0000 Epoch: 19 Global Step: 329870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:21,441-Speed 9221.20 samples/sec Loss 3.3300 LearningRate 0.0000 Epoch: 19 Global Step: 329880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:22,551-Speed 9228.63 samples/sec Loss 3.2315 LearningRate 0.0000 Epoch: 19 Global Step: 329890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:23,693-Speed 8977.10 samples/sec Loss 3.3650 LearningRate 0.0000 Epoch: 19 Global Step: 329900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:24,829-Speed 9015.36 samples/sec Loss 3.3742 LearningRate 0.0000 Epoch: 19 Global Step: 329910 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:55:25,924-Speed 9365.16 samples/sec Loss 3.2167 LearningRate 0.0000 Epoch: 19 Global Step: 329920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:27,004-Speed 9482.97 samples/sec Loss 3.2089 LearningRate 0.0000 Epoch: 19 Global Step: 329930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:28,105-Speed 9309.49 samples/sec Loss 3.3508 LearningRate 0.0000 Epoch: 19 Global Step: 329940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:29,216-Speed 9217.26 samples/sec Loss 3.3291 LearningRate 0.0000 Epoch: 19 Global Step: 329950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:30,319-Speed 9290.13 samples/sec Loss 3.3192 LearningRate 0.0000 Epoch: 19 Global Step: 329960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:31,457-Speed 9010.32 samples/sec Loss 3.3037 LearningRate 0.0000 Epoch: 19 Global Step: 329970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
00:55:32,581-Speed 9115.76 samples/sec Loss 3.2742 LearningRate 0.0000 Epoch: 19 Global Step: 329980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:33,736-Speed 8866.35 samples/sec Loss 3.2220 LearningRate 0.0000 Epoch: 19 Global Step: 329990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:34,881-Speed 8950.99 samples/sec Loss 3.2903 LearningRate 0.0000 Epoch: 19 Global Step: 330000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:55:56,753-[lfw][330000]XNorm: 6.533692 Training: 2022-04-12 00:55:56,754-[lfw][330000]Accuracy-Flip: 0.99633+-0.00287 Training: 2022-04-12 00:55:56,754-[lfw][330000]Accuracy-Highest: 0.99750 Training: 2022-04-12 00:56:22,045-[cfp_fp][330000]XNorm: 5.707386 Training: 2022-04-12 00:56:22,046-[cfp_fp][330000]Accuracy-Flip: 0.97386+-0.00888 Training: 2022-04-12 00:56:22,046-[cfp_fp][330000]Accuracy-Highest: 0.97543 Training: 2022-04-12 00:56:43,894-[agedb_30][330000]XNorm: 6.365573 Training: 2022-04-12 00:56:43,894-[agedb_30][330000]Accuracy-Flip: 0.97400+-0.00720 Training: 2022-04-12 00:56:43,894-[agedb_30][330000]Accuracy-Highest: 0.97417 Training: 2022-04-12 00:56:45,053-Speed 145.93 samples/sec Loss 3.2732 LearningRate 0.0000 Epoch: 19 Global Step: 330010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:46,187-Speed 9031.81 samples/sec Loss 3.3010 LearningRate 0.0000 Epoch: 19 Global Step: 330020 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:56:47,310-Speed 9119.94 samples/sec Loss 3.2775 LearningRate 0.0000 Epoch: 19 Global Step: 330030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:48,388-Speed 9513.77 samples/sec Loss 3.2932 LearningRate 0.0000 Epoch: 19 Global Step: 330040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:49,535-Speed 8929.03 samples/sec Loss 3.3366 LearningRate 0.0000 Epoch: 19 Global Step: 330050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:50,655-Speed 9152.40 
samples/sec Loss 3.1703 LearningRate 0.0000 Epoch: 19 Global Step: 330060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:51,786-Speed 9057.37 samples/sec Loss 3.2951 LearningRate 0.0000 Epoch: 19 Global Step: 330070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:52,884-Speed 9330.47 samples/sec Loss 3.2730 LearningRate 0.0000 Epoch: 19 Global Step: 330080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:54,029-Speed 8946.49 samples/sec Loss 3.3658 LearningRate 0.0000 Epoch: 19 Global Step: 330090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:55,155-Speed 9099.66 samples/sec Loss 3.2280 LearningRate 0.0000 Epoch: 19 Global Step: 330100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:56,256-Speed 9306.10 samples/sec Loss 3.2998 LearningRate 0.0000 Epoch: 19 Global Step: 330110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:57,403-Speed 8933.03 samples/sec Loss 3.2664 LearningRate 0.0000 Epoch: 19 Global Step: 330120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:58,494-Speed 9391.87 samples/sec Loss 3.3686 LearningRate 0.0000 Epoch: 19 Global Step: 330130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:56:59,595-Speed 9309.42 samples/sec Loss 3.2610 LearningRate 0.0000 Epoch: 19 Global Step: 330140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:00,710-Speed 9188.16 samples/sec Loss 3.3301 LearningRate 0.0000 Epoch: 19 Global Step: 330150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:01,923-Speed 8446.29 samples/sec Loss 3.3137 LearningRate 0.0000 Epoch: 19 Global Step: 330160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:03,017-Speed 9363.59 samples/sec Loss 3.2414 LearningRate 0.0000 Epoch: 19 Global Step: 330170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:04,130-Speed 9214.89 samples/sec Loss 3.2495 LearningRate 0.0000 
Epoch: 19 Global Step: 330180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:05,237-Speed 9255.48 samples/sec Loss 3.2689 LearningRate 0.0000 Epoch: 19 Global Step: 330190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:06,367-Speed 9065.87 samples/sec Loss 3.3149 LearningRate 0.0000 Epoch: 19 Global Step: 330200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:07,455-Speed 9413.30 samples/sec Loss 3.2506 LearningRate 0.0000 Epoch: 19 Global Step: 330210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:08,575-Speed 9146.56 samples/sec Loss 3.3055 LearningRate 0.0000 Epoch: 19 Global Step: 330220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:09,670-Speed 9360.50 samples/sec Loss 3.2679 LearningRate 0.0000 Epoch: 19 Global Step: 330230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:10,766-Speed 9342.69 samples/sec Loss 3.2677 LearningRate 0.0000 Epoch: 19 Global Step: 330240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:11,876-Speed 9236.88 samples/sec Loss 3.2271 LearningRate 0.0000 Epoch: 19 Global Step: 330250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:13,011-Speed 9037.65 samples/sec Loss 3.2859 LearningRate 0.0000 Epoch: 19 Global Step: 330260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:14,100-Speed 9413.67 samples/sec Loss 3.3035 LearningRate 0.0000 Epoch: 19 Global Step: 330270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:15,229-Speed 9079.19 samples/sec Loss 3.2718 LearningRate 0.0000 Epoch: 19 Global Step: 330280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:16,355-Speed 9098.75 samples/sec Loss 3.2277 LearningRate 0.0000 Epoch: 19 Global Step: 330290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:17,470-Speed 9188.92 samples/sec Loss 3.2969 LearningRate 0.0000 Epoch: 19 Global Step: 330300 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:18,586-Speed 9184.01 samples/sec Loss 3.3188 LearningRate 0.0000 Epoch: 19 Global Step: 330310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:19,758-Speed 8745.51 samples/sec Loss 3.3135 LearningRate 0.0000 Epoch: 19 Global Step: 330320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:20,901-Speed 8959.84 samples/sec Loss 3.2368 LearningRate 0.0000 Epoch: 19 Global Step: 330330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:57:22,008-Speed 9251.69 samples/sec Loss 3.3275 LearningRate 0.0000 Epoch: 19 Global Step: 330340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:23,118-Speed 9232.28 samples/sec Loss 3.3237 LearningRate 0.0000 Epoch: 19 Global Step: 330350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:24,272-Speed 8880.55 samples/sec Loss 3.2500 LearningRate 0.0000 Epoch: 19 Global Step: 330360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:25,398-Speed 9100.28 samples/sec Loss 3.2535 LearningRate 0.0000 Epoch: 19 Global Step: 330370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:26,529-Speed 9058.00 samples/sec Loss 3.2732 LearningRate 0.0000 Epoch: 19 Global Step: 330380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:27,660-Speed 9060.26 samples/sec Loss 3.3130 LearningRate 0.0000 Epoch: 19 Global Step: 330390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:28,801-Speed 8974.92 samples/sec Loss 3.3059 LearningRate 0.0000 Epoch: 19 Global Step: 330400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:29,932-Speed 9061.29 samples/sec Loss 3.2778 LearningRate 0.0000 Epoch: 19 Global Step: 330410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:31,077-Speed 8943.10 samples/sec Loss 3.2387 LearningRate 0.0000 Epoch: 19 Global Step: 330420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-12 00:57:32,189-Speed 9224.33 samples/sec Loss 3.4047 LearningRate 0.0000 Epoch: 19 Global Step: 330430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:33,306-Speed 9174.90 samples/sec Loss 3.2659 LearningRate 0.0000 Epoch: 19 Global Step: 330440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:34,416-Speed 9227.17 samples/sec Loss 3.2557 LearningRate 0.0000 Epoch: 19 Global Step: 330450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:35,536-Speed 9151.17 samples/sec Loss 3.3379 LearningRate 0.0000 Epoch: 19 Global Step: 330460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:36,671-Speed 9031.19 samples/sec Loss 3.3172 LearningRate 0.0000 Epoch: 19 Global Step: 330470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:37,782-Speed 9225.57 samples/sec Loss 3.2390 LearningRate 0.0000 Epoch: 19 Global Step: 330480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:38,895-Speed 9199.66 samples/sec Loss 3.2840 LearningRate 0.0000 Epoch: 19 Global Step: 330490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:40,045-Speed 8914.37 samples/sec Loss 3.3304 LearningRate 0.0000 Epoch: 19 Global Step: 330500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:41,167-Speed 9126.46 samples/sec Loss 3.2603 LearningRate 0.0000 Epoch: 19 Global Step: 330510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:42,288-Speed 9145.86 samples/sec Loss 3.2652 LearningRate 0.0000 Epoch: 19 Global Step: 330520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:43,448-Speed 8830.60 samples/sec Loss 3.3536 LearningRate 0.0000 Epoch: 19 Global Step: 330530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:44,555-Speed 9255.45 samples/sec Loss 3.2736 LearningRate 0.0000 Epoch: 19 Global Step: 330540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:45,629-Speed 9541.19 
samples/sec Loss 3.3802 LearningRate 0.0000 Epoch: 19 Global Step: 330550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:46,759-Speed 9068.47 samples/sec Loss 3.2640 LearningRate 0.0000 Epoch: 19 Global Step: 330560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:47,879-Speed 9152.12 samples/sec Loss 3.3357 LearningRate 0.0000 Epoch: 19 Global Step: 330570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:48,968-Speed 9406.61 samples/sec Loss 3.3337 LearningRate 0.0000 Epoch: 19 Global Step: 330580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:50,108-Speed 8985.84 samples/sec Loss 3.2721 LearningRate 0.0000 Epoch: 19 Global Step: 330590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:51,250-Speed 8977.74 samples/sec Loss 3.3116 LearningRate 0.0000 Epoch: 19 Global Step: 330600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:52,413-Speed 8810.51 samples/sec Loss 3.2657 LearningRate 0.0000 Epoch: 19 Global Step: 330610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:53,518-Speed 9268.57 samples/sec Loss 3.2602 LearningRate 0.0000 Epoch: 19 Global Step: 330620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:54,647-Speed 9079.60 samples/sec Loss 3.3163 LearningRate 0.0000 Epoch: 19 Global Step: 330630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:55,770-Speed 9121.43 samples/sec Loss 3.2444 LearningRate 0.0000 Epoch: 19 Global Step: 330640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:57:56,933-Speed 8809.91 samples/sec Loss 3.3468 LearningRate 0.0000 Epoch: 19 Global Step: 330650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:58,033-Speed 9311.84 samples/sec Loss 3.2300 LearningRate 0.0000 Epoch: 19 Global Step: 330660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:57:59,134-Speed 9307.12 samples/sec Loss 3.2822 LearningRate 0.0000 
Epoch: 19 Global Step: 330670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:00,234-Speed 9318.68 samples/sec Loss 3.2290 LearningRate 0.0000 Epoch: 19 Global Step: 330680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:01,336-Speed 9296.71 samples/sec Loss 3.3152 LearningRate 0.0000 Epoch: 19 Global Step: 330690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:02,451-Speed 9191.16 samples/sec Loss 3.2915 LearningRate 0.0000 Epoch: 19 Global Step: 330700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:03,562-Speed 9221.46 samples/sec Loss 3.2618 LearningRate 0.0000 Epoch: 19 Global Step: 330710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:04,637-Speed 9530.67 samples/sec Loss 3.3093 LearningRate 0.0000 Epoch: 19 Global Step: 330720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:05,740-Speed 9285.03 samples/sec Loss 3.2982 LearningRate 0.0000 Epoch: 19 Global Step: 330730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:06,854-Speed 9204.93 samples/sec Loss 3.3592 LearningRate 0.0000 Epoch: 19 Global Step: 330740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:07,969-Speed 9188.87 samples/sec Loss 3.3045 LearningRate 0.0000 Epoch: 19 Global Step: 330750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:09,120-Speed 8900.61 samples/sec Loss 3.1943 LearningRate 0.0000 Epoch: 19 Global Step: 330760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:10,237-Speed 9166.74 samples/sec Loss 3.2696 LearningRate 0.0000 Epoch: 19 Global Step: 330770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:11,338-Speed 9318.36 samples/sec Loss 3.2374 LearningRate 0.0000 Epoch: 19 Global Step: 330780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:12,437-Speed 9321.08 samples/sec Loss 3.3527 LearningRate 0.0000 Epoch: 19 Global Step: 330790 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:13,568-Speed 9062.33 samples/sec Loss 3.2660 LearningRate 0.0000 Epoch: 19 Global Step: 330800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:14,709-Speed 8976.41 samples/sec Loss 3.2622 LearningRate 0.0000 Epoch: 19 Global Step: 330810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:15,850-Speed 8980.65 samples/sec Loss 3.3335 LearningRate 0.0000 Epoch: 19 Global Step: 330820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:16,932-Speed 9468.98 samples/sec Loss 3.3102 LearningRate 0.0000 Epoch: 19 Global Step: 330830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:18,058-Speed 9100.10 samples/sec Loss 3.2603 LearningRate 0.0000 Epoch: 19 Global Step: 330840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:19,171-Speed 9211.97 samples/sec Loss 3.2554 LearningRate 0.0000 Epoch: 19 Global Step: 330850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:58:20,260-Speed 9409.84 samples/sec Loss 3.2964 LearningRate 0.0000 Epoch: 19 Global Step: 330860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:21,365-Speed 9272.05 samples/sec Loss 3.3299 LearningRate 0.0000 Epoch: 19 Global Step: 330870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:22,530-Speed 8792.84 samples/sec Loss 3.2553 LearningRate 0.0000 Epoch: 19 Global Step: 330880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:23,638-Speed 9251.08 samples/sec Loss 3.2893 LearningRate 0.0000 Epoch: 19 Global Step: 330890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:24,726-Speed 9415.52 samples/sec Loss 3.2276 LearningRate 0.0000 Epoch: 19 Global Step: 330900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:25,869-Speed 8959.55 samples/sec Loss 3.3177 LearningRate 0.0000 Epoch: 19 Global Step: 330910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-12 00:58:26,981-Speed 9216.99 samples/sec Loss 3.3368 LearningRate 0.0000 Epoch: 19 Global Step: 330920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:28,085-Speed 9282.84 samples/sec Loss 3.3321 LearningRate 0.0000 Epoch: 19 Global Step: 330930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:29,246-Speed 8825.11 samples/sec Loss 3.2645 LearningRate 0.0000 Epoch: 19 Global Step: 330940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:30,388-Speed 8976.21 samples/sec Loss 3.2490 LearningRate 0.0000 Epoch: 19 Global Step: 330950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:31,479-Speed 9393.09 samples/sec Loss 3.3469 LearningRate 0.0000 Epoch: 19 Global Step: 330960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:32,591-Speed 9207.80 samples/sec Loss 3.3647 LearningRate 0.0000 Epoch: 19 Global Step: 330970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:33,682-Speed 9390.11 samples/sec Loss 3.2961 LearningRate 0.0000 Epoch: 19 Global Step: 330980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:34,772-Speed 9402.49 samples/sec Loss 3.2691 LearningRate 0.0000 Epoch: 19 Global Step: 330990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:35,911-Speed 8991.57 samples/sec Loss 3.3106 LearningRate 0.0000 Epoch: 19 Global Step: 331000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:37,063-Speed 8895.00 samples/sec Loss 3.2982 LearningRate 0.0000 Epoch: 19 Global Step: 331010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:38,253-Speed 8612.91 samples/sec Loss 3.3132 LearningRate 0.0000 Epoch: 19 Global Step: 331020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:39,392-Speed 8988.96 samples/sec Loss 3.2377 LearningRate 0.0000 Epoch: 19 Global Step: 331030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:40,508-Speed 9182.41 
samples/sec Loss 3.2922 LearningRate 0.0000 Epoch: 19 Global Step: 331040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:41,607-Speed 9325.17 samples/sec Loss 3.3340 LearningRate 0.0000 Epoch: 19 Global Step: 331050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:42,743-Speed 9018.57 samples/sec Loss 3.3289 LearningRate 0.0000 Epoch: 19 Global Step: 331060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:43,927-Speed 8656.54 samples/sec Loss 3.3072 LearningRate 0.0000 Epoch: 19 Global Step: 331070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:45,028-Speed 9305.96 samples/sec Loss 3.2838 LearningRate 0.0000 Epoch: 19 Global Step: 331080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:46,138-Speed 9228.69 samples/sec Loss 3.3202 LearningRate 0.0000 Epoch: 19 Global Step: 331090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:47,227-Speed 9411.98 samples/sec Loss 3.2817 LearningRate 0.0000 Epoch: 19 Global Step: 331100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:48,323-Speed 9349.18 samples/sec Loss 3.2964 LearningRate 0.0000 Epoch: 19 Global Step: 331110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:49,425-Speed 9298.98 samples/sec Loss 3.2924 LearningRate 0.0000 Epoch: 19 Global Step: 331120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:50,514-Speed 9405.57 samples/sec Loss 3.2292 LearningRate 0.0000 Epoch: 19 Global Step: 331130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:51,634-Speed 9147.01 samples/sec Loss 3.2832 LearningRate 0.0000 Epoch: 19 Global Step: 331140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:52,775-Speed 8982.59 samples/sec Loss 3.3174 LearningRate 0.0000 Epoch: 19 Global Step: 331150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:53,892-Speed 9173.29 samples/sec Loss 3.3007 LearningRate 0.0000 
Epoch: 19 Global Step: 331160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:58:55,023-Speed 9062.00 samples/sec Loss 3.2626 LearningRate 0.0000 Epoch: 19 Global Step: 331170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:56,148-Speed 9102.22 samples/sec Loss 3.2447 LearningRate 0.0000 Epoch: 19 Global Step: 331180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:57,316-Speed 8771.17 samples/sec Loss 3.3052 LearningRate 0.0000 Epoch: 19 Global Step: 331190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:58,421-Speed 9272.66 samples/sec Loss 3.3073 LearningRate 0.0000 Epoch: 19 Global Step: 331200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:58:59,530-Speed 9238.41 samples/sec Loss 3.2940 LearningRate 0.0000 Epoch: 19 Global Step: 331210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:00,622-Speed 9384.16 samples/sec Loss 3.2583 LearningRate 0.0000 Epoch: 19 Global Step: 331220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:01,791-Speed 8768.73 samples/sec Loss 3.2558 LearningRate 0.0000 Epoch: 19 Global Step: 331230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:02,964-Speed 8733.03 samples/sec Loss 3.2469 LearningRate 0.0000 Epoch: 19 Global Step: 331240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:04,057-Speed 9374.04 samples/sec Loss 3.2609 LearningRate 0.0000 Epoch: 19 Global Step: 331250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:05,178-Speed 9144.00 samples/sec Loss 3.2839 LearningRate 0.0000 Epoch: 19 Global Step: 331260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:06,233-Speed 9705.50 samples/sec Loss 3.2146 LearningRate 0.0000 Epoch: 19 Global Step: 331270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:07,327-Speed 9364.50 samples/sec Loss 3.2964 LearningRate 0.0000 Epoch: 19 Global Step: 331280 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:08,412-Speed 9443.27 samples/sec Loss 3.2914 LearningRate 0.0000 Epoch: 19 Global Step: 331290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:09,515-Speed 9292.76 samples/sec Loss 3.2847 LearningRate 0.0000 Epoch: 19 Global Step: 331300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:10,591-Speed 9520.06 samples/sec Loss 3.3257 LearningRate 0.0000 Epoch: 19 Global Step: 331310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:11,675-Speed 9450.08 samples/sec Loss 3.2313 LearningRate 0.0000 Epoch: 19 Global Step: 331320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:12,835-Speed 8835.80 samples/sec Loss 3.2885 LearningRate 0.0000 Epoch: 19 Global Step: 331330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:13,936-Speed 9305.85 samples/sec Loss 3.3241 LearningRate 0.0000 Epoch: 19 Global Step: 331340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:15,075-Speed 9001.98 samples/sec Loss 3.2973 LearningRate 0.0000 Epoch: 19 Global Step: 331350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:16,220-Speed 8942.31 samples/sec Loss 3.2913 LearningRate 0.0000 Epoch: 19 Global Step: 331360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:17,387-Speed 8779.49 samples/sec Loss 3.3207 LearningRate 0.0000 Epoch: 19 Global Step: 331370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 00:59:18,505-Speed 9165.87 samples/sec Loss 3.3194 LearningRate 0.0000 Epoch: 19 Global Step: 331380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:19,684-Speed 8694.93 samples/sec Loss 3.2205 LearningRate 0.0000 Epoch: 19 Global Step: 331390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:20,795-Speed 9222.39 samples/sec Loss 3.2846 LearningRate 0.0000 Epoch: 19 Global Step: 331400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-12 00:59:21,909-Speed 9202.95 samples/sec Loss 3.2861 LearningRate 0.0000 Epoch: 19 Global Step: 331410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:22,999-Speed 9397.88 samples/sec Loss 3.3423 LearningRate 0.0000 Epoch: 19 Global Step: 331420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:24,113-Speed 9197.95 samples/sec Loss 3.3001 LearningRate 0.0000 Epoch: 19 Global Step: 331430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:25,250-Speed 9010.33 samples/sec Loss 3.2263 LearningRate 0.0000 Epoch: 19 Global Step: 331440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:26,381-Speed 9055.10 samples/sec Loss 3.3192 LearningRate 0.0000 Epoch: 19 Global Step: 331450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:27,507-Speed 9100.31 samples/sec Loss 3.2830 LearningRate 0.0000 Epoch: 19 Global Step: 331460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:28,575-Speed 9609.03 samples/sec Loss 3.3383 LearningRate 0.0000 Epoch: 19 Global Step: 331470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:29,740-Speed 8797.56 samples/sec Loss 3.3383 LearningRate 0.0000 Epoch: 19 Global Step: 331480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:30,912-Speed 8742.54 samples/sec Loss 3.3056 LearningRate 0.0000 Epoch: 19 Global Step: 331490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:32,039-Speed 9091.85 samples/sec Loss 3.2700 LearningRate 0.0000 Epoch: 19 Global Step: 331500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:33,158-Speed 9161.29 samples/sec Loss 3.1974 LearningRate 0.0000 Epoch: 19 Global Step: 331510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:34,297-Speed 8991.56 samples/sec Loss 3.2867 LearningRate 0.0000 Epoch: 19 Global Step: 331520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:35,415-Speed 9163.67 
samples/sec Loss 3.2598 LearningRate 0.0000 Epoch: 19 Global Step: 331530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:36,536-Speed 9136.75 samples/sec Loss 3.3054 LearningRate 0.0000 Epoch: 19 Global Step: 331540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:37,645-Speed 9245.97 samples/sec Loss 3.2341 LearningRate 0.0000 Epoch: 19 Global Step: 331550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:38,785-Speed 8986.59 samples/sec Loss 3.2696 LearningRate 0.0000 Epoch: 19 Global Step: 331560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:39,874-Speed 9408.59 samples/sec Loss 3.2934 LearningRate 0.0000 Epoch: 19 Global Step: 331570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:41,011-Speed 9015.00 samples/sec Loss 3.3378 LearningRate 0.0000 Epoch: 19 Global Step: 331580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:42,116-Speed 9268.84 samples/sec Loss 3.2761 LearningRate 0.0000 Epoch: 19 Global Step: 331590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:43,233-Speed 9182.24 samples/sec Loss 3.3078 LearningRate 0.0000 Epoch: 19 Global Step: 331600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:44,325-Speed 9376.82 samples/sec Loss 3.2560 LearningRate 0.0000 Epoch: 19 Global Step: 331610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:45,435-Speed 9233.03 samples/sec Loss 3.2328 LearningRate 0.0000 Epoch: 19 Global Step: 331620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:46,531-Speed 9347.66 samples/sec Loss 3.2724 LearningRate 0.0000 Epoch: 19 Global Step: 331630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 00:59:47,660-Speed 9072.59 samples/sec Loss 3.3535 LearningRate 0.0000 Epoch: 19 Global Step: 331640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:48,776-Speed 9182.45 samples/sec Loss 3.2389 LearningRate 0.0000 
Epoch: 19 Global Step: 331650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:49,926-Speed 8917.40 samples/sec Loss 3.3209 LearningRate 0.0000 Epoch: 19 Global Step: 331660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:51,053-Speed 9088.01 samples/sec Loss 3.3941 LearningRate 0.0000 Epoch: 19 Global Step: 331670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:52,148-Speed 9362.03 samples/sec Loss 3.3189 LearningRate 0.0000 Epoch: 19 Global Step: 331680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:53,277-Speed 9071.92 samples/sec Loss 3.2795 LearningRate 0.0000 Epoch: 19 Global Step: 331690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:54,424-Speed 8935.01 samples/sec Loss 3.3349 LearningRate 0.0000 Epoch: 19 Global Step: 331700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:55,572-Speed 8920.01 samples/sec Loss 3.2496 LearningRate 0.0000 Epoch: 19 Global Step: 331710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:56,666-Speed 9369.03 samples/sec Loss 3.3109 LearningRate 0.0000 Epoch: 19 Global Step: 331720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:57,798-Speed 9049.89 samples/sec Loss 3.2159 LearningRate 0.0000 Epoch: 19 Global Step: 331730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 00:59:58,898-Speed 9318.00 samples/sec Loss 3.2732 LearningRate 0.0000 Epoch: 19 Global Step: 331740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:00,009-Speed 9221.25 samples/sec Loss 3.2577 LearningRate 0.0000 Epoch: 19 Global Step: 331750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:01,132-Speed 9125.76 samples/sec Loss 3.3672 LearningRate 0.0000 Epoch: 19 Global Step: 331760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:02,270-Speed 9000.42 samples/sec Loss 3.3409 LearningRate 0.0000 Epoch: 19 Global Step: 331770 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:03,408-Speed 9004.84 samples/sec Loss 3.2712 LearningRate 0.0000 Epoch: 19 Global Step: 331780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:04,508-Speed 9316.06 samples/sec Loss 3.3494 LearningRate 0.0000 Epoch: 19 Global Step: 331790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:05,589-Speed 9474.91 samples/sec Loss 3.2931 LearningRate 0.0000 Epoch: 19 Global Step: 331800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:06,716-Speed 9094.73 samples/sec Loss 3.2987 LearningRate 0.0000 Epoch: 19 Global Step: 331810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:07,852-Speed 9020.51 samples/sec Loss 3.3217 LearningRate 0.0000 Epoch: 19 Global Step: 331820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:08,952-Speed 9316.53 samples/sec Loss 3.2872 LearningRate 0.0000 Epoch: 19 Global Step: 331830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:10,011-Speed 9674.77 samples/sec Loss 3.3330 LearningRate 0.0000 Epoch: 19 Global Step: 331840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:11,162-Speed 8901.42 samples/sec Loss 3.2757 LearningRate 0.0000 Epoch: 19 Global Step: 331850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:12,286-Speed 9109.67 samples/sec Loss 3.3024 LearningRate 0.0000 Epoch: 19 Global Step: 331860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:13,414-Speed 9085.36 samples/sec Loss 3.2944 LearningRate 0.0000 Epoch: 19 Global Step: 331870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:14,550-Speed 9021.50 samples/sec Loss 3.2739 LearningRate 0.0000 Epoch: 19 Global Step: 331880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:15,679-Speed 9076.49 samples/sec Loss 3.3015 LearningRate 0.0000 Epoch: 19 Global Step: 331890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-12 01:00:16,794-Speed 9192.98 samples/sec Loss 3.3226 LearningRate 0.0000 Epoch: 19 Global Step: 331900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:17,876-Speed 9465.58 samples/sec Loss 3.3142 LearningRate 0.0000 Epoch: 19 Global Step: 331910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:18,965-Speed 9407.88 samples/sec Loss 3.3203 LearningRate 0.0000 Epoch: 19 Global Step: 331920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:20,062-Speed 9347.46 samples/sec Loss 3.2620 LearningRate 0.0000 Epoch: 19 Global Step: 331930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:21,143-Speed 9472.61 samples/sec Loss 3.3353 LearningRate 0.0000 Epoch: 19 Global Step: 331940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:22,240-Speed 9343.95 samples/sec Loss 3.2416 LearningRate 0.0000 Epoch: 19 Global Step: 331950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:23,380-Speed 8983.13 samples/sec Loss 3.3194 LearningRate 0.0000 Epoch: 19 Global Step: 331960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:24,525-Speed 8946.37 samples/sec Loss 3.2807 LearningRate 0.0000 Epoch: 19 Global Step: 331970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:25,666-Speed 8987.65 samples/sec Loss 3.2978 LearningRate 0.0000 Epoch: 19 Global Step: 331980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:26,795-Speed 9071.90 samples/sec Loss 3.2595 LearningRate 0.0000 Epoch: 19 Global Step: 331990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:27,892-Speed 9335.05 samples/sec Loss 3.1670 LearningRate 0.0000 Epoch: 19 Global Step: 332000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:00:49,671-[lfw][332000]XNorm: 6.504402 Training: 2022-04-12 01:00:49,671-[lfw][332000]Accuracy-Flip: 0.99667+-0.00269 Training: 2022-04-12 01:00:49,672-[lfw][332000]Accuracy-Highest: 0.99750 
Training: 2022-04-12 01:01:14,848-[cfp_fp][332000]XNorm: 5.682261 Training: 2022-04-12 01:01:14,849-[cfp_fp][332000]Accuracy-Flip: 0.97443+-0.00812 Training: 2022-04-12 01:01:14,849-[cfp_fp][332000]Accuracy-Highest: 0.97543 Training: 2022-04-12 01:01:36,562-[agedb_30][332000]XNorm: 6.337553 Training: 2022-04-12 01:01:36,563-[agedb_30][332000]Accuracy-Flip: 0.97217+-0.00817 Training: 2022-04-12 01:01:36,563-[agedb_30][332000]Accuracy-Highest: 0.97417 Training: 2022-04-12 01:01:37,701-Speed 146.69 samples/sec Loss 3.2398 LearningRate 0.0000 Epoch: 19 Global Step: 332010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:38,814-Speed 9204.38 samples/sec Loss 3.2623 LearningRate 0.0000 Epoch: 19 Global Step: 332020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:39,896-Speed 9475.86 samples/sec Loss 3.3320 LearningRate 0.0000 Epoch: 19 Global Step: 332030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:40,956-Speed 9659.08 samples/sec Loss 3.3023 LearningRate 0.0000 Epoch: 19 Global Step: 332040 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:01:42,051-Speed 9360.06 samples/sec Loss 3.2722 LearningRate 0.0000 Epoch: 19 Global Step: 332050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:43,187-Speed 9020.47 samples/sec Loss 3.2972 LearningRate 0.0000 Epoch: 19 Global Step: 332060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:44,278-Speed 9395.38 samples/sec Loss 3.2994 LearningRate 0.0000 Epoch: 19 Global Step: 332070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:45,372-Speed 9358.86 samples/sec Loss 3.3487 LearningRate 0.0000 Epoch: 19 Global Step: 332080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:46,445-Speed 9547.34 samples/sec Loss 3.2009 LearningRate 0.0000 Epoch: 19 Global Step: 332090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:47,519-Speed 9544.26 samples/sec Loss 3.2757 
LearningRate 0.0000 Epoch: 19 Global Step: 332100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:48,674-Speed 8871.57 samples/sec Loss 3.2584 LearningRate 0.0000 Epoch: 19 Global Step: 332110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:49,774-Speed 9316.79 samples/sec Loss 3.2400 LearningRate 0.0000 Epoch: 19 Global Step: 332120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:50,936-Speed 8814.05 samples/sec Loss 3.4490 LearningRate 0.0000 Epoch: 19 Global Step: 332130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:52,031-Speed 9357.46 samples/sec Loss 3.3160 LearningRate 0.0000 Epoch: 19 Global Step: 332140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:53,106-Speed 9532.36 samples/sec Loss 3.2285 LearningRate 0.0000 Epoch: 19 Global Step: 332150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:54,188-Speed 9472.39 samples/sec Loss 3.3444 LearningRate 0.0000 Epoch: 19 Global Step: 332160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:55,289-Speed 9306.57 samples/sec Loss 3.2415 LearningRate 0.0000 Epoch: 19 Global Step: 332170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:56,428-Speed 8995.18 samples/sec Loss 3.3190 LearningRate 0.0000 Epoch: 19 Global Step: 332180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:57,580-Speed 8890.87 samples/sec Loss 3.3323 LearningRate 0.0000 Epoch: 19 Global Step: 332190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:58,691-Speed 9222.63 samples/sec Loss 3.3825 LearningRate 0.0000 Epoch: 19 Global Step: 332200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:01:59,751-Speed 9666.86 samples/sec Loss 3.2665 LearningRate 0.0000 Epoch: 19 Global Step: 332210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:00,826-Speed 9527.22 samples/sec Loss 3.2939 LearningRate 0.0000 Epoch: 19 Global Step: 
332220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:01,905-Speed 9500.38 samples/sec Loss 3.2947 LearningRate 0.0000 Epoch: 19 Global Step: 332230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:03,029-Speed 9114.42 samples/sec Loss 3.2498 LearningRate 0.0000 Epoch: 19 Global Step: 332240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:04,240-Speed 8462.31 samples/sec Loss 3.2827 LearningRate 0.0000 Epoch: 19 Global Step: 332250 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:02:05,335-Speed 9352.13 samples/sec Loss 3.3626 LearningRate 0.0000 Epoch: 19 Global Step: 332260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:06,468-Speed 9047.99 samples/sec Loss 3.1880 LearningRate 0.0000 Epoch: 19 Global Step: 332270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:07,614-Speed 8938.15 samples/sec Loss 3.2499 LearningRate 0.0000 Epoch: 19 Global Step: 332280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:08,735-Speed 9144.04 samples/sec Loss 3.2140 LearningRate 0.0000 Epoch: 19 Global Step: 332290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:09,871-Speed 9020.70 samples/sec Loss 3.3006 LearningRate 0.0000 Epoch: 19 Global Step: 332300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:11,025-Speed 8879.48 samples/sec Loss 3.2452 LearningRate 0.0000 Epoch: 19 Global Step: 332310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:12,153-Speed 9084.02 samples/sec Loss 3.2957 LearningRate 0.0000 Epoch: 19 Global Step: 332320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:13,263-Speed 9235.26 samples/sec Loss 3.3151 LearningRate 0.0000 Epoch: 19 Global Step: 332330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:14,359-Speed 9344.77 samples/sec Loss 3.3457 LearningRate 0.0000 Epoch: 19 Global Step: 332340 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-12 01:02:15,426-Speed 9599.11 samples/sec Loss 3.3019 LearningRate 0.0000 Epoch: 19 Global Step: 332350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:16,518-Speed 9382.08 samples/sec Loss 3.2592 LearningRate 0.0000 Epoch: 19 Global Step: 332360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:17,657-Speed 8996.56 samples/sec Loss 3.3222 LearningRate 0.0000 Epoch: 19 Global Step: 332370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:18,772-Speed 9188.42 samples/sec Loss 3.3869 LearningRate 0.0000 Epoch: 19 Global Step: 332380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:19,859-Speed 9434.80 samples/sec Loss 3.2653 LearningRate 0.0000 Epoch: 19 Global Step: 332390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:20,975-Speed 9182.19 samples/sec Loss 3.3734 LearningRate 0.0000 Epoch: 19 Global Step: 332400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:22,110-Speed 9024.23 samples/sec Loss 3.3564 LearningRate 0.0000 Epoch: 19 Global Step: 332410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:23,226-Speed 9178.39 samples/sec Loss 3.2562 LearningRate 0.0000 Epoch: 19 Global Step: 332420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:24,341-Speed 9190.98 samples/sec Loss 3.2419 LearningRate 0.0000 Epoch: 19 Global Step: 332430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:25,427-Speed 9429.18 samples/sec Loss 3.2983 LearningRate 0.0000 Epoch: 19 Global Step: 332440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:26,526-Speed 9325.85 samples/sec Loss 3.2735 LearningRate 0.0000 Epoch: 19 Global Step: 332450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:27,655-Speed 9074.17 samples/sec Loss 3.2766 LearningRate 0.0000 Epoch: 19 Global Step: 332460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 
01:02:28,748-Speed 9373.42 samples/sec Loss 3.2481 LearningRate 0.0000 Epoch: 19 Global Step: 332470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:29,880-Speed 9056.07 samples/sec Loss 3.2899 LearningRate 0.0000 Epoch: 19 Global Step: 332480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:30,990-Speed 9235.77 samples/sec Loss 3.2881 LearningRate 0.0000 Epoch: 19 Global Step: 332490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:32,100-Speed 9227.31 samples/sec Loss 3.2490 LearningRate 0.0000 Epoch: 19 Global Step: 332500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:33,232-Speed 9051.37 samples/sec Loss 3.3209 LearningRate 0.0000 Epoch: 19 Global Step: 332510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:34,366-Speed 9037.02 samples/sec Loss 3.2712 LearningRate 0.0000 Epoch: 19 Global Step: 332520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:35,445-Speed 9496.35 samples/sec Loss 3.2425 LearningRate 0.0000 Epoch: 19 Global Step: 332530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:36,520-Speed 9530.30 samples/sec Loss 3.2136 LearningRate 0.0000 Epoch: 19 Global Step: 332540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:37,700-Speed 8686.42 samples/sec Loss 3.3362 LearningRate 0.0000 Epoch: 19 Global Step: 332550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:38,799-Speed 9321.79 samples/sec Loss 3.2810 LearningRate 0.0000 Epoch: 19 Global Step: 332560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:39,936-Speed 9006.51 samples/sec Loss 3.3484 LearningRate 0.0000 Epoch: 19 Global Step: 332570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:41,070-Speed 9040.28 samples/sec Loss 3.2247 LearningRate 0.0000 Epoch: 19 Global Step: 332580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:42,216-Speed 8936.77 samples/sec Loss 
3.2334 LearningRate 0.0000 Epoch: 19 Global Step: 332590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:43,354-Speed 9009.19 samples/sec Loss 3.3362 LearningRate 0.0000 Epoch: 19 Global Step: 332600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:44,455-Speed 9306.00 samples/sec Loss 3.3198 LearningRate 0.0000 Epoch: 19 Global Step: 332610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:45,549-Speed 9363.66 samples/sec Loss 3.3067 LearningRate 0.0000 Epoch: 19 Global Step: 332620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:46,715-Speed 8789.29 samples/sec Loss 3.2763 LearningRate 0.0000 Epoch: 19 Global Step: 332630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:47,829-Speed 9199.89 samples/sec Loss 3.2309 LearningRate 0.0000 Epoch: 19 Global Step: 332640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:48,986-Speed 8856.67 samples/sec Loss 3.3417 LearningRate 0.0000 Epoch: 19 Global Step: 332650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:50,077-Speed 9393.68 samples/sec Loss 3.3619 LearningRate 0.0000 Epoch: 19 Global Step: 332660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:51,211-Speed 9033.80 samples/sec Loss 3.2785 LearningRate 0.0000 Epoch: 19 Global Step: 332670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:52,323-Speed 9216.28 samples/sec Loss 3.3413 LearningRate 0.0000 Epoch: 19 Global Step: 332680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:53,448-Speed 9102.11 samples/sec Loss 3.3829 LearningRate 0.0000 Epoch: 19 Global Step: 332690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:02:54,629-Speed 8678.03 samples/sec Loss 3.2387 LearningRate 0.0000 Epoch: 19 Global Step: 332700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:55,747-Speed 9163.61 samples/sec Loss 3.3294 LearningRate 0.0000 Epoch: 19 Global 
Step: 332710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:56,874-Speed 9093.23 samples/sec Loss 3.2517 LearningRate 0.0000 Epoch: 19 Global Step: 332720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:57,994-Speed 9143.27 samples/sec Loss 3.2296 LearningRate 0.0000 Epoch: 19 Global Step: 332730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:02:59,119-Speed 9107.66 samples/sec Loss 3.2779 LearningRate 0.0000 Epoch: 19 Global Step: 332740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:00,226-Speed 9254.27 samples/sec Loss 3.3266 LearningRate 0.0000 Epoch: 19 Global Step: 332750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:01,337-Speed 9222.98 samples/sec Loss 3.2672 LearningRate 0.0000 Epoch: 19 Global Step: 332760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:02,474-Speed 9016.50 samples/sec Loss 3.3241 LearningRate 0.0000 Epoch: 19 Global Step: 332770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:03,597-Speed 9123.94 samples/sec Loss 3.2949 LearningRate 0.0000 Epoch: 19 Global Step: 332780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:04,688-Speed 9383.87 samples/sec Loss 3.2754 LearningRate 0.0000 Epoch: 19 Global Step: 332790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:05,795-Speed 9259.98 samples/sec Loss 3.3176 LearningRate 0.0000 Epoch: 19 Global Step: 332800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:06,903-Speed 9245.29 samples/sec Loss 3.3177 LearningRate 0.0000 Epoch: 19 Global Step: 332810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:08,022-Speed 9161.81 samples/sec Loss 3.2522 LearningRate 0.0000 Epoch: 19 Global Step: 332820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:09,141-Speed 9156.49 samples/sec Loss 3.2922 LearningRate 0.0000 Epoch: 19 Global Step: 332830 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-12 01:03:10,257-Speed 9176.30 samples/sec Loss 3.2618 LearningRate 0.0000 Epoch: 19 Global Step: 332840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:11,355-Speed 9331.97 samples/sec Loss 3.2822 LearningRate 0.0000 Epoch: 19 Global Step: 332850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:12,482-Speed 9091.97 samples/sec Loss 3.3070 LearningRate 0.0000 Epoch: 19 Global Step: 332860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:13,577-Speed 9360.10 samples/sec Loss 3.3202 LearningRate 0.0000 Epoch: 19 Global Step: 332870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:14,664-Speed 9426.47 samples/sec Loss 3.2435 LearningRate 0.0000 Epoch: 19 Global Step: 332880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:15,770-Speed 9259.38 samples/sec Loss 3.2090 LearningRate 0.0000 Epoch: 19 Global Step: 332890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:16,922-Speed 8898.60 samples/sec Loss 3.3221 LearningRate 0.0000 Epoch: 19 Global Step: 332900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:18,054-Speed 9054.05 samples/sec Loss 3.2999 LearningRate 0.0000 Epoch: 19 Global Step: 332910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:19,171-Speed 9166.67 samples/sec Loss 3.2145 LearningRate 0.0000 Epoch: 19 Global Step: 332920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:20,281-Speed 9240.62 samples/sec Loss 3.3263 LearningRate 0.0000 Epoch: 19 Global Step: 332930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:21,372-Speed 9396.49 samples/sec Loss 3.3485 LearningRate 0.0000 Epoch: 19 Global Step: 332940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:22,516-Speed 8951.95 samples/sec Loss 3.2957 LearningRate 0.0000 Epoch: 19 Global Step: 332950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
01:03:23,636-Speed 9150.53 samples/sec Loss 3.2737 LearningRate 0.0000 Epoch: 19 Global Step: 332960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:24,736-Speed 9314.14 samples/sec Loss 3.2977 LearningRate 0.0000 Epoch: 19 Global Step: 332970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:25,840-Speed 9284.85 samples/sec Loss 3.2908 LearningRate 0.0000 Epoch: 19 Global Step: 332980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:26,956-Speed 9181.43 samples/sec Loss 3.2743 LearningRate 0.0000 Epoch: 19 Global Step: 332990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:28,067-Speed 9217.35 samples/sec Loss 3.2534 LearningRate 0.0000 Epoch: 19 Global Step: 333000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:03:29,198-Speed 9056.57 samples/sec Loss 3.3077 LearningRate 0.0000 Epoch: 19 Global Step: 333010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:30,294-Speed 9347.40 samples/sec Loss 3.2955 LearningRate 0.0000 Epoch: 19 Global Step: 333020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:31,398-Speed 9283.64 samples/sec Loss 3.2495 LearningRate 0.0000 Epoch: 19 Global Step: 333030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:32,543-Speed 8952.85 samples/sec Loss 3.3486 LearningRate 0.0000 Epoch: 19 Global Step: 333040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:33,665-Speed 9132.68 samples/sec Loss 3.3443 LearningRate 0.0000 Epoch: 19 Global Step: 333050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:34,770-Speed 9268.18 samples/sec Loss 3.3299 LearningRate 0.0000 Epoch: 19 Global Step: 333060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:35,879-Speed 9240.24 samples/sec Loss 3.3149 LearningRate 0.0000 Epoch: 19 Global Step: 333070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:36,994-Speed 9189.17 samples/sec 
Loss 3.2992 LearningRate 0.0000 Epoch: 19 Global Step: 333080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:38,118-Speed 9114.24 samples/sec Loss 3.2788 LearningRate 0.0000 Epoch: 19 Global Step: 333090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:39,237-Speed 9159.67 samples/sec Loss 3.2914 LearningRate 0.0000 Epoch: 19 Global Step: 333100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:40,369-Speed 9052.93 samples/sec Loss 3.2200 LearningRate 0.0000 Epoch: 19 Global Step: 333110 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:03:41,473-Speed 9277.23 samples/sec Loss 3.2511 LearningRate 0.0000 Epoch: 19 Global Step: 333120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:42,594-Speed 9140.19 samples/sec Loss 3.2673 LearningRate 0.0000 Epoch: 19 Global Step: 333130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:43,771-Speed 8705.69 samples/sec Loss 3.3193 LearningRate 0.0000 Epoch: 19 Global Step: 333140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:44,848-Speed 9521.72 samples/sec Loss 3.2532 LearningRate 0.0000 Epoch: 19 Global Step: 333150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:45,954-Speed 9263.67 samples/sec Loss 3.3166 LearningRate 0.0000 Epoch: 19 Global Step: 333160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:47,101-Speed 8932.70 samples/sec Loss 3.3154 LearningRate 0.0000 Epoch: 19 Global Step: 333170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:48,215-Speed 9196.33 samples/sec Loss 3.3406 LearningRate 0.0000 Epoch: 19 Global Step: 333180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:49,313-Speed 9328.58 samples/sec Loss 3.2252 LearningRate 0.0000 Epoch: 19 Global Step: 333190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:50,457-Speed 8960.06 samples/sec Loss 3.2860 LearningRate 0.0000 Epoch: 19 
Global Step: 333200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:51,618-Speed 8821.04 samples/sec Loss 3.3256 LearningRate 0.0000 Epoch: 19 Global Step: 333210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:52,743-Speed 9111.43 samples/sec Loss 3.3692 LearningRate 0.0000 Epoch: 19 Global Step: 333220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:53,889-Speed 8940.65 samples/sec Loss 3.3723 LearningRate 0.0000 Epoch: 19 Global Step: 333230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:55,027-Speed 9000.67 samples/sec Loss 3.3203 LearningRate 0.0000 Epoch: 19 Global Step: 333240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:56,131-Speed 9282.29 samples/sec Loss 3.3050 LearningRate 0.0000 Epoch: 19 Global Step: 333250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:57,183-Speed 9738.08 samples/sec Loss 3.2314 LearningRate 0.0000 Epoch: 19 Global Step: 333260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:58,277-Speed 9361.40 samples/sec Loss 3.2804 LearningRate 0.0000 Epoch: 19 Global Step: 333270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:03:59,403-Speed 9104.41 samples/sec Loss 3.2841 LearningRate 0.0000 Epoch: 19 Global Step: 333280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:00,544-Speed 8978.32 samples/sec Loss 3.2884 LearningRate 0.0000 Epoch: 19 Global Step: 333290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:01,680-Speed 9017.31 samples/sec Loss 3.2915 LearningRate 0.0000 Epoch: 19 Global Step: 333300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:02,813-Speed 9044.78 samples/sec Loss 3.3368 LearningRate 0.0000 Epoch: 19 Global Step: 333310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:03,936-Speed 9126.99 samples/sec Loss 3.2673 LearningRate 0.0000 Epoch: 19 Global Step: 333320 Fp16 Grad Scale: 131072 
Required: 0 hours Training: 2022-04-12 01:04:05,070-Speed 9034.74 samples/sec Loss 3.3377 LearningRate 0.0000 Epoch: 19 Global Step: 333330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:04:06,143-Speed 9546.94 samples/sec Loss 3.2699 LearningRate 0.0000 Epoch: 19 Global Step: 333340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:04:07,275-Speed 9053.86 samples/sec Loss 3.3366 LearningRate 0.0000 Epoch: 19 Global Step: 333350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:04:08,345-Speed 9570.13 samples/sec Loss 3.2887 LearningRate 0.0000 Epoch: 19 Global Step: 333360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:09,461-Speed 9183.26 samples/sec Loss 3.2461 LearningRate 0.0000 Epoch: 19 Global Step: 333370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:10,556-Speed 9356.61 samples/sec Loss 3.2043 LearningRate 0.0000 Epoch: 19 Global Step: 333380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:11,650-Speed 9369.74 samples/sec Loss 3.2621 LearningRate 0.0000 Epoch: 19 Global Step: 333390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:12,764-Speed 9198.65 samples/sec Loss 3.3158 LearningRate 0.0000 Epoch: 19 Global Step: 333400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:13,924-Speed 8830.33 samples/sec Loss 3.3329 LearningRate 0.0000 Epoch: 19 Global Step: 333410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:15,057-Speed 9041.48 samples/sec Loss 3.2689 LearningRate 0.0000 Epoch: 19 Global Step: 333420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:16,183-Speed 9102.90 samples/sec Loss 3.3077 LearningRate 0.0000 Epoch: 19 Global Step: 333430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:17,244-Speed 9655.73 samples/sec Loss 3.3684 LearningRate 0.0000 Epoch: 19 Global Step: 333440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 
01:04:18,387-Speed 8964.92 samples/sec Loss 3.2889 LearningRate 0.0000 Epoch: 19 Global Step: 333450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:19,498-Speed 9221.09 samples/sec Loss 3.2848 LearningRate 0.0000 Epoch: 19 Global Step: 333460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:20,611-Speed 9211.95 samples/sec Loss 3.3271 LearningRate 0.0000 Epoch: 19 Global Step: 333470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:21,727-Speed 9184.47 samples/sec Loss 3.3543 LearningRate 0.0000 Epoch: 19 Global Step: 333480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:22,831-Speed 9273.34 samples/sec Loss 3.3069 LearningRate 0.0000 Epoch: 19 Global Step: 333490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:23,993-Speed 8817.53 samples/sec Loss 3.2281 LearningRate 0.0000 Epoch: 19 Global Step: 333500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:25,138-Speed 8950.32 samples/sec Loss 3.3200 LearningRate 0.0000 Epoch: 19 Global Step: 333510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:26,246-Speed 9247.70 samples/sec Loss 3.3886 LearningRate 0.0000 Epoch: 19 Global Step: 333520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:27,374-Speed 9088.87 samples/sec Loss 3.3948 LearningRate 0.0000 Epoch: 19 Global Step: 333530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:28,498-Speed 9115.83 samples/sec Loss 3.3219 LearningRate 0.0000 Epoch: 19 Global Step: 333540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:29,605-Speed 9255.00 samples/sec Loss 3.3117 LearningRate 0.0000 Epoch: 19 Global Step: 333550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:30,753-Speed 8920.91 samples/sec Loss 3.3055 LearningRate 0.0000 Epoch: 19 Global Step: 333560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:04:31,872-Speed 9157.95 samples/sec 
Loss 3.3396 LearningRate 0.0000 Epoch: 19 Global Step: 333570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-12 01:04:33,002-Speed 9069.47 samples/sec Loss 3.2137 LearningRate 0.0000 Epoch: 19 Global Step: 333580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:34,108-Speed 9261.34 samples/sec Loss 3.2862 LearningRate 0.0000 Epoch: 19 Global Step: 333590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:35,207-Speed 9328.72 samples/sec Loss 3.3368 LearningRate 0.0000 Epoch: 19 Global Step: 333600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:36,335-Speed 9075.87 samples/sec Loss 3.3406 LearningRate 0.0000 Epoch: 19 Global Step: 333610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:37,453-Speed 9171.66 samples/sec Loss 3.3134 LearningRate 0.0000 Epoch: 19 Global Step: 333620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:38,617-Speed 8797.32 samples/sec Loss 3.3383 LearningRate 0.0000 Epoch: 19 Global Step: 333630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:39,700-Speed 9463.57 samples/sec Loss 3.3868 LearningRate 0.0000 Epoch: 19 Global Step: 333640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:40,816-Speed 9184.25 samples/sec Loss 3.2512 LearningRate 0.0000 Epoch: 19 Global Step: 333650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:41,955-Speed 8989.89 samples/sec Loss 3.2848 LearningRate 0.0000 Epoch: 19 Global Step: 333660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:43,122-Speed 8788.78 samples/sec Loss 3.2724 LearningRate 0.0000 Epoch: 19 Global Step: 333670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:44,317-Speed 8575.12 samples/sec Loss 3.2629 LearningRate 0.0000 Epoch: 19 Global Step: 333680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:45,422-Speed 9271.60 samples/sec Loss 3.2388 LearningRate 0.0000 Epoch: 19 
Global Step: 333690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:46,523-Speed 9309.43 samples/sec Loss 3.2668 LearningRate 0.0000 Epoch: 19 Global Step: 333700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:47,655-Speed 9050.75 samples/sec Loss 3.2845 LearningRate 0.0000 Epoch: 19 Global Step: 333710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:48,793-Speed 9002.44 samples/sec Loss 3.3189 LearningRate 0.0000 Epoch: 19 Global Step: 333720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:49,912-Speed 9162.52 samples/sec Loss 3.3094 LearningRate 0.0000 Epoch: 19 Global Step: 333730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:51,000-Speed 9421.08 samples/sec Loss 3.3412 LearningRate 0.0000 Epoch: 19 Global Step: 333740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:52,134-Speed 9033.78 samples/sec Loss 3.2917 LearningRate 0.0000 Epoch: 19 Global Step: 333750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:53,239-Speed 9268.22 samples/sec Loss 3.2281 LearningRate 0.0000 Epoch: 19 Global Step: 333760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:54,346-Speed 9258.97 samples/sec Loss 3.3730 LearningRate 0.0000 Epoch: 19 Global Step: 333770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:55,460-Speed 9198.14 samples/sec Loss 3.2675 LearningRate 0.0000 Epoch: 19 Global Step: 333780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:56,617-Speed 8851.07 samples/sec Loss 3.3304 LearningRate 0.0000 Epoch: 19 Global Step: 333790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-12 01:04:57,693-Speed 9530.24 samples/sec Loss 3.3148 LearningRate 0.0000 Epoch: 19 Global Step: 333800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-12 01:04:59,087-Speed 7350.47 samples/sec Loss 3.2615 LearningRate 0.0000 Epoch: 19 Global Step: 333810 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-12 01:05:00,121-Speed 9903.82 samples/sec Loss 3.2785 LearningRate 0.0000 Epoch: 19 Global Step: 333820 Fp16 Grad Scale: 32768 Required: -0 hours